[openib-general] uDAPL open HCA problem

Arlin Davis ardavis at ichips.intel.com
Fri Oct 21 11:59:46 PDT 2005


Sayantan Sur wrote:

>Hello,
>
>I have udapl over Gen2 setup on our cluster and am able to run udapl
>programs. However, sometimes I get this error (after a few runs of the
>same program):
>
> open_hca: ERR ib_at_ips_by_gid for mthca0
>dapls_ib_open_hca failed 40000
>  
>

uDAPL uses uAT to look up the IP address for the GID of the local
device/port (ATS records, queried via the SA). The SA query for this
record is failing for some reason. Did your SM bounce during this time?
Did you bounce or reconfigure the IPoIB network device?

You can set "env DAPL_DBG_TYPE=0xffff" when running your program to get
more debug information.
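
As a minimal sketch of the debug setting above: the small wrapper below
runs any command with DAPL_DBG_TYPE=0xffff in its environment. The
function name dapl_debug_run is made up for illustration and is not part
of uDAPL; only the variable and its value come from this thread.

```shell
# Hypothetical helper (not part of uDAPL): run the given command with
# full uDAPL debug output enabled via the DAPL_DBG_TYPE variable.
dapl_debug_run() {
    # env sets the variable only for this one command, so the
    # calling shell's environment is left unchanged.
    env DAPL_DBG_TYPE=0xffff "$@"
}
```

Something like "dapl_debug_run ./your_udapl_program 2>&1 | tee dapl.log"
(program and log file names are placeholders) would then keep a copy of
the output from a failing run.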

-arlin

>The machine is an AMD Opteron (Tyan S2895), with Mellanox MemFree cards
>(fw ver 5.1.0).
>
>lsmod on my machine shows this:
>
>[surs at ro0:~] lsmod | grep ^ib
>ib_ipoib               48008  0 
>ib_uat                 14840  0 
>ib_at                  25696  1 ib_uat
>ib_sa                  17804  2 ib_ipoib,ib_at
>ib_ucm                 22280  0 
>ib_cm                  37744  1 ib_ucm
>ib_uverbs              35992  0 
>ib_umad                18208  0 
>ib_mthca              122656  0 
>ib_mad                 44072  4 ib_sa,ib_cm,ib_umad,ib_mthca
>ib_core                56192  8 ib_ipoib,ib_sa,ib_ucm,ib_cm,ib_uverbs,ib_umad,ib_mthca,ib_mad
>
>My infiniband devices are (created by hand):
>
>[surs at ro0:~] ls -l /dev/infiniband/
>total 0
>crw-rw-rw-  1 root root 231, 191 2005-10-20 21:13 uat
>crw-rw-rw-  1 root root 231, 224 2005-10-20 21:12 ucm0
>crwxrwxrwx  1 root root 231, 192 2005-09-21 04:37 umad0
>crwxrwxrwx  1 root root 231, 192 2005-09-16 19:29 uverbs0
>crwxrwxrwx  1 root root 231, 192 2005-09-16 19:29 uverbs1
>
>
>I'd really appreciate it if someone could help me understand what might
>be going wrong.
>
>Thanks,
>Sayantan.
>
>  
>



