[openib-general] uDAPL open HCA problem
Arlin Davis
ardavis at ichips.intel.com
Fri Oct 21 11:59:46 PDT 2005
Sayantan Sur wrote:
>Hello,
>
>I have udapl over Gen2 setup on our cluster and am able to run udapl
>programs. However, sometimes I get this error (after a few runs of the
>same program):
>
> open_hca: ERR ib_at_ips_by_gid for mthca0
>dapls_ib_open_hca failed 40000
>
>
uDAPL uses uAT to look up the IP address of the local device/port from its
GID (using ATS records queried from the SA). The SA query for this record is
failing for some reason. Did your SM bounce (restart) during this time? Did
you bounce or reconfigure the IPoIB network device?
You can run the program under "env DAPL_DBG_TYPE=0xffff" to get more debug
information.
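A minimal sketch of that, assuming a POSIX shell. The helper name dapl_debug and the program name ./my_udapl_test are hypothetical. The helper passes DAPL_DBG_TYPE=0xffff through "env" only to the command it runs, so your login environment is not changed:

```shell
# Hypothetical helper: run one external command with all uDAPL
# debug output enabled (DAPL_DBG_TYPE=0xffff, as suggested above).
dapl_debug() {
    # "env VAR=value cmd" sets VAR only in cmd's environment,
    # not in the calling shell.
    env DAPL_DBG_TYPE=0xffff "$@"
}

# Usage (./my_udapl_test is a placeholder for your own uDAPL program):
# dapl_debug ./my_udapl_test 2>&1 | tee dapl_debug.log
```

Capturing the output to a file makes it easier to compare a good run with a failing one.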
-arlin
>The machine is an AMD Opteron (Tyan S2895) with Mellanox MemFree cards
>(fw ver 5.1.0).
>
>lsmod on my machine shows this:
>
>[surs@ro0:~] lsmod | grep ^ib
>ib_ipoib 48008 0
>ib_uat 14840 0
>ib_at 25696 1 ib_uat
>ib_sa 17804 2 ib_ipoib,ib_at
>ib_ucm 22280 0
>ib_cm 37744 1 ib_ucm
>ib_uverbs 35992 0
>ib_umad 18208 0
>ib_mthca 122656 0
>ib_mad 44072 4 ib_sa,ib_cm,ib_umad,ib_mthca
>ib_core 56192 8 ib_ipoib,ib_sa,ib_ucm,ib_cm,ib_uverbs,ib_umad,ib_mthca,ib_mad
>
>My InfiniBand device nodes are (created by hand):
>
>[surs@ro0:~] ls -l /dev/infiniband/
>total 0
>crw-rw-rw- 1 root root 231, 191 2005-10-20 21:13 uat
>crw-rw-rw- 1 root root 231, 224 2005-10-20 21:12 ucm0
>crwxrwxrwx 1 root root 231, 192 2005-09-21 04:37 umad0
>crwxrwxrwx 1 root root 231, 192 2005-09-16 19:29 uverbs0
>crwxrwxrwx 1 root root 231, 192 2005-09-16 19:29 uverbs1
>
>
>I'd really appreciate it if someone could help me understand what might be
>going wrong.
>
>Thanks,
>Sayantan.