[openib-general] uDAPL open HCA problem
Sayantan Sur
surs at cse.ohio-state.edu
Fri Oct 21 09:17:28 PDT 2005
Hello,
I have uDAPL over Gen2 set up on our cluster and am able to run uDAPL
programs. However, I sometimes get this error (after a few runs of the
same program):
open_hca: ERR ib_at_ips_by_gid for mthca0
dapls_ib_open_hca failed 40000
The machine is an AMD Opteron (Tyan S2895) with Mellanox MemFree cards
(fw ver 5.1.0).
lsmod on my machine shows this:
[surs@ro0:~] lsmod | grep ^ib
ib_ipoib 48008 0
ib_uat 14840 0
ib_at 25696 1 ib_uat
ib_sa 17804 2 ib_ipoib,ib_at
ib_ucm 22280 0
ib_cm 37744 1 ib_ucm
ib_uverbs 35992 0
ib_umad 18208 0
ib_mthca 122656 0
ib_mad 44072 4 ib_sa,ib_cm,ib_umad,ib_mthca
ib_core 56192 8 ib_ipoib,ib_sa,ib_ucm,ib_cm,ib_uverbs,ib_umad,ib_mthca,ib_mad
My InfiniBand device nodes (created by hand) are:
[surs@ro0:~] ls -l /dev/infiniband/
total 0
crw-rw-rw- 1 root root 231, 191 2005-10-20 21:13 uat
crw-rw-rw- 1 root root 231, 224 2005-10-20 21:12 ucm0
crwxrwxrwx 1 root root 231, 192 2005-09-21 04:37 umad0
crwxrwxrwx 1 root root 231, 192 2005-09-16 19:29 uverbs0
crwxrwxrwx 1 root root 231, 192 2005-09-16 19:29 uverbs1
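
In the listing above, umad0, uverbs0 and uverbs1 all carry the same device
number (231, 192). Whether that is related to the open_hca error is not
established here, but it is easy to check for when nodes are created by
hand. The following is a minimal sketch, assuming Python is available; the
function names parse_ls_devices and find_duplicates are hypothetical helpers
written for this illustration, not part of any InfiniBand tool.

```python
from collections import defaultdict

# Listing copied from the `ls -l /dev/infiniband/` output above.
LISTING = """\
total 0
crw-rw-rw- 1 root root 231, 191 2005-10-20 21:13 uat
crw-rw-rw- 1 root root 231, 224 2005-10-20 21:12 ucm0
crwxrwxrwx 1 root root 231, 192 2005-09-21 04:37 umad0
crwxrwxrwx 1 root root 231, 192 2005-09-16 19:29 uverbs0
crwxrwxrwx 1 root root 231, 192 2005-09-16 19:29 uverbs1
"""


def parse_ls_devices(listing):
    """Map node name -> (major, minor) for character-device lines of `ls -l`."""
    devices = {}
    for line in listing.splitlines():
        fields = line.split()
        # Character-device lines start with 'c' and have at least 9 fields:
        # mode, links, owner, group, "major,", minor, date, time, name
        if len(fields) < 9 or not fields[0].startswith("c"):
            continue
        major = int(fields[4].rstrip(","))
        minor = int(fields[5])
        devices[fields[-1]] = (major, minor)
    return devices


def find_duplicates(devices):
    """Return {(major, minor): [names]} for numbers used by more than one node."""
    by_number = defaultdict(list)
    for name, number in devices.items():
        by_number[number].append(name)
    return {num: sorted(names) for num, names in by_number.items() if len(names) > 1}


if __name__ == "__main__":
    for (major, minor), names in find_duplicates(parse_ls_devices(LISTING)).items():
        print(f"{major}, {minor} is shared by: {', '.join(names)}")
```

Each node's expected number can be compared with the major:minor pair that
the kernel exports in the corresponding sysfs "dev" file (for example under
/sys/class/infiniband_verbs/), assuming sysfs is mounted.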
I'd really appreciate it if someone could help me understand what might be
going wrong.
Thanks,
Sayantan.
--
http://www.cse.ohio-state.edu/~surs