[openib-general] uDAPL open HCA problem

Sayantan Sur surs at cse.ohio-state.edu
Fri Oct 21 09:17:28 PDT 2005


Hello,

I have uDAPL over Gen2 set up on our cluster and am able to run uDAPL
programs. However, after a few runs of the same program, I sometimes get
this error:

 open_hca: ERR ib_at_ips_by_gid for mthca0
dapls_ib_open_hca failed 40000

The machine is an AMD Opteron (Tyan S2895) with Mellanox MemFree cards
(firmware version 5.1.0).

lsmod on my machine shows this:

[surs@ro0:~] lsmod | grep ^ib
ib_ipoib               48008  0 
ib_uat                 14840  0 
ib_at                  25696  1 ib_uat
ib_sa                  17804  2 ib_ipoib,ib_at
ib_ucm                 22280  0 
ib_cm                  37744  1 ib_ucm
ib_uverbs              35992  0 
ib_umad                18208  0 
ib_mthca              122656  0 
ib_mad                 44072  4 ib_sa,ib_cm,ib_umad,ib_mthca
ib_core                56192  8 ib_ipoib,ib_sa,ib_ucm,ib_cm,ib_uverbs,ib_umad,ib_mthca,ib_mad

My InfiniBand device nodes (created by hand) are:

[surs@ro0:~] ls -l /dev/infiniband/
total 0
crw-rw-rw-  1 root root 231, 191 2005-10-20 21:13 uat
crw-rw-rw-  1 root root 231, 224 2005-10-20 21:12 ucm0
crwxrwxrwx  1 root root 231, 192 2005-09-21 04:37 umad0
crwxrwxrwx  1 root root 231, 192 2005-09-16 19:29 uverbs0
crwxrwxrwx  1 root root 231, 192 2005-09-16 19:29 uverbs1
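Because these nodes were created by hand, it may help to check their major/minor numbers programmatically. In the listing above, umad0, uverbs0 and uverbs1 all report 231, 192. The following is a minimal Python sketch, not part of any OpenIB tool; the function names char_device_numbers and shared_numbers are hypothetical. It collects the (major, minor) pair of every character-device node in /dev/infiniband and reports any pair that more than one node uses.

```python
import os
import stat
from collections import defaultdict


def char_device_numbers(directory="/dev/infiniband"):
    """Map each character-device node in `directory` to its (major, minor) pair.

    Hypothetical helper: regular files and other node types are skipped.
    """
    numbers = {}
    for name in sorted(os.listdir(directory)):
        st = os.stat(os.path.join(directory, name))
        if stat.S_ISCHR(st.st_mode):
            numbers[name] = (os.major(st.st_rdev), os.minor(st.st_rdev))
    return numbers


def shared_numbers(numbers):
    """Return {(major, minor): [names]} for pairs used by more than one node."""
    by_number = defaultdict(list)
    for name, pair in numbers.items():
        by_number[pair].append(name)
    return {pair: names for pair, names in by_number.items() if len(names) > 1}


if __name__ == "__main__":
    # Only scan when the directory exists (i.e. on an InfiniBand host).
    if os.path.isdir("/dev/infiniband"):
        for (major, minor), names in sorted(shared_numbers(char_device_numbers()).items()):
            print(f"{major}, {minor}: {' '.join(names)}")
```

On nodes matching the listing above, the script would print a single line, `231, 192: umad0 uverbs0 uverbs1`.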


I'd really appreciate it if someone could help me understand what might be
going wrong.

Thanks,
Sayantan.

-- 
http://www.cse.ohio-state.edu/~surs
