[openib-general] uDAPL problem

James Lentini jlentini at netapp.com
Tue Sep 27 06:51:02 PDT 2005



On Mon, 26 Sep 2005, Hal Rosenstock wrote:

> On Mon, 2005-09-26 at 18:05, Todd Bowman wrote:
> > I am having a problem with uDAPL accessing
> > /dev/infiniband/{uat,ucm0}.  I am running 3549,  2.6.12 kernel with
> > backport.  Here is a snippet of the uDAPL debug messages running
> > dtest.  The dat.conf file seems to be correct, the correctly named
> > providers are being loaded.
> > 
> > 26248 Running as server
> > DAT Registry: dat_ia_openv (OpenIB-ib0,1:2,0) called
> > DAT Registry: IA OpenIB-ib0, trying to load library
> > /usr/local/lib/libdapl.so
> > libuat: Error <-1:6> couldn't open IB at device </dev/infiniband/uat>
> > libibcm: error <-1:6> opening device </dev/infiniband/ucm0>

This means that the /dev entries are not set up correctly.

> > DAPL: NOT Setting Loopback
> >  dapl_ib_init:
> > DAT Registry: dat_registry_add_provider (OpenIB-ib0,1:2,0)
> > dapl_ia_open (OpenIB-ib0, 8, 0x10019d40, 0x10019cc0)
> >  open_hca: mthca0 - 0x1001fdb0
> >  open_hca: Found dev mthca0 f422000002c90200
> >  open_hca: GID subnet 00000000000080fe id f522000002c90200
> 
> These look like they need to be endianized to me.

This looks like a bug in the way we print these values out, but I 
don't think it is the real problem. What architecture are you using?
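
To illustrate the printing issue, here is a rough Python sketch (not
the DAPL code; swap64 is a name made up for this example) that
reverses the byte order of the two values in the log. Swapped, the
subnet reads fe80000000000000, which is the InfiniBand default subnet
prefix, so the values look byte-swapped on output rather than wrong.

```python
def swap64(hex_str: str) -> str:
    """Reverse the byte order of a 64-bit value given as 16 hex digits."""
    raw = bytes.fromhex(hex_str)
    if len(raw) != 8:
        raise ValueError("expected exactly 16 hex digits")
    return raw[::-1].hex()


# Values copied from the open_hca debug output quoted above.
for label, value in (("GID subnet", "00000000000080fe"),
                     ("GID id", "f522000002c90200")):
    print(label, value, "->", swap64(value))
```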

> >  ips_by_gid: ERR ips_by_gid -1 Bad file descriptor
> >  open_hca: ERR ib_at_ips_by_gid for mthca0
> > dapls_ib_open_hca failed 40000
> > dapl_ia_open () returns 0x40000
> > 26248: Error Adaptor open: DAT_INTERNAL_ERROR
> > DAT Registry: Stopped (dat_fini)
> > DAPL: Stopped (dapl_fini)
> >  dapl_ib_release:
> > 
> > 
> > I am not running udev but manually create uat and ucm.  Here is the
> > list of /dev/infiniband:
> > 
> > ls -l /dev/infiniband/
> > total 0
> > crw-rw-rw-  1 root root 231,  64 Sep 22 15:18 issm0
> > crw-rw-rw-  1 root root 231,  65 Sep 22 15:18 issm1
> > crw-rw-rw-  1 root root 231, 254 Sep 22 22:47 uat
> 
> uat is at 231/191.
> 
> > crw-rw-rw-  1 root root 231, 255 Sep 20 22:31 ucm
> 
> I don't think you need this.
> 
> > crw-rw-rw-  1 root root 231, 255 Sep 26 20:01 ucm0
> 
> ucm devices start at 231/224.

If these changes do not fix your problem, please let us know.
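
A quick way to check the nodes against the numbers Hal gives is
sketched below in Python. The expected major/minor pairs come from
Hal's comments (uat at 231/191, ucm devices starting at 231/224, so
ucm0 is assumed to be 231/224); check_node and EXPECTED are names
invented for this example.

```python
import os
import stat

# Expected (major, minor) pairs, taken from Hal's comments in this thread.
# Assumption: ucm0 is the first ucm device, hence minor 224.
EXPECTED = {
    "/dev/infiniband/uat": (231, 191),
    "/dev/infiniband/ucm0": (231, 224),
}


def check_node(path, expected):
    """Return a one-line status string for a character device node."""
    try:
        st = os.stat(path)
    except OSError as exc:
        return f"{path}: cannot stat ({exc.strerror})"
    if not stat.S_ISCHR(st.st_mode):
        return f"{path}: not a character device"
    actual = (os.major(st.st_rdev), os.minor(st.st_rdev))
    if actual != expected:
        return (f"{path}: is {actual[0]}/{actual[1]}, "
                f"expected {expected[0]}/{expected[1]}")
    return f"{path}: ok"


if __name__ == "__main__":
    for path, expected in EXPECTED.items():
        print(check_node(path, expected))
```

With the listing you posted, this should report uat as 231/254 and
ucm0 as 231/255. A mismatched node can be removed and recreated as
root with mknod, for example "mknod /dev/infiniband/uat c 231 191".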

> -- Hal
> 
> > crw-rw-rw-  1 root root 231,   0 Sep 22 15:18 umad0
> > crw-rw-rw-  1 root root 231,   1 Sep 22 15:18 umad1
> > crw-rw-rw-  1 root root 231, 192 Sep 20 22:30 uverbs0
> > crw-rw-rw-  1 root root 231, 193 Sep 20 22:30 uverbs1
> > 
> > 
> > And the loaded modules:
> > 
> > kdapl_ib               82000  0
> > kdapl                  14888  1 kdapl_ib
> > ib_uverbs              52064  0
> > ib_ipoib               65480  0
> > ib_ucm                 32624  0
> > ib_cm                  51944  2 kdapl_ib,ib_ucm
> > ib_uat                 22168  0
> > ib_at                  34840  2 kdapl_ib,ib_uat
> > ib_sa                  25328  2 ib_ipoib,ib_at
> > ib_mthca              160376  0
> > ib_mad                 61108  3 ib_cm,ib_sa,ib_mthca
> > ib_core                73888  8
> > kdapl_ib,ib_uverbs,ib_ipoib,ib_ucm,ib_cm,ib_sa,ib_mthca,ib_mad
> > 
> > 
> > I am sure that I am missing something simple.  Can someone point me in
> > the right direction?
> > 
> > Thanks,
> > Todd
