[openib-general] problem with libibverbs and ib_rdma_bw test

Linev Sergei S.Linev at gsi.de
Mon Nov 6 09:17:26 PST 2006


Hi, 


> 
> Did you follow these directions from the libibverbs README?
> 
> --
> https://openfabrics.org/svn/gen2/trunk/src/userspace/libibverbs/README
> 
> 
> To use IB verbs from userspace, a process must also have permission to
> tell the kernel to lock sufficient memory for all of your registered
> memory regions as well as the memory used internally by IB resources
> such as queue pairs (QPs) and completion queues (CQs).  To check your
> resource limits, use the command
> 
> 	ulimit -l
> 
> (or "limit memorylocked" for csh-like shells).
> 
> If you see a small number such as 32 (the units are KB) then you will
> need to increase this limit.  This is usually done for ordinary users
> via the file /etc/security/limits.conf.  More configuration may be
> necessary if you are logging in via OpenSSH and your sshd is
> configured to use privilege separation.
> 
> 
> On Mon, 6 Nov 2006, Linev Sergei wrote:
> 
> > Hi
> > 
> > I was trying to install OFED 1.1 on SuSE 9.3 Linux (2.6.11.4-20a-smp).
> > We are using Opterons with Mellanox MHES18-XT PCIe host adapters.
> > Previously we were using IB Gold 1.8.0, mostly working with uDAPL.
> > 
> > Now I am trying uDAPL with OFED and have found that it is not working for me.
> > In fact, I see the same problem that was reported here:
> > 
> > http://openib.org/pipermail/openib-general/2006-October/028077.html
> > 
> > I tried to find the place where the problem is reported and traced it down to the
> > dapls_ib_mr_register() call, where ibv_reg_mr() returns a zero handle (openib_cma/dapl_ib_mem.c, line 197).
> > 
> > Following the recommendation in this mail,
> > http://openib.org/pipermail/openib-general/2006-October/028107.html
> > 
> > I tried to check whether the ibverbs interface is working.
> > I found that the basic latency/bandwidth tests work, but when I try to run
> > the RDMA-based tests, I immediately see a problem. For instance, when I run
> > ib_rdma_bw on node01 and "ib_rdma_bw node01" on node02, I see the same
> > error message on both nodes:
> > 
> > node01> ib_rdma_bw
> > 29808: | port=18515 | ib_port=1 | size=65536 | tx_depth=100 | iters=1000 | duplex=0 | cma=0 
> > 29808:pp_init_ctx: Couldn't allocate MR
> > 
> > Probably this is a well-known problem that I can solve just by upgrading to the newest Linux?
> > 
> > Any help is appreciated.
> > 
> > Sergey Linev
> > 
> > 
> > ##########################################
> > Experiment Data Processing (EE)
> > Gesellschaft für Schwerionenforschung (GSI)
> > Planckstr. 1 
> > D-64291 Darmstadt, Germany
> > ##########################################
> > 
> > 

That was the point!

After changing the limits in the limits.conf file, I can run most of my uDAPL code except 
disconnection of nodes (which is not essential for me). There I get this error message:

dapl/common/dapl_ep_free.c:114: dapl_ep_free: Assertion `ep_ptr->param.ep_state == DAT_EP_STATE_DISCONNECTED || ep_ptr->param.ep_state == DAT_EP_STATE_UNCONNECTED' failed.

Thanks for the help. One remaining question: how much memory should I specify to be on the safe side?
For the moment I have set 4 MB, while 256 KB was not enough. But this is just for a 4-node system with all-to-all
connections. Do you have any suggestion for how to calculate the required memory for a 16- or 64-node cluster?
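
Since a changed limits.conf only takes effect for new login sessions, it can help to confirm which locked-memory limit a process actually sees before starting the uDAPL code. Below is a minimal sketch, assuming Python on Linux; the helper name memlock_limit_kb is hypothetical and not part of OFED or libibverbs. It reports the same value as "ulimit -l", in KB, and does not answer how much memory a larger cluster needs.

```python
import resource


def memlock_limit_kb():
    """Return (soft, hard) locked-memory limits in KB; None means unlimited.

    Hypothetical helper for illustration only.
    """
    # getrlimit() returns the limits in bytes; "ulimit -l" prints KB.
    soft, hard = resource.getrlimit(resource.RLIMIT_MEMLOCK)

    def to_kb(value):
        if value == resource.RLIM_INFINITY:
            return None
        return value // 1024

    return to_kb(soft), to_kb(hard)


if __name__ == "__main__":
    soft, hard = memlock_limit_kb()
    print("memlock soft limit (KB):", "unlimited" if soft is None else soft)
    print("memlock hard limit (KB):", "unlimited" if hard is None else hard)
```

If the soft limit printed here is still the old small value, the new limits.conf setting has not reached this session, for example because of the sshd privilege separation issue mentioned in the README.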

Thanks again for the help!

Sergey



