[openib-general] problem with libibverbs and ib_rdma_bw test
Linev Sergei
S.Linev at gsi.de
Mon Nov 6 09:17:26 PST 2006
Hi,
>
> Did you follow these directions from the libibverbs README?
>
> --
> https://openfabrics.org/svn/gen2/trunk/src/userspace/libibverbs/README
>
>
> To use IB verbs from userspace, a process must also have permission to
> tell the kernel to lock sufficient memory for all of your registered
> memory regions as well as the memory used internally by IB resources
> such as queue pairs (QPs) and completion queues (CQs). To check your
> resource limits, use the command
>
> ulimit -l
>
> (or "limit memorylocked" for csh-like shells).
>
> If you see a small number such as 32 (the units are KB) then you will
> need to increase this limit. This is usually done for ordinary users
> via the file /etc/security/limits.conf. More configuration may be
> necessary if you are logging in via OpenSSH and your sshd is
> configured to use privilege separation.
>
>
> On Mon, 6 Nov 2006, Linev Sergei wrote:
>
> > Hi
> >
> > I was trying to install OFED 1.1 on SuSE 9.3 Linux (2.6.11.4-20a-smp).
> > We are using Opterons with Mellanox MHES18-XT PCIe host adapters.
> > Previously we were using IB Gold 1.8.0 and mostly working with uDAPL.
> >
> > Now I am trying uDAPL with OFED and find that it is not working for me.
> > Actually, I see the same problem as it was reported here:
> >
> > http://openib.org/pipermail/openib-general/2006-October/028077.html
> >
> > I tried to find the place where it reports the problem and was able to
> > trace it down to the dapls_ib_mr_register() call, where ibv_reg_mr()
> > returns a zero handle (openib_cma/dapl_ib_mem.c, line 197)
> >
> > Following the recommendation in this mail,
> > http://openib.org/pipermail/openib-general/2006-October/028107.html
> >
> > I tried to check whether the ibverbs interface is working.
> > I found that the basic latency/bandwidth tests work, but when I try to
> > run the RDMA-based tests, I immediately see a problem. For instance, when
> > I run ib_rdma_bw on node01 and "ib_rdma_bw node01" on node02, I see the
> > same error message on both nodes:
> >
> > node01> ib_rdma_bw
> > 29808: | port=18515 | ib_port=1 | size=65536 | tx_depth=100 | iters=1000 | duplex=0 | cma=0
> > 29808:pp_init_ctx: Couldn't allocate MR
> >
> > Probably this is a well-known problem and I can solve it just by
> > upgrading to the newest Linux?
> >
> > Any help is appreciated.
> >
> > Sergey Linev
> >
> >
> > ##########################################
> > Experiment Data Processing (EE)
> > Gesellschaft für Schwerionenforschung (GSI)
> > Planckstr. 1
> > D-64291 Darmstadt, Germany
> > ##########################################
> >
> >
That was exactly the problem!
After I changed the limits in the limits.conf file, I can run most of my uDAPL code except for
the disconnection of nodes (which is not critical for me). There I get this error message:
dapl/common/dapl_ep_free.c:114: dapl_ep_free: Assertion `ep_ptr->param.ep_state == DAT_EP_STATE_DISCONNECTED || ep_ptr->param.ep_state == DAT_EP_STATE_UNCONNECTED' failed.
Thanks for the help. One remaining question: how much memory should I specify to be on the safe side?
For the moment I have set 4 MB, since 256 KB was not enough. But this is just for a 4-node system with
all-to-all connections. Do you have any suggestion for how I can calculate the required memory for a 16- or 64-node cluster?
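For reference, the limit that "ulimit -l" reports can also be read from inside a process. The sketch below is only an illustration, using Python's standard resource module. The helper names memlock_limit_kb and fits_in_limit are made up for this example and are not part of libibverbs or uDAPL. It does not answer how much memory a given cluster size needs; it only compares a byte count you supply against the current soft limit.

```python
import resource


def memlock_limit_kb():
    """Return (soft, hard) RLIMIT_MEMLOCK in KB, the unit 'ulimit -l' prints.

    None stands for an unlimited value.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_MEMLOCK)

    def to_kb(value):
        # getrlimit reports bytes; RLIM_INFINITY means no limit is set.
        return None if value == resource.RLIM_INFINITY else value // 1024

    return to_kb(soft), to_kb(hard)


def fits_in_limit(needed_bytes, soft_kb):
    """Hypothetical helper: does needed_bytes fit under the soft limit (in KB)?"""
    return soft_kb is None or needed_bytes <= soft_kb * 1024


if __name__ == "__main__":
    soft_kb, hard_kb = memlock_limit_kb()
    print("soft limit (KB):", "unlimited" if soft_kb is None else soft_kb)
    print("hard limit (KB):", "unlimited" if hard_kb is None else hard_kb)
    # 4 MB is the value mentioned above for the 4-node setup.
    print("4 MB fits:", fits_in_limit(4 * 1024 * 1024, soft_kb))
```

Running this in the same login session that starts the uDAPL job shows whether the value from limits.conf actually reached the process, which may not be the case for OpenSSH logins with privilege separation.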
Thanks again for the help!
Sergey