[openib-general] [patch] libsdp typo in config_parser
Eitan Zahavi
eitan at mellanox.co.il
Sun Aug 20 00:39:11 PDT 2006
Hi Bernhard,
The only thing I can think of is that you may not have distributed the
libsdp config file to all nodes.
Please try changing the "log" directive in libsdp.conf to something like
log min-level 1 destination file libsdp.log
then run MPI and send the log file /tmp/libsdp.log
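
To rule out the config distribution problem, something along these lines might help. It is only a sketch: conf_sum and check_nodes are made-up helper names, and it assumes passwordless ssh to each node and that libsdp.conf lives at the same path everywhere.

```shell
# Sketch: verify that every node has the same libsdp.conf as the local one.
# conf_sum and check_nodes are illustrative helpers, not part of libsdp.

# Print the MD5 checksum of a file (first field of md5sum output).
conf_sum() {
    md5sum < "$1" | cut -d' ' -f1
}

# Compare the local config against the copy on each node.
# $1 = config path (assumed identical on all nodes), rest = host names.
check_nodes() {
    conf=$1; shift
    local_sum=$(conf_sum "$conf")
    for node in "$@"; do
        # Assumption: ssh to the node works without a password prompt.
        remote_sum=$(ssh "$node" "md5sum < '$conf'" | cut -d' ' -f1)
        if [ "$remote_sum" = "$local_sum" ]; then
            echo "$node: ok"
        else
            echo "$node: DIFFERS"
        fi
    done
}
```

With the names from your mail this would be called as
check_nodes /usr/local/etc/libsdp.conf node13ib.infiniband node15ib.infiniband
and any node reported as DIFFERS (or with a missing file) would need the config copied over again.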
Eitan
> -----Original Message-----
> From: Bernhard Fischer [mailto:rep.nop at aon.at]
> Sent: Friday, August 18, 2006 10:22 PM
> To: Eitan Zahavi
> Cc: openib-general at openib.org
> Subject: Re: [openib-general] [patch] libsdp typo in config_parser
>
> On Fri, Aug 18, 2006 at 10:05:35PM +0300, Eitan Zahavi wrote:
> >Hi Bernhard
> >
> >SDP traffic will not show on the IPoIB counters. It does not go through
> >IPoIB.
>
> That's what i thought, thanks for confirming.
> >You can use
> >lsmod | grep ib_sdp
> >to see how many connections are made over SDP.
>
> Running lam via 2 nodes, on 2 CPUs each, i see:
> # lsmod | grep ib_sdp
> ib_sdp 28184 4
> rdma_cm 27912 1 ib_sdp
> ib_core 53632 12 ib_ucm,ib_uverbs,ib_sdp,rdma_cm,ib_cm,ib_local_sa,ib_umad,ib_ipoib,ib_multicast,ib_sa,ib_mthca,ib_mad
>
> I did start lamboot with libsdp.so preloaded:
> $ LD_PRELOAD=/usr/local/lib64/libsdp.so lamboot l
> $ lamnodes C -c -n
> node13ib.infiniband node13ib.infiniband node15ib.infiniband node15ib.infiniband
> $ LD_PRELOAD=/usr/local/lib64/libsdp.so mpirun -np 4 /there/vasp/20060503/vasp.4.6/vasp.mpi
>
> Still, ifconfig ib0 (which hosts node??ib.infiniband on 10.100.0.0/24)
> shows that the communication is being sent over ipoib, as ifconfig's
> counters constantly go up when communicating (only one user is active
> on the system).
> $ /sbin/ifconfig ib0
> ib0       Link encap:UNSPEC  HWaddr 00-00-04-04-FE-80-00-00-00-00-00-00-00-00-00-00
>           inet addr:10.100.0.13  Bcast:10.100.0.255  Mask:255.255.255.0
>           UP BROADCAST RUNNING MULTICAST  MTU:2044  Metric:1
>           RX packets:182037964 errors:0 dropped:0 overruns:0 frame:0
>           TX packets:183607689 errors:0 dropped:0 overruns:0 carrier:0
>           collisions:0 txqueuelen:128
>           RX bytes:189334244937 (180563.2 Mb)  TX bytes:194777918565 (185754.6 Mb)
>
> My libsdp.conf looks like this:
> $ cat /usr/local/etc/libsdp.conf
> #log min-level 1 destination file libsdp.log
> use both connect * 10.100.0.0/24:*
> use both server * 10.100.0.0/24:*
>
> So i fear i'm missing something crucial.
> Ideas?
>
> >The exact number of packets and data flowing through the IB port can be
> >obtained from:
> >/sys/class/infiniband/mthca0/ports/1/counters/port_rcv_packets
> >/sys/class/infiniband/mthca0/ports/1/counters/port_xmit_packets
>
> $ for i in /sys/class/infiniband/mthca0/ports/1/counters/*packets;do echo -n $i:' ' ; cat $i;done
> /sys/class/infiniband/mthca0/ports/1/counters/port_rcv_packets: 185010549
> /sys/class/infiniband/mthca0/ports/1/counters/port_xmit_packets: 186584856
>
> PS: The different pingpong tests (which have outdated names in the openib
> wiki, btw) do work just fine if run as the very same user, so i think that
> basic verbs communication works properly.
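
To see whether the MPI traffic really bypasses IPoIB, one could compare the ib0 interface counter with the IB port counter around a single run. A rough sketch follows. read_counter and measure_tx are made-up helper names, and /sys/class/net/ib0/statistics/tx_packets in the usage line is an assumption (the standard Linux netdev statistics path, not something taken from this thread). The port counter path is the one quoted above.

```shell
# Sketch: measure how much two sysfs counters grow while a command runs.
# read_counter and measure_tx are illustrative helpers, not part of OFED.

# Print the current value of a sysfs counter file.
read_counter() {
    cat "$1"
}

# Run a command and print "<ipoib_delta> <port_delta>".
# $1 = IPoIB counter file, $2 = IB port counter file, rest = command to run.
measure_tx() {
    ipoib_file=$1; port_file=$2; shift 2
    ipoib_before=$(read_counter "$ipoib_file")
    port_before=$(read_counter "$port_file")
    # Output of the command is discarded; only the counters matter here.
    "$@" >/dev/null 2>&1
    ipoib_after=$(read_counter "$ipoib_file")
    port_after=$(read_counter "$port_file")
    echo "$((ipoib_after - ipoib_before)) $((port_after - port_before))"
}
```

For example:
measure_tx /sys/class/net/ib0/statistics/tx_packets /sys/class/infiniband/mthca0/ports/1/counters/port_xmit_packets mpirun -np 4 /there/vasp/20060503/vasp.4.6/vasp.mpi
The two counters do not count the same kind of packet, so only the trend is meaningful: if the first number stays near zero while the second grows, the traffic is probably going over SDP; if both grow together, it is most likely still on IPoIB.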