[Users] Inconsistency in delivering MCAST messages in OFED 1.5 and 2.0

Yevheniy Demchenko zheka at uvt.cz
Thu Sep 12 15:20:49 PDT 2013


This is it! Works as expected after setting block_loopback=0
Thank you very much!
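
For anyone else hitting this: a minimal sketch of making the setting
persistent (the modprobe.d file name below is just an example, openibd is
the init script shipped with MOFED, and the sysfs path is the standard
location for module parameters):

  # /etc/modprobe.d/mlx4_core.conf
  options mlx4_core block_loopback=0

  # reload the driver stack so the parameter takes effect, then verify
  /etc/init.d/openibd restart
  cat /sys/module/mlx4_core/parameters/block_loopback    # should print 0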
Regards,

Ing. Yevheniy Demchenko
Senior Linux Administrator
UVT s.r.o.

On 09/11/2013 03:13 PM, Hal Rosenstock wrote:
> Hi Yevheniy,
>
> In MOFED mlx4_core there is a module parameter that controls multicast 
> loopback, and by default loopback is blocked:
>
> parm: block_loopback:Block multicast loopback packets if > 0 (default: 
> 1) (int)
>
> Would you retry with the mlx4_core block_loopback module parameter set 
> to 0 and see if that helps?
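>
> Something along these lines should do it for a quick test (a sketch; it 
> assumes the whole mlx4 stack on that node can be unloaded, with module 
> names as in stock MOFED):
>
>     # unload the dependent modules first, then reload the core driver
>     # with loopback blocking disabled and bring the IB driver back
>     modprobe -r mlx4_ib mlx4_en mlx4_core
>     modprobe mlx4_core block_loopback=0
>     modprobe mlx4_ib
>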
> Thanks.
> -- Hal
>
>
> On Mon, Sep 9, 2013 at 11:36 PM, Yevheniy Demchenko 
> <zheka at uvt.cz> wrote:
>
>     Hi!
>     Recently we've run into a bunch of problems with cluster software
>     (RHCS) after installing the latest OFED 2.0 software from Mellanox
>     (MLNX_OFED_LINUX-2.0-3.0.0-rhel6.4-x86_64.tgz
>     <http://www.mellanox.com/page/mlnx_ofed_eula?mtag=linux_sw_drivers&mrequest=downloads&mtype=ofed&mver=MLNX_OFED-2.0-3.0.0&mname=MLNX_OFED_LINUX-2.0-3.0.0-rhel6.4-x86_64.tgz>).
>     It seems that, in contrast to OFED 1.5, OFED 2.0 does not deliver
>     multicast messages to the sender's receive queue, which prevents
>     some software (namely corosync) from functioning properly. A test
>     case application is attached to this message; it is cut from the
>     corosync stack and modified a bit.
>     To compile: gcc -o test coropoll.c totemiba.c totemip.c util.c
>     -libverbs -lrdmacm -lrt
>     To run: ./test -i <ip_address>, where ip_address is the IPoIB
>     address of the IB HCA.
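>
>     For reference, a quick way to tell which stack a node is running (a
>     sketch; ofed_info ships with MLNX_OFED, while on the stock RHEL
>     stack the command is simply not installed):
>
>     ofed_info -s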
>
>     On the IB software distributed with RHEL 6.4:
>     [root@ar03 ofed2test]# ./test -i 172.32.32.13
>     addr: 172.32.32.13
>     family 2
>     initialize:
>     pollrun
>     mcast_bind
>     mcast_rdma_event_fn ADDR_RESOLVED
>     mcast_rdma_event_fn MULTICAST_JOIN
>     iface_change_fn
>     calling in send_fn
>     in totemiba_mcast_flush_send: called ibv_post_send with res=0,
>     msg_len=9;, qp_num=117, qkey=1234567
>     mcast_cq_send_event_fn, res of ibv_poll_cq=1
>     in mcast_cq_recv_event_fn res=1
>     in mcast_cq_recv_event_fn calling iba_deliver_fn, wc[0].byte_len=49
>     IN iba_deliver_fn calling main_deliver_fn with bytes=49
>     deliver_fn
>     Received message asdfasdf
>
>
>     On OFED 2.0:
>     addr: 172.32.32.12
>     family 2
>     initialize:
>     pollrun
>     mcast_bind
>     mcast_rdma_event_fn ADDR_RESOLVED
>     mcast_rdma_event_fn MULTICAST_JOIN
>     iface_change_fn
>     calling in send_fn
>     in totemiba_mcast_flush_send: called ibv_post_send with res=0,
>     msg_len=9;, qp_num=93, qkey=1234567
>     mcast_cq_send_event_fn, res of ibv_poll_cq=1
>
>     uname -a
>     Linux ar02 2.6.32-358.18.1.el6.x86_64 #1 SMP Tue Aug 27 14:23:09
>     CDT 2013 x86_64 x86_64 x86_64 GNU/Linux
>     HCA: Mellanox ConnectX-2, FW 2.9.1000
>
>     It seems that recv_comp_channel->fd is never touched and the
>     message is not delivered to the _sender_ (it is delivered to the
>     other HCAs in the multicast group, though).
>     I could not find any documentation regarding this case (delivering
>     a multicast message to the sender's receive queue), so I am not
>     sure which behaviour is correct.
>
>     I'd very much appreciate it if someone could run this test on
>     their machine and confirm or disconfirm the problem. Also, it
>     would be nice to know whether a multicast message has to be queued
>     to the _sender's_ receive queue after sending.
>
>     Regards,
>
>     -- 
>     Ing. Yevheniy Demchenko
>     Senior Linux Administrator
>     UVT s.r.o.
>
