[Users] Inconsistency in delivering MCAST messages in OFED 1.5 and 2.0
Yevheniy Demchenko
zheka at uvt.cz
Thu Sep 12 15:20:49 PDT 2013
This is it! Works as expected after setting block_loopback=0
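
For anyone hitting the same issue, this is roughly what we did (assuming
the usual modprobe.d layout; the .conf file name below is arbitrary):

    echo "options mlx4_core block_loopback=0" > /etc/modprobe.d/mlx4_core.conf
    modprobe -r mlx4_ib mlx4_en mlx4_core   # unload; fails while IPoIB ifaces are up
    modprobe mlx4_ib                        # reload, pulls mlx4_core back in
    cat /sys/module/mlx4_core/parameters/block_loopback   # should now print 0
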
Thank you very much!
Regards,
Ing. Yevheniy Demchenko
Senior Linux Administrator
UVT s.r.o.
On 09/11/2013 03:13 PM, Hal Rosenstock wrote:
> Hi Yevheniy,
>
> In MOFED mlx4_core, there is a module parameter that controls the
> loopback and by default it’s blocked:
>
> parm: block_loopback:Block multicast loopback packets if > 0 (default: 1) (int)
>
> Would you retry with mlx4_core block_loopback module param set to 0
> and see if that helps?
> Thanks.
> -- Hal
>
>
> On Mon, Sep 9, 2013 at 11:36 PM, Yevheniy Demchenko <zheka at uvt.cz> wrote:
>
> Hi!
> Recently we've run into a bunch of problems with cluster software
> (RHCS) after installing the latest OFED 2.0 software from Mellanox
> (MLNX_OFED_LINUX-2.0-3.0.0-rhel6.4-x86_64.tgz
> <http://www.mellanox.com/page/mlnx_ofed_eula?mtag=linux_sw_drivers&mrequest=downloads&mtype=ofed&mver=MLNX_OFED-2.0-3.0.0&mname=MLNX_OFED_LINUX-2.0-3.0.0-rhel6.4-x86_64.tgz>).
> It seems that, in contrast to 1.5, OFED 2.0 does not deliver
> multicast messages to the sender's receive queue, thus preventing
> some software (namely corosync) from functioning properly. A test case
> application is attached to this message; it is cut out of the
> corosync stack and modified a bit.
> To compile: gcc -o test coropoll.c totemiba.c totemip.c util.c
> -libverbs -lrdmacm -lrt
> Run: ./test -i <ip_address>, where ip_address is the IPoIB address
> of the IB HCA.
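>
> For reference, the core of what the test checks after the MULTICAST_JOIN
> event is roughly this (a condensed sketch, not the test's actual code:
> ah, remote_qpn and remote_qkey come from the join event's param.ud, the
> receive buffers are assumed to be posted already, and the helper name is
> made up):
>
>     #include <stdint.h>
>     #include <infiniband/verbs.h>
>
>     /* Post one multicast datagram, then wait for our own looped-back copy. */
>     static int send_and_wait_loopback(struct ibv_qp *qp, struct ibv_cq *recv_cq,
>                                       struct ibv_ah *ah, uint32_t remote_qpn,
>                                       uint32_t remote_qkey, void *buf,
>                                       uint32_t len, uint32_t lkey)
>     {
>         struct ibv_sge sge = { .addr = (uintptr_t) buf, .length = len,
>                                .lkey = lkey };
>         struct ibv_send_wr wr = { 0 }, *bad;
>         struct ibv_wc wc;
>         int n;
>
>         wr.opcode = IBV_WR_SEND;
>         wr.sg_list = &sge;
>         wr.num_sge = 1;
>         wr.send_flags = IBV_SEND_SIGNALED;
>         wr.wr.ud.ah = ah;                  /* from the MULTICAST_JOIN event */
>         wr.wr.ud.remote_qpn = remote_qpn;
>         wr.wr.ud.remote_qkey = remote_qkey;
>         if (ibv_post_send(qp, &wr, &bad))
>             return -1;
>
>         /* With loopback allowed, the sender's own recv CQ eventually yields
>          * the datagram; with block_loopback=1 this loop spins forever
>          * (busy-polling for brevity; the real test uses a comp channel). */
>         while ((n = ibv_poll_cq(recv_cq, 1, &wc)) == 0)
>             ;
>         return (n == 1 && wc.status == IBV_WC_SUCCESS) ? 0 : -1;
>     }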
>
> On the IB software shipped with RHEL 6.4:
> [root at ar03 ofed2test]# ./test -i 172.32.32.13
> addr: 172.32.32.13
> family 2
> initialize:
> pollrun
> mcast_bind
> mcast_rdma_event_fn ADDR_RESOLVED
> mcast_rdma_event_fn MULTICAST_JOIN
> iface_change_fn
> calling in send_fn
> in totemiba_mcast_flush_send: called ibv_post_send with res=0,
> msg_len=9;, qp_num=117, qkey=1234567
> mcast_cq_send_event_fn, res of ibv_poll_cq=1
> in mcast_cq_recv_event_fn res=1
> in mcast_cq_recv_event_fn calling iba_deliver_fn, wc[0].byte_len=49
> IN iba_deliver_fn calling main_deliver_fn with bytes=49
> deliver_fn
> Received message asdfasdf
>
>
> On OFED 2.0:
> addr: 172.32.32.12
> family 2
> initialize:
> pollrun
> mcast_bind
> mcast_rdma_event_fn ADDR_RESOLVED
> mcast_rdma_event_fn MULTICAST_JOIN
> iface_change_fn
> calling in send_fn
> in totemiba_mcast_flush_send: called ibv_post_send with res=0,
> msg_len=9;, qp_num=93, qkey=1234567
> mcast_cq_send_event_fn, res of ibv_poll_cq=1
>
> uname -a
> Linux ar02 2.6.32-358.18.1.el6.x86_64 #1 SMP Tue Aug 27 14:23:09
> CDT 2013 x86_64 x86_64 x86_64 GNU/Linux
> HCA: Mellanox ConnectX-2, fw 2.9.1000
>
> It seems that recv_comp_channel->fd is never touched and the message
> is not delivered to the _sender_ (it is delivered to the other HCAs in
> the multicast group, though).
> I could not find any documentation regarding this case (delivering a
> mcast message to the sender's receive queue), so I am not sure which
> behaviour is correct.
>
> I would very much appreciate it if someone could run this test on their
> machine and confirm or refute the problem. Also, it would be nice to know
> whether a multicast message has to be queued to the _sender's_ receive
> queue after sending.
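>
> (As an aside: besides the module parameter, newer libibverbs lets an
> application request this behaviour per QP via ibv_create_qp_ex() with the
> IBV_QP_CREATE_BLOCK_SELF_MCAST_LB create flag; availability depends on the
> verbs library version, so treat this as an assumption. A minimal sketch:)
>
>     #include <infiniband/verbs.h>
>
>     /* Create a UD QP that will never see its own multicast sends,
>      * regardless of the mlx4_core block_loopback setting. */
>     static struct ibv_qp *create_ud_qp_no_self_mcast(struct ibv_context *ctx,
>                                                      struct ibv_pd *pd,
>                                                      struct ibv_cq *cq)
>     {
>         struct ibv_qp_init_attr_ex attr = {
>             .send_cq = cq,
>             .recv_cq = cq,
>             .cap = { .max_send_wr = 16, .max_recv_wr = 16,
>                      .max_send_sge = 1, .max_recv_sge = 1 },
>             .qp_type = IBV_QPT_UD,
>             .comp_mask = IBV_QP_INIT_ATTR_PD | IBV_QP_INIT_ATTR_CREATE_FLAGS,
>             .pd = pd,
>             .create_flags = IBV_QP_CREATE_BLOCK_SELF_MCAST_LB,
>         };
>         return ibv_create_qp_ex(ctx, &attr);
>     }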
>
> Regards,
>
> --
> Ing. Yevheniy Demchenko
> Senior Linux Administrator
> UVT s.r.o.
>
>
> _______________________________________________
> Users mailing list
> Users at lists.openfabrics.org
> http://lists.openfabrics.org/cgi-bin/mailman/listinfo/users
>
>