[Users] Inconsistency in delivering MCAST messages in OFED 1.5 and 2.0

Hal Rosenstock hal.rosenstock at gmail.com
Wed Sep 11 06:13:09 PDT 2013


Hi Yevheniy,


In MOFED mlx4_core, there is a module parameter that controls multicast
loopback; by default, loopback is blocked:



parm:           block_loopback:Block multicast loopback packets if > 0
(default: 1) (int)
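If it helps, one typical way to check the current value and persist a change
might look like the following (this is a sketch using standard sysfs and
modprobe.d conventions; the exact set of mlx4 modules loaded on your system
may differ):

```shell
# Check the current value (1 = multicast loopback blocked)
cat /sys/module/mlx4_core/parameters/block_loopback

# Persist block_loopback=0 across driver reloads
echo "options mlx4_core block_loopback=0" > /etc/modprobe.d/mlx4-loopback.conf

# Reload the driver stack so the new value takes effect
# (adjust the module list to what is actually loaded)
modprobe -r mlx4_en mlx4_ib mlx4_core && modprobe mlx4_core
```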

Would you retry with the mlx4_core block_loopback module parameter set to 0
and see if that helps?
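The semantics in question (whether a sender gets a copy of its own multicast
message back) can be illustrated with ordinary IP multicast sockets, where
IP_MULTICAST_LOOP plays a role analogous to block_loopback. This is only an
analogy sketched on the loopback interface, not the verbs API itself; the
group address and port below are arbitrary test values:

```python
import socket

GROUP, PORT = "239.255.0.42", 50042  # arbitrary test group/port (assumption)

# Receiver: join the multicast group on the loopback interface.
rx = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
rx.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
rx.bind(("", PORT))
mreq = socket.inet_aton(GROUP) + socket.inet_aton("127.0.0.1")
rx.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP, mreq)
rx.settimeout(5)

# Sender: IP_MULTICAST_LOOP=1 asks the kernel to loop a copy of the
# datagram back to local members of the group -- i.e. the sender's own
# host sees its own multicast, as OFED 1.5 apparently did for corosync.
tx = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
tx.setsockopt(socket.IPPROTO_IP, socket.IP_MULTICAST_IF,
              socket.inet_aton("127.0.0.1"))
tx.setsockopt(socket.IPPROTO_IP, socket.IP_MULTICAST_LOOP, 1)
tx.sendto(b"asdfasdf", (GROUP, PORT))

data, _ = rx.recvfrom(4096)
print("Received message", data.decode())
```

With IP_MULTICAST_LOOP set to 0 instead, the recvfrom above would time out,
which mirrors the behaviour you describe under OFED 2.0 with loopback blocked.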

Thanks.

-- Hal


On Mon, Sep 9, 2013 at 11:36 PM, Yevheniy Demchenko <zheka at uvt.cz> wrote:

>  Hi!
> Recently we've run into a bunch of problems with cluster software (RHCS)
> after installing the latest OFED 2.0 software from Mellanox (
> MLNX_OFED_LINUX-2.0-3.0.0-rhel6.4-x86_64.tgz<http://www.mellanox.com/page/mlnx_ofed_eula?mtag=linux_sw_drivers&mrequest=downloads&mtype=ofed&mver=MLNX_OFED-2.0-3.0.0&mname=MLNX_OFED_LINUX-2.0-3.0.0-rhel6.4-x86_64.tgz>
> ).
> It seems that, in contrast to 1.5, OFED 2.0 does not deliver multicast
> messages to the sender's receive queue, thus preventing some software
> (namely corosync) from functioning properly. A test-case application is
> attached to this message; it is cut from the corosync stack and slightly
> modified.
> To compile: gcc -o test  coropoll.c  totemiba.c  totemip.c  util.c
> -libverbs -lrdmacm -lrt
> To run: ./test -i <ip_address>, where ip_address is the IPoIB address of
> the IB HCA.
>
> With the IB software distributed with RHEL 6.4:
> root at ar03 ofed2test]# ./test -i 172.32.32.13
> addr: 172.32.32.13
> family 2
> initialize:
> pollrun
> mcast_bind
> mcast_rdma_event_fn ADDR_RESOLVED
> mcast_rdma_event_fn MULTICAST_JOIN
> iface_change_fn
> calling in send_fn
> in totemiba_mcast_flush_send: called ibv_post_send with res=0, msg_len=9;,
> qp_num=117, qkey=1234567
> mcast_cq_send_event_fn, res of ibv_poll_cq=1
> in mcast_cq_recv_event_fn res=1
> in mcast_cq_recv_event_fn calling iba_deliver_fn, wc[0].byte_len=49
> IN iba_deliver_fn calling main_deliver_fn with bytes=49
> deliver_fn
> Received message asdfasdf
>
>
> With OFED 2.0:
> addr: 172.32.32.12
> family 2
> initialize:
> pollrun
> mcast_bind
> mcast_rdma_event_fn ADDR_RESOLVED
> mcast_rdma_event_fn MULTICAST_JOIN
> iface_change_fn
> calling in send_fn
> in totemiba_mcast_flush_send: called ibv_post_send with res=0, msg_len=9;,
> qp_num=93, qkey=1234567
> mcast_cq_send_event_fn, res of ibv_poll_cq=1
>
> uname -a
> Linux ar02 2.6.32-358.18.1.el6.x86_64 #1 SMP Tue Aug 27 14:23:09 CDT 2013
> x86_64 x86_64 x86_64 GNU/Linux
> HCA: mellanox ConnectX2, fw 2.9.1000
>
> It seems that recv_comp_channel->fd is never touched and the message is
> not delivered to the _sender_ (it is delivered to the other HCAs in the
> multicast group, though).
> I could not find any documentation covering this case (delivering a
> multicast message to the sender's receive queue), so I am not sure which
> behaviour is correct.
>
> I'd very much appreciate it if someone could run this test on their
> machine and confirm or refute the problem. Also, it would be nice to know
> whether a multicast message has to be queued to the _sender's_ receive
> queue after sending.
>
> Regards,
>
> --
> Ing. Yevheniy Demchenko
> Senior Linux Administrator
> UVT s.r.o.
>
>
> _______________________________________________
> Users mailing list
> Users at lists.openfabrics.org
> http://lists.openfabrics.org/cgi-bin/mailman/listinfo/users
>
>

