<html>
<head>
<meta content="text/html; charset=UTF-8" http-equiv="Content-Type">
</head>
<body bgcolor="#FFFFFF" text="#000000">
<div class="moz-cite-prefix">This is it! Works as expected after
setting block_loopback=0<br>
Thank you very much!<br>
Regards,<br>
<pre class="moz-signature" cols="72">Ing. Yevheniy Demchenko
Senior Linux Administrator
UVT s.r.o. </pre>
On 09/11/2013 03:13 PM, Hal Rosenstock wrote:<br>
</div>
<blockquote
cite="mid:CAKzyTsyiGJS0_Sixt6AGfJSwM6X1CY-+adT3ARw+OWp0RbXOLA@mail.gmail.com"
type="cite">
<div dir="ltr">
<div>Hi Yevheniy,</div>
<div> </div>
<div><font color="#000000" face="Times New Roman" size="3">
</font>
<p style="margin:0in 0in 0pt"><span
style="color:rgb(31,73,125);font-family:'Calibri','sans-serif';font-size:11pt">In
MOFED mlx4_core, there is a module parameter that controls
the loopback, and by default it's blocked:</span></p>
<p style="margin:0in 0in 0pt"><span
style="color:rgb(31,73,125);font-family:'Calibri','sans-serif';font-size:11pt"></span> </p>
<font color="#000000" face="Times New Roman" size="3">
</font>
<p style="margin:0in 0in 0pt"><span
style="color:rgb(31,73,125);font-family:'Calibri','sans-serif';font-size:11pt">parm:
block_loopback:Block
multicast loopback packets if > 0 (default: 1) (int)</span></p>
<font color="#000000" face="Times New Roman" size="3">
</font></div>
<div> </div>
<div>Would you retry with the mlx4_core block_loopback module
parameter set to 0 and see if that helps?</div>
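<div>A minimal sketch of checking and clearing the parameter (assuming the MOFED mlx4_core driver and a standard modprobe.d layout; paths may differ on your system):</div>

```shell
# Check the current value (1 = multicast loopback blocked, the MOFED default)
cat /sys/module/mlx4_core/parameters/block_loopback

# Make the change persist across driver reloads
echo "options mlx4_core block_loopback=0" | sudo tee /etc/modprobe.d/mlx4_core.conf

# Reload the driver so the new value takes effect
sudo modprobe -r mlx4_core && sudo modprobe mlx4_core
```

<div>Reloading mlx4_core will briefly take down the IB interfaces, so schedule this outside cluster operation.</div>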
<div> </div>
<div>Thanks.</div>
<div> </div>
<div>-- Hal</div>
</div>
<div class="gmail_extra"><br>
<br>
<div class="gmail_quote">On Mon, Sep 9, 2013 at 11:36 PM,
Yevheniy Demchenko <span dir="ltr"><<a
moz-do-not-send="true" href="mailto:zheka@uvt.cz"
target="_blank">zheka@uvt.cz</a>></span> wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0
.8ex;border-left:1px #ccc solid;padding-left:1ex">
<div bgcolor="#FFFFFF" text="#000000"> Hi!<br>
Recently we've run into a bunch of problems with cluster
software (RHCS) after installing the latest OFED 2.0 software
from Mellanox (<a moz-do-not-send="true"
href="http://www.mellanox.com/page/mlnx_ofed_eula?mtag=linux_sw_drivers&mrequest=downloads&mtype=ofed&mver=MLNX_OFED-2.0-3.0.0&mname=MLNX_OFED_LINUX-2.0-3.0.0-rhel6.4-x86_64.tgz"
target="_blank">MLNX_OFED_LINUX-2.0-3.0.0-rhel6.4-x86_64.tgz</a>).<br>
It seems that, in contrast to 1.5, OFED 2.0 does not deliver
multicast messages to the sender's receive queue, thus
preventing some software (namely corosync) from functioning
properly. A test-case application is attached to this
message; it is cut from the corosync stack and modified
a bit.<br>
To compile: gcc -o test coropoll.c totemiba.c
totemip.c util.c -libverbs -lrdmacm -lrt<br>
To run: ./test -i &lt;ip_address&gt;, where ip_address is the
IPoIB address of the IB HCA.<br>
<br>
On RHEL 6.4. distributed IB sw:<br>
[root@ar03 ofed2test]# ./test -i 172.32.32.13<br>
addr: 172.32.32.13<br>
family 2<br>
initialize:<br>
pollrun<br>
mcast_bind<br>
mcast_rdma_event_fn ADDR_RESOLVED<br>
mcast_rdma_event_fn MULTICAST_JOIN<br>
iface_change_fn<br>
calling in send_fn<br>
in totemiba_mcast_flush_send: called ibv_post_send with
res=0, msg_len=9;, qp_num=117, qkey=1234567<br>
mcast_cq_send_event_fn, res of ibv_poll_cq=1<br>
in mcast_cq_recv_event_fn res=1<br>
in mcast_cq_recv_event_fn calling iba_deliver_fn,
wc[0].byte_len=49<br>
IN iba_deliver_fn calling main_deliver_fn with bytes=49<br>
deliver_fn<br>
Received message asdfasdf<br>
<br>
<br>
On ofed 2.0:<br>
addr: 172.32.32.12<br>
family 2<br>
initialize:<br>
pollrun<br>
mcast_bind<br>
mcast_rdma_event_fn ADDR_RESOLVED<br>
mcast_rdma_event_fn MULTICAST_JOIN<br>
iface_change_fn<br>
calling in send_fn<br>
in totemiba_mcast_flush_send: called ibv_post_send with
res=0, msg_len=9;, qp_num=93, qkey=1234567<br>
mcast_cq_send_event_fn, res of ibv_poll_cq=1<br>
<br>
uname -a<br>
Linux ar02 2.6.32-358.18.1.el6.x86_64 #1 SMP Tue Aug 27
14:23:09 CDT 2013 x86_64 x86_64 x86_64 GNU/Linux<br>
HCA: Mellanox ConnectX-2, fw 2.9.1000<br>
<br>
It seems that recv_comp_channel->fd is never touched
and the message is not delivered to the _sender_ (it is
delivered to the other HCAs in the multicast group, though).<br>
I could not find any documentation covering this case
(delivering a mcast message to the sender's receive queue), so I
am not sure which behaviour is correct.<br>
<br>
I'd much appreciate it if someone could run this test on their
machine and confirm or refute the problem. Also, it would
be nice to know whether a multicast message has to be queued to
the _sender's_ receive queue after sending.<br>
<br>
Regards,<span class="HOEnZb"><font color="#888888"><br>
<br>
<pre cols="72">--
Ing. Yevheniy Demchenko
Senior Linux Administrator
UVT s.r.o. </pre>
</font></span></div>
<br>
_______________________________________________<br>
Users mailing list<br>
<a moz-do-not-send="true"
href="mailto:Users@lists.openfabrics.org">Users@lists.openfabrics.org</a><br>
<a moz-do-not-send="true"
href="http://lists.openfabrics.org/cgi-bin/mailman/listinfo/users"
target="_blank">http://lists.openfabrics.org/cgi-bin/mailman/listinfo/users</a><br>
<br>
</blockquote>
</div>
<br>
</div>
</blockquote>
<br>
</body>
</html>