<div><div>Turning on this debug option gives this error:</div><div><br></div><div>[root@cfd-io-0001 ~]# echo 1 > /sys/module/ib_ipoib/parameters/macast_debug_level</div><div>-bash: /sys/module/ib_ipoib/parameters/macast_debug_level: Permission denied</div>
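<div>The "Permission denied" above may only mean that no parameter with that name exists, since sysfs does not allow new files to be created. The intended name could be mcast_debug_level, but that is an assumption and is not confirmed anywhere in this thread. A minimal sketch that checks for the parameter before writing to it; set_ipoib_param and PARAM_DIR are hypothetical names used only for illustration:</div>

```shell
# Hypothetical helper: write an ib_ipoib module parameter only if it exists.
# PARAM_DIR is an illustrative override; it defaults to the sysfs path
# used in this thread.
set_ipoib_param() {
    param_dir="${PARAM_DIR:-/sys/module/ib_ipoib/parameters}"
    name="$1"
    value="$2"
    if [ -w "$param_dir/$name" ]; then
        echo "$value" > "$param_dir/$name"
        echo "set $name=$value"
    else
        # List what the module actually exposes so a misspelled name is obvious.
        echo "no writable parameter '$name' in $param_dir; available:" >&2
        ls "$param_dir" >&2
        return 1
    fi
}
```

<div>Run as root, something like "set_ipoib_param mcast_debug_level 1" would then either set the value or list the parameter names the loaded module really provides.</div>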
<div><br></div></div><div>dmesg output after I start the netperf test (which doesn't complete):</div><div><br></div><div>mtnic 0000:02:00.0: Port 2 - link up</div><div>mtnic 0000:02:00.0: Port 2 - link down</div><div>mtnic 0000:02:00.0: Freed 1 uncompleted tx descriptors</div>
<div>mtnic 0000:02:00.0: Port 2 - link up</div><div>ib0: mtu > 2044 will cause multicast packet drops.</div><div>eth5: no IPv6 routers present</div><div>ib0: Send unicast ARP to 0384</div><div>ib0: REQ arrived</div><div>
ib0: Request connection 0x2c0406 for gid fe80:0000:0000:0000:0002:c903:0000:c36d qpn 0x48</div><div>ib0: REP received.</div><div>ib0: Send unicast ARP to 0384</div><div><br></div><br><div class="gmail_quote">On Thu, Jul 10, 2008 at 7:46 AM, Michael Di Domenico <<a href="mailto:mdidomenico4@gmail.com">mdidomenico4@gmail.com</a>> wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex;">Do you want the debug output from the IO node, the compute node, or both?<div><div></div><div class="Wj3C7c"><br><br><div class="gmail_quote">
On Thu, Jul 10, 2008 at 7:01 AM, Eli Cohen <<a href="mailto:eli@dev.mellanox.co.il" target="_blank">eli@dev.mellanox.co.il</a>> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<div><div></div><div>On Thu, Jul 10, 2008 at 04:30:11AM -0400, Michael Di Domenico wrote:<br>
> I'm having a bit of a weird problem that I cannot figure out. If anyone can<br>
> help from the community it would be appreciated.<br>
> Here's the packet flow<br>
><br>
> cn(ib0)->io(ib0)->io(eth5)->pan(*)<br>
><br>
> cn = compute node<br>
> io = io node<br>
> pan = panasas storage network<br>
><br>
> We have 12 shelves of Panasas network storage on a separate network, which<br>
> is being fronted by bridge servers which are routing IPoIB traffic to 10G<br>
> Ethernet traffic. We're using Mellanox ConnectX Ethernet/IB adapters<br>
> everywhere. We're running OFED 1.3.1 and the latest firmware for IB/Eth<br>
> everywhere.<br>
><br>
> Here's the problem. I can mount the storage on the compute nodes, but if I<br>
> try to send anything more than 50MB of data via dd, I seem to lose the ARP<br>
> entries for the compute nodes on the IO servers. This seems to happen<br>
> whether I use the filesystem or a netperf run from the compute node to the<br>
> Panasas storage.<br>
><br>
> I can run netperf between the compute node and io node and get full IPoIB<br>
> line rate with no issues.<br>
> I can run netperf between the io node and the Panasas storage and get full<br>
> 10G Ethernet line rate with no issues.<br>
><br>
> When looking at the TCP traces, I can clearly see that a big chunk of data<br>
> is sent between the end-points and then it stalls. Immediately after the<br>
> stall is an ARP request and then another chunk of data, and this scenario<br>
> repeats over and over.<br>
><br>
> Any thoughts or questions?<br>
><br>
<br>
</div></div>Michael,<br>
could you repeat the experiment with debugging enabled? For IPoIB, this<br>
can be done as follows:<br>
<br>
echo 1 > /sys/module/ib_ipoib/parameters/debug_level<br>
echo 1 > /sys/module/ib_ipoib/parameters/macast_debug_level<br>
<br>
Please send the output of dmesg after the failure.<br>
<br>
Thanks.<br>
</blockquote></div><br>
</div></div></blockquote></div><br>