Do you want the debug output from the IO node, the compute node, or both?<br><br><div class="gmail_quote">On Thu, Jul 10, 2008 at 7:01 AM, Eli Cohen &lt;<a href="mailto:eli@dev.mellanox.co.il">eli@dev.mellanox.co.il</a>&gt; wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex;">
<div><div></div><div class="Wj3C7c">On Thu, Jul 10, 2008 at 04:30:11AM -0400, Michael Di Domenico wrote:<br>
> I'm having a bit of a weird problem that I cannot figure out. If anyone can<br>
> help from the community it would be appreciated.<br>
> Here's the packet flow:<br>
><br>
> cn(ib0)->io(ib0)->io(eth5)->pan(*)<br>
><br>
> cn = compute node<br>
> io = io node<br>
> pan = panasas storage network<br>
><br>
> We have 12 shelves of Panasas network storage on a separate network, which<br>
> is being fronted by bridge servers which are routing IPoIB traffic to 10G<br>
> Ethernet traffic. We're using Mellanox ConnectX Ethernet/IB adapters<br>
> everywhere. We're running OFED 1.3.1 and the latest firmware for IB/Eth<br>
> everywhere.<br>
><br>
> Here's the problem. I can mount the storage on the compute nodes, but if I<br>
> try to send anything more than 50MB of data via dd, I seem to lose the ARP<br>
> entries for the compute nodes on the IO servers. This seems to happen<br>
> whether I use the filesystem or a netperf run from the compute node to the<br>
> Panasas storage.<br>
><br>
> I can run netperf between the compute node and IO node and get full IPoIB<br>
> line rate with no issues.<br>
> I can run netperf between the IO node and the Panasas storage and get full<br>
> 10G Ethernet line rate with no issues.<br>
><br>
> When looking at the TCP traces, I can clearly see that a big chunk of data<br>
> is sent between the end-points and then it stalls. Immediately after the<br>
> stall is an ARP request and then another chunk of data, and this scenario<br>
> repeats over and over.<br>
><br>
> Any thoughts or questions?<br>
><br>
<br>
</div></div>Michael,<br>
could you repeat the experiment with debugging enabled? For IPoIB, this<br>
can be done as follows:<br>
<br>
echo 1 > /sys/module/ib_ipoib/parameters/debug_level<br>
echo 1 > /sys/module/ib_ipoib/parameters/mcast_debug_level<br>
<br>
Please send the output of dmesg after the failure.<br>
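<br>
The steps above could be wrapped in a small script, sketched here under a few assumptions: the multicast parameter is spelled mcast_debug_level (the name I believe the ib_ipoib module exposes when built with IPoIB debugging), the helper name enable_ipoib_debug and the PARAM_DIR override are hypothetical, and the log file name in the last comment is only a placeholder. It has to run as root to write the real parameters.<br>
<pre>
```shell
#!/bin/sh
# Hypothetical helper: turn on IPoIB debug output via the module parameters.
# PARAM_DIR defaults to the ib_ipoib parameter directory and can be
# overridden (for example, to a scratch directory for a dry run).
PARAM_DIR="${PARAM_DIR:-/sys/module/ib_ipoib/parameters}"

enable_ipoib_debug() {
    for param in debug_level mcast_debug_level; do
        if [ -w "$PARAM_DIR/$param" ]; then
            # Writing 1 switches the corresponding debug messages on.
            echo 1 > "$PARAM_DIR/$param"
            echo "enabled $param"
        else
            echo "skipped $param (missing or not writable)"
        fi
    done
}

enable_ipoib_debug

# After reproducing the failure, save the kernel log, for example:
#   dmesg > ipoib-debug.log
```
</pre>
If a parameter is reported as skipped, the module may not be loaded or may have been built without debug support.<br>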
<br>
Thanks.<br>
</blockquote></div><br>