I'm having a bit of a weird problem that I cannot figure out. If anyone from the community can help, it would be appreciated.<div><br></div><div>Here's the packet flow:</div><div><br></div><div>cn(ib0)->io(ib0)->io(eth5)->pan(*)</div>
<div><br></div><div>cn = compute node</div><div>io = io node</div><div>pan = Panasas storage network</div><div><br></div><div>We have 12 shelves of Panasas network storage on a separate network, fronted by bridge servers that route IPoIB traffic to 10G Ethernet. We're using Mellanox ConnectX Ethernet/IB adapters everywhere. We're running OFED 1.3.1 and the latest firmware for IB/Eth everywhere.</div>
<div><br></div><div>Here's the problem. I can mount the storage on the compute nodes, but if I try to send anything more than 50MB of data via dd, I seem to lose the ARP entries for the compute nodes on the IO servers. This seems to happen whether I use the filesystem or a netperf run from the compute node to the Panasas storage.</div>
<div><br></div><div>I can run netperf between the compute node and the io node and get full IPoIB line rate with no issues.</div><div>I can run netperf between the io node and the Panasas storage and get full 10G Ethernet line rate with no issues.</div>
<div><br></div><div>When looking at the TCP traces, I can clearly see that a big chunk of data is sent between the endpoints and then it stalls. Immediately after the stall there is an ARP request and then another chunk of data, and this pattern repeats over and over.</div>
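<div><br></div><div>In case it helps anyone reproduce this, here is a rough sketch of how the neighbor (ARP) entry for a compute node could be watched on the io node while a transfer runs. It assumes the iproute2 "ip neigh show" command is available and that each output line has the address as its first field and the state as its last field. The function names and the address 10.0.0.12 are made up for illustration.</div>

```python
import subprocess
import time


def parse_neigh(output):
    """Map IP address -> neighbor state from `ip neigh show` text.

    Assumption: the first field of each line is the address and the
    last field is the state (REACHABLE, STALE, FAILED, ...).
    """
    states = {}
    for line in output.splitlines():
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        states[fields[0]] = fields[-1]
    return states


def watch(ip, interval=0.2):
    """Print a timestamped line whenever the entry for `ip` changes state.

    Runs until interrupted; an address with no entry is reported as MISSING.
    """
    last = None
    while True:
        out = subprocess.run(
            ["ip", "neigh", "show"],
            capture_output=True, text=True, check=True,
        ).stdout
        state = parse_neigh(out).get(ip, "MISSING")
        if state != last:
            print(f"{time.time():.3f} {ip} {state}")
            last = state
        time.sleep(interval)


# Hypothetical usage on the io node (10.0.0.12 is a placeholder address):
# watch("10.0.0.12")
```

<div>The timestamps it prints can then be lined up against the stalls in the TCP trace to see whether the entry changes state or disappears just before each ARP request.</div>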
<div><br></div><div>Any thoughts or questions?</div><div><br></div><div>Thanks</div><div>- Michael</div>