[ofa-general] IPoIB arps disappearing

Eli Cohen eli at dev.mellanox.co.il
Thu Jul 10 04:01:57 PDT 2008


On Thu, Jul 10, 2008 at 04:30:11AM -0400, Michael Di Domenico wrote:
> I'm having a bit of a weird problem that I cannot figure out.  If anyone
> from the community can help, it would be appreciated.
> Here's the packet flow:
> 
> cn(ib0)->io(ib0)->io(eth5)->pan(*)
> 
> cn = compute node
> io = io node
> pan = panasas storage network
> 
> We have 12 shelves of Panasas network storage on a separate network, which
> is being fronted by bridge servers which are routing IPoIB traffic to 10G
> Ethernet traffic.  We're using Mellanox ConnectX Ethernet/IB adapters
> everywhere.  We're running OFED 1.3.1 and the latest firmware for IB/Eth
> everywhere.
> 
> Here's the problem.  I can mount the storage on the compute nodes, but if I
> try to send anything more than 50MB of data via dd, I seem to lose the ARP
> entries for the compute nodes on the IO servers.  This seems to happen
> whether I use the filesystem or a netperf run from the compute node to the
> Panasas storage.
> 
> I can run netperf between the compute node and io node and get full IPoIB
> line rate with no issues.
> I can run netperf between the io node and the Panasas storage and get full
> 10G Ethernet line rate with no issues.
> 
> When looking at the TCP traces, I can clearly see that a big chunk of data
> is sent between the end-points and then it stalls.  Immediately after the
> stall there is an ARP request and then another chunk of data, and this
> scenario repeats over and over.
> 
> Any thoughts or questions?
> 

Michael,
could you repeat the experiment with debugging enabled? For IPoIB, this
can be done as follows:

echo 1 > /sys/module/ib_ipoib/parameters/debug_level
echo 1 > /sys/module/ib_ipoib/parameters/mcast_debug_level

Please send the output of dmesg after the failure.
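
If it helps, the two steps can be wrapped in a small helper. This is only a
sketch: the function name set_ipoib_debug and the output file name in the
usage comment are made up for illustration, and it assumes the module
parameters are debug_level and mcast_debug_level, which only exist when the
kernel was built with CONFIG_INFINIBAND_IPOIB_DEBUG. The directory argument
is optional and defaults to the sysfs path used above.

```shell
# Hypothetical helper: write LEVEL into each IPoIB debug parameter.
# Usage: set_ipoib_debug LEVEL [PARAM_DIR]
# PARAM_DIR defaults to the ib_ipoib sysfs parameter directory.
set_ipoib_debug() {
    level="$1"
    param_dir="${2:-/sys/module/ib_ipoib/parameters}"
    rc=0
    for name in debug_level mcast_debug_level; do
        file="$param_dir/$name"
        if [ -w "$file" ]; then
            # Parameter exists and is writable: set it.
            echo "$level" > "$file"
            echo "set $name=$level"
        else
            # Missing when the module is not loaded or debug support is
            # not compiled in; report it and return non-zero at the end.
            echo "skipped $name (not present or not writable)" >&2
            rc=1
        fi
    done
    return "$rc"
}

# Intended use, as root on the io node (output file name is arbitrary):
#   set_ipoib_debug 1
#   ... reproduce the stall ...
#   dmesg > ipoib-debug.txt
#   set_ipoib_debug 0
```

Setting the level back to 0 afterwards keeps the kernel log from filling up
once the trace has been captured.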

Thanks.


