[ofa-general] IPoIB arps disappearing
Eli Cohen
eli at dev.mellanox.co.il
Thu Jul 10 04:01:57 PDT 2008
On Thu, Jul 10, 2008 at 04:30:11AM -0400, Michael Di Domenico wrote:
> I'm having a bit of a weird problem that I cannot figure out. If anyone from
> the community can help, it would be appreciated.
> Here's the packet flow:
>
> cn(ib0)->io(ib0)->io(eth5)->pan(*)
>
> cn = compute node
> io = io node
> pan = Panasas storage network
>
> We have 12 shelves of Panasas network storage on a separate network, which
> is fronted by bridge servers that route IPoIB traffic to 10G Ethernet.
> We're using Mellanox ConnectX Ethernet/IB adapters everywhere. We're
> running OFED 1.3.1 and the latest IB/Eth firmware everywhere.
>
> Here's the problem. I can mount the storage on the compute nodes, but if I
> try to send more than 50 MB of data via dd, I seem to lose the ARP
> entries for the compute nodes on the IO servers. This seems to happen
> whether I use the filesystem or a netperf run from the compute node to the
> Panasas storage.
>
> I can run netperf between the compute node and the io node and get full
> IPoIB line rate with no issues.
> I can run netperf between the io node and the Panasas storage and get full
> 10G Ethernet line rate with no issues.
>
> When looking at the TCP traces, I can clearly see that a big chunk of data
> is sent between the end-points and then it stalls. Immediately after the
> stall there is an ARP request and then another chunk of data, and this
> scenario repeats over and over.
>
> Any thoughts or questions?
>
Michael,
could you repeat the experiment with debugging enabled? For IPoIB, this
can be done as follows:
echo 1 > /sys/module/ib_ipoib/parameters/debug_level
echo 1 > /sys/module/ib_ipoib/parameters/mcast_debug_level
Please send the output of dmesg after the failure.
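
If it helps, the steps above could be wrapped in a small script along these
lines. This is only a sketch under a few assumptions: the function names, the
PARAM_DIR and IFACE variables and the output file names are made up for
illustration, ib0 is taken from the packet flow quoted above, and the two
parameter files are normally present only when the IPoIB driver was built
with debugging support. It needs to run as root on the io node.

```shell
#!/bin/sh
# Sketch only: helper names, variables and output file names are illustrative.

# Location of the ib_ipoib module parameters (can be overridden for testing).
PARAM_DIR="${PARAM_DIR:-/sys/module/ib_ipoib/parameters}"
# IPoIB interface to inspect; ib0 is assumed from the packet flow above.
IFACE="${IFACE:-ib0}"

# Write LEVEL (0 or 1) to both IPoIB debug parameters.
# Returns non-zero if either parameter file is missing or not writable.
set_ipoib_debug() {
    level="$1"
    rc=0
    for p in debug_level mcast_debug_level; do
        if [ -w "$PARAM_DIR/$p" ]; then
            echo "$level" > "$PARAM_DIR/$p"
        else
            echo "cannot write $PARAM_DIR/$p" >&2
            rc=1
        fi
    done
    return "$rc"
}

# Save the kernel log and the current neighbour (ARP) table for IFACE
# into OUT_DIR so they can be attached to a reply.
collect_ipoib_logs() {
    out_dir="$1"
    mkdir -p "$out_dir" || return 1
    dmesg > "$out_dir/dmesg.txt"
    ip neigh show dev "$IFACE" > "$out_dir/neigh.txt"
}
```

A possible sequence would be: source the script, call set_ipoib_debug 1,
rerun the dd test until it stalls, call collect_ipoib_logs with a directory of
your choice, and then call set_ipoib_debug 0 to turn the extra logging off
again.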
Thanks.