[ofa-general] IPoIB arps disappearing

Michael Di Domenico mdidomenico4 at gmail.com
Thu Jul 10 04:46:50 PDT 2008


Do you want the debug output from the IO node, the compute node, or both?

On Thu, Jul 10, 2008 at 7:01 AM, Eli Cohen <eli at dev.mellanox.co.il> wrote:

> On Thu, Jul 10, 2008 at 04:30:11AM -0400, Michael Di Domenico wrote:
> > I'm having a bit of a weird problem that I cannot figure out. If
> > anyone in the community can help, it would be appreciated.
> > Here's the packet flow:
> >
> > cn(ib0)->io(ib0)->io(eth5)->pan(*)
> >
> > cn = compute node
> > io = io node
> > pan = panasas storage network
> >
> > We have 12 shelves of Panasas network storage on a separate network,
> > which is fronted by bridge servers routing IPoIB traffic onto 10G
> > Ethernet. We're using Mellanox ConnectX Ethernet/IB adapters
> > everywhere, running OFED 1.3.1 and the latest IB/Ethernet firmware
> > everywhere.
> >
> > Here's the problem. I can mount the storage on the compute nodes, but
> > if I try to send anything more than 50MB of data via dd, I seem to
> > lose the ARP entries for the compute nodes on the IO servers. This
> > happens whether I use the filesystem or run netperf from the compute
> > node to the Panasas storage.
> >
> > I can run netperf between the compute node and the IO node and get
> > full IPoIB line rate with no issues.
> > I can run netperf between the IO node and the Panasas storage and get
> > full 10G Ethernet line rate with no issues.
> >
> > Looking at the TCP traces, I can clearly see that a big chunk of data
> > is sent between the endpoints and then the transfer stalls.
> > Immediately after the stall there is an ARP request, then another
> > chunk of data, and this scenario repeats over and over.
> >
> > Any thoughts or questions?
> >
>
> Michael,
> could you repeat the experiment with debugging enabled? For IPoIB, this
> can be done as follows:
>
> echo 1 > /sys/module/ib_ipoib/parameters/debug_level
> echo 1 > /sys/module/ib_ipoib/parameters/mcast_debug_level
>
> Please send the output of dmesg after the failure.
>
> Thanks.
>
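
One quick way to confirm the neighbor-table loss described above would be
to watch the IO node's IPoIB neighbor entries while the transfer runs. A
minimal sketch, assuming ib0 is the IPoIB interface on the IO node (the
interface name and one-second interval are assumptions, not from the
thread):

# On the IO node: print the IPoIB neighbor table once per second while
# the compute node runs the dd or netperf transfer, to see when the
# compute node's entry disappears.
while true; do
    date
    ip neigh show dev ib0    # legacy equivalent: arp -n -i ib0
    sleep 1
done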
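
For the debug capture itself, something like the following could be run
on both the compute and IO nodes. This is a sketch: the parameter paths
are the ones given above, and ipoib-debug-$(hostname).log is a
hypothetical output filename.

# Enable IPoIB unicast and multicast debugging, clear the kernel ring
# buffer, reproduce the stall, then save the messages for the list.
echo 1 > /sys/module/ib_ipoib/parameters/debug_level
echo 1 > /sys/module/ib_ipoib/parameters/mcast_debug_level
dmesg -c > /dev/null          # clear old kernel messages
# ... reproduce the failure (dd through the mount, or netperf) ...
dmesg > ipoib-debug-$(hostname).log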