[ofa-general] IPoIB arps disappearing

Michael Di Domenico mdidomenico4 at gmail.com
Thu Jul 10 04:53:16 PDT 2008


Turning on this debug gives this error

[root at cfd-io-0001 ~]# echo 1 > /sys/module/ib_ipoib/parameters/macast_debug_level
-bash: /sys/module/ib_ipoib/parameters/macast_debug_level: Permission denied
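
A "Permission denied" on a sysfs write as root may simply mean the file does not exist, since sysfs does not let you create new files. As far as I know the IPoIB multicast debug parameter is spelled mcast_debug_level, but that is an assumption worth checking against the loaded module. A minimal sketch that lists what the module actually exposes (the function name list_module_params and its optional second argument are made up for illustration):

```shell
#!/bin/sh
# List the parameters a loaded module exposes through sysfs, one
# name=value line per parameter.
#   $1: module name (defaults to ib_ipoib, the module in this thread)
#   $2: sysfs module root (defaults to /sys/module; hypothetical
#       override so the function can be tried on a scratch directory)
list_module_params() {
    param_dir="${2:-/sys/module}/${1:-ib_ipoib}/parameters"
    if [ ! -d "$param_dir" ]; then
        echo "no parameters directory: $param_dir" >&2
        return 1
    fi
    for f in "$param_dir"/*; do
        # Skip anything that is not a regular file (including an
        # unexpanded glob when the directory is empty).
        [ -f "$f" ] || continue
        printf '%s=%s\n' "$(basename "$f")" "$(cat "$f" 2>/dev/null)"
    done
}
```

Calling list_module_params ib_ipoib on the IO node should show whether a multicast debug parameter is present and under which name. If none shows up, the driver may have been built without debug support.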

dmesg output after I start the netperf test (which doesn't complete):

mtnic 0000:02:00.0: Port 2 - link up
mtnic 0000:02:00.0: Port 2 - link down
mtnic 0000:02:00.0: Freed 1 uncompleted tx descriptors
mtnic 0000:02:00.0: Port 2 - link up
ib0: mtu > 2044 will cause multicast packet drops.
eth5: no IPv6 routers present
ib0: Send unicast ARP to 0384
ib0: REQ arrived
ib0: Request connection 0x2c0406 for gid fe80:0000:0000:0000:0002:c903:0000:c36d qpn 0x48
ib0: REP received.
ib0: Send unicast ARP to 0384
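
To line the dmesg messages up with the moment the compute node's entry drops out of the IO node's ARP table, the table can be polled while the test runs. A rough sketch, assuming the usual /proc/net/arp column layout (IP address, HW type, Flags, HW address, Mask, Device); the function name arp_entries_for is made up for illustration:

```shell
#!/bin/sh
# Print "IP flags" for every ARP entry on one interface.
#   $1: interface name, e.g. ib0
#   $2: table to read, in /proc/net/arp format (defaults to /proc/net/arp)
arp_entries_for() {
    dev="$1"
    table="${2:-/proc/net/arp}"
    # NR > 1 skips the header line; the device is the last column.
    awk -v dev="$dev" 'NR > 1 && $NF == dev { print $1, $3 }' "$table"
}
```

Running arp_entries_for ib0 once a second on the IO node during the netperf run would show whether the compute node's entry disappears or goes incomplete (flags 0x0 rather than 0x2) at each stall.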


On Thu, Jul 10, 2008 at 7:46 AM, Michael Di Domenico <mdidomenico4 at gmail.com> wrote:

> Do you want the debug from the IO, Compute, or Both?
>
>
> On Thu, Jul 10, 2008 at 7:01 AM, Eli Cohen <eli at dev.mellanox.co.il> wrote:
>
>> On Thu, Jul 10, 2008 at 04:30:11AM -0400, Michael Di Domenico wrote:
>> > I'm having a bit of a weird problem that I cannot figure out.  If anyone
>> > can help from the community it would be appreciated.
>> > Here's the packet flow
>> >
>> > cn(ib0)->io(ib0)->io(eth5)->pan(*)
>> >
>> > cn = compute node
>> > io = io node
>> > pan = panasas storage network
>> >
>> > We have 12 shelves of panasas network storage on a separate network,
>> > which is being fronted by bridge servers which are routing IPoIB traffic
>> > to 10G ethernet traffic.  We're using Mellanox Connect-X Ethernet/IB
>> > adapters everywhere.  We're running OFED 1.3.1 and the latest firmware
>> > for IB/Eth everywhere.
>> >
>> > Here's the problem.  I can mount the storage on the compute nodes, but
>> > if I try to send anything more than 50MB of data via dd, I seem to lose
>> > the ARP entries for the compute nodes on the IO servers.  This seems to
>> > happen whether I use the filesystem or a netperf run from the compute
>> > node to the panasas storage.
>> >
>> > I can run netperf between the compute node and io node and get full
>> > IPoIB line rate with no issues.
>> > I can run netperf between the io node and the panasas storage and get
>> > full 10G ethernet line rate with no issues.
>> >
>> > When looking at the TCP traces, I can clearly see that a big chunk of
>> > data is sent between the end-points and then it stalls.  Immediately
>> > after the stall is an ARP request and then another chunk of data, and
>> > this scenario repeats over and over.
>> >
>> > Any thoughts or questions?
>> >
>>
>> Michael,
>> could you repeat the experiment with debugging enabled? For IPoIB, this
>> can be done as follows:
>>
>> echo 1 > /sys/module/ib_ipoib/parameters/debug_level
>> echo 1 > /sys/module/ib_ipoib/parameters/macast_debug_level
>>
>> Please send the output of dmesg after the failure.
>>
>> Thanks.
>>
>
>