[ofa-general] IPoIB arps disappearing

Michael Di Domenico mdidomenico4 at gmail.com
Thu Jul 10 06:49:54 PDT 2008


Tests using connected-mode IPoIB.
dmesg from the compute node:

cfd-cnsl-0001:~ # dmesg
ib0: Start path record lookup for fe80:0000:0000:0000:00e0:8111:0100:0091 MTU > 0
ib0: PathRec LID 0x0518 for GID fe80:0000:0000:0000:00e0:8111:0100:0091
ib0: Created ah ffff810216a8c740
ib0: created address handle ffff81041c6533c0 for LID 0x0518, SL 0
ib0: Request connection 0x4a for gid fe80:0000:0000:0000:00e0:8111:0100:0091 qpn 0x404
ib0: REP received.
ib0: REQ arrived
ib0: Send unicast ARP to 0518
ib0: Send unicast ARP to 0518
ib0: Send unicast ARP to 0518
ib0: Send unicast ARP to 0518
ib0: Send unicast ARP to 0518
ib0: Start path record lookup for fe80:0000:0000:0000:00e0:8111:0100:007d MTU > 0
ib0: PathRec LID 0x0161 for GID fe80:0000:0000:0000:00e0:8111:0100:007d
ib0: Created ah ffff810216a8c580
ib0: created address handle ffff81041e564d40 for LID 0x0161, SL 0
ib0: Send unicast ARP to 0161
ib0: REQ arrived
ib0: Send unicast ARP to 0161
ib0: REQ arrived
ib0: Send unicast ARP to 0161
ib0: Send unicast ARP to 0518
ib0: REQ arrived
ib0: Send unicast ARP to 0518
ib0: Send unicast ARP to 0518
ib0: Send unicast ARP to 0161
ib0: REQ arrived
ib0: DREQ received.
ib0: CM error 9.
ib0: Destroy active connection 0x4a head 0x19680 tail 0x19680


dmesg from the IO node:

[root@cfd-io-0001 ~]# dmesg
ib0: Start path record lookup for fe80:0000:0000:0000:0002:c903:0000:c36d MTU > 0
ib0: PathRec LID 0x0384 for GID fe80:0000:0000:0000:0002:c903:0000:c36d
ib0: Created ah ffff810126e93500
ib0: created address handle ffff81012b98cc80 for LID 0x0384, SL 0
ib0: Send unicast ARP to 0384
ib0: REQ arrived
ib0: Request connection 0x10406 for gid fe80:0000:0000:0000:0002:c903:0000:c36d qpn 0x48
ib0: REP received.
ib0: Send unicast ARP to 0384
ib0: Send unicast ARP to 0384
ib0: Send unicast ARP to 0384
ib0: Send unicast ARP to 0384
ib0: Send unicast ARP to 0384
ib0: Send unicast ARP to 045e
ib0: REQ arrived
ib0: neigh_destructor for bonding device: ib0
ib0: neigh_cleanup for 000048 fe80:0000:0000:0000:0002:c903:0000:c36d
ib0: Reap connection for gid fe80:0000:0000:0000:0002:c903:0000:c36d
ib0: Destroy active connection 0x10406 head 0x12455 tail 0x12455
ib0: Request connection 0x30406 for gid fe80:0000:0000:0000:0002:c903:0000:cad1 qpn 0x48
ib0: REP received.
ib0: Request connection 0x30407 for gid fe80:0000:0000:0000:0002:c903:0000:c36d qpn 0x48
ib0: REP received.
ib0: neigh_destructor for bonding device: ib0
ib0: neigh_cleanup for 000048 fe80:0000:0000:0000:0002:c903:0000:cad1
ib0: Reap connection for gid fe80:0000:0000:0000:0002:c903:0000:cad1
ib0: Destroy active connection 0x30406 head 0x8 tail 0x8
ib0: Request connection 0x40406 for gid fe80:0000:0000:0000:0002:c903:0000:cad1 qpn 0x48
ib0: REP received.
ib0: Send unicast ARP to 0384
ib0: Send unicast ARP to 0384
ib0: neigh_destructor for bonding device: ib0
ib0: neigh_cleanup for 000048 fe80:0000:0000:0000:0002:c903:0000:cad1
ib0: Reap connection for gid fe80:0000:0000:0000:0002:c903:0000:cad1
ib0: Destroy active connection 0x40406 head 0x8 tail 0x8
ib0: neigh_destructor for bonding device: ib0
ib0: neigh_cleanup for 000048 fe80:0000:0000:0000:0002:c903:0000:c36d
ib0: Reap connection for gid fe80:0000:0000:0000:0002:c903:0000:c36d
ib0: Destroy active connection 0x30407 head 0x3 tail 0x3
ib0: Send unicast ARP to 045e
ib0: REQ arrived
ib0: Request connection 0x80405 for gid fe80:0000:0000:0000:0002:c903:0000:cad1 qpn 0x48
ib0: REQ arrived
ib0: REP received.
ib0: Send unicast ARP to 045e
ib0: REQ arrived
ib0: Send unicast ARP to 045e
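
When comparing traces like the two above, a per-LID tally can be easier to read than the raw log. Below is a minimal Python sketch, assuming the dmesg output has been saved as text. The function name `tally_ipoib_events` and the choice of which messages to count are illustrative assumptions, not part of any IPoIB tool; the patterns are taken only from the message formats visible in this thread.

```python
import re
from collections import Counter

# Patterns modeled on the ipoib debug lines shown in this thread.
ARP_RE = re.compile(r"ib0: Send unicast ARP to ([0-9a-f]{4})")
CLEANUP_RE = re.compile(r"ib0: neigh_cleanup for [0-9a-f]{6} ([0-9a-f:]+)")


def tally_ipoib_events(log_text):
    """Count ARP sends per LID, neigh_cleanup events per GID,
    and the number of 'post_send failed' lines."""
    arps = Counter()
    cleanups = Counter()
    post_send_failed = 0
    for line in log_text.splitlines():
        # search() rather than match() so e-mail quote prefixes are tolerated
        m = ARP_RE.search(line)
        if m:
            arps[m.group(1)] += 1
            continue
        m = CLEANUP_RE.search(line)
        if m:
            cleanups[m.group(1)] += 1
            continue
        if "post_send failed" in line:
            post_send_failed += 1
    return arps, cleanups, post_send_failed


# Short excerpt in the same format as the IO-node log.
sample = """\
ib0: Send unicast ARP to 0384
ib0: Send unicast ARP to 045e
ib0: REQ arrived
ib0: neigh_destructor for bonding device: ib0
ib0: neigh_cleanup for 000048 fe80:0000:0000:0000:0002:c903:0000:c36d
ib0: Send unicast ARP to 0384
"""

arps, cleanups, failed = tally_ipoib_events(sample)
print("ARPs per LID:", dict(arps))
print("neigh_cleanup per GID:", dict(cleanups))
print("post_send failures:", failed)
```

Running the same function over each node's full log shows at a glance which LIDs keep receiving unicast ARPs and which GIDs are repeatedly cleaned up.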


On Thu, Jul 10, 2008 at 9:21 AM, Michael Di Domenico <mdidomenico4 at gmail.com>
wrote:

> tests using datagram IPoIB (non-connected mode)
> dmesg from the compute node
> cfd-cnsl-0001:~ # dmesg
> ib0: Send unicast ARP to 0518
> ib0: Send unicast ARP to 0518
> ib0: Send unicast ARP to 0518
> ib0: Send unicast ARP to 0518
> ib0: Start path record lookup for fe80:0000:0000:0000:00e0:8111:0100:007d MTU > 1024
> ib0: PathRec LID 0x0161 for GID fe80:0000:0000:0000:00e0:8111:0100:007d
> ib0: Created ah ffff81042063dc80
> ib0: created address handle ffff8102206144c0 for LID 0x0161, SL 0
> ib0: Send unicast ARP to 0161
> ib0: Send unicast ARP to 0161
> ib0: Send unicast ARP to 0161
> ib0: Send unicast ARP to 0518
> ib0: Send unicast ARP to 0518
> ib0: Send unicast ARP to 0518
> ib0: Send unicast ARP to 0161
> ib0: neigh_destructor for bonding device: ib0
> ib0: neigh_cleanup for 000404 fe80:0000:0000:0000:00e0:8111:0100:0091
>
>
> dmesg from the IO node
>
> [root@cfd-io-0001 ~]# dmesg
> ib0: Send unicast ARP to 0384
> ib0: Send unicast ARP to 0384
> ib_mthca 0000:07:00.0: too many gathers
> ib0: post_send failed
> ib_mthca 0000:07:00.0: opcode invalid
> ib0: post_send failed
> ib_mthca 0000:07:00.0: opcode invalid
> ib0: post_send failed
> ib_mthca 0000:07:00.0: opcode invalid
> ib0: post_send failed
> ib_mthca 0000:07:00.0: opcode invalid
> ib0: post_send failed
> ib_mthca 0000:07:00.0: opcode invalid
> ib0: post_send failed
> ib_mthca 0000:07:00.0: opcode invalid
> ib0: post_send failed
> ib_mthca 0000:07:00.0: opcode invalid
> ib0: post_send failed
> ib0: Send unicast ARP to 0384
> ib_mthca 0000:07:00.0: opcode invalid
> ib0: post_send failed
> ib0: Send unicast ARP to 045e
> ib0: REQ arrived
> ib0: Send unicast ARP to 0384
> ib0: Send unicast ARP to 045e
> ib_mthca 0000:07:00.0: opcode invalid
> ib0: post_send failed
> ib0: Send unicast ARP to 0384
> ib0: neigh_destructor for bonding device: ib0
> ib0: neigh_cleanup for 000048 fe80:0000:0000:0000:0002:c903:0000:cad1
> ib0: neigh_destructor for bonding device: ib0
> ib0: neigh_cleanup for 000048 fe80:0000:0000:0000:0002:c903:0000:c36d
> ib0: neigh_destructor for bonding device: ib0
> ib0: neigh_cleanup for 000048 fe80:0000:0000:0000:0002:c903:0000:cad1
> ib0: REQ arrived
> ib0: neigh_destructor for bonding device: ib0
> ib0: neigh_cleanup for 000048 fe80:0000:0000:0000:0002:c903:0000:cad1
> ib_mthca 0000:07:00.0: opcode invalid
> ib0: post_send failed
> ib0: neigh_destructor for bonding device: ib0
> ib0: neigh_cleanup for 000048 fe80:0000:0000:0000:0002:c903:0000:c36d
> ib_mthca 0000:07:00.0: opcode invalid
> ib0: post_send failed
> ib_mthca 0000:07:00.0: opcode invalid
> ib0: post_send failed
> ib0: Send unicast ARP to 0384
> ib0: neigh_destructor for bonding device: ib0
> ib0: neigh_cleanup for 000048 fe80:0000:0000:0000:0002:c903:0000:c36d
> ib0: Send unicast ARP to 045e
> ib0: neigh_destructor for bonding device: ib0
> ib0: neigh_cleanup for 000048 fe80:0000:0000:0000:0002:c903:0000:cad1
>
>
> On Thu, Jul 10, 2008 at 8:33 AM, Eli Cohen <eli at dev.mellanox.co.il> wrote:
>
>> On Thu, Jul 10, 2008 at 07:57:30AM -0400, Michael Di Domenico wrote:
>> > Maybe I spoke too soon; more output came after I thought it was done.
>> > ib0: mtu > 2044 will cause multicast packet drops.
>> > eth5: no IPv6 routers present
>> > ib0: Send unicast ARP to 0384
>> > ib0: REQ arrived
>> > ib0: Request connection 0x2c0406 for gid fe80:0000:0000:0000:0002:c903:0000:c36d qpn 0x48
>> > ib0: REP received.
>> > ib0: Send unicast ARP to 0384
>> > ib0: Send unicast ARP to 045e
>> > ib0: REQ arrived
>> > ib0: Send unicast ARP to 0384
>> > ib0: Send unicast ARP to 0384
>> > ib0: Send unicast ARP to 0384
>> > ib0: Send unicast ARP to 045e
>> > ib0: REQ arrived
>> > ib0: neigh_destructor for bonding device: ib0
>> > ib0: neigh_cleanup for 000048 fe80:0000:0000:0000:0002:c903:0000:c36d
>> > ib0: Reap connection for gid fe80:0000:0000:0000:0002:c903:0000:c36d
>> > ib0: Destroy active connection 0x2c0406 head 0x22644 tail 0x22644
>> > ib0: Request connection 0x2f0406 for gid fe80:0000:0000:0000:0002:c903:0000:cad1 qpn 0x48
>> > ib0: REP received.
>> > ib0: neigh_destructor for bonding device: ib0
>> > ib0: neigh_cleanup for 000048 fe80:0000:0000:0000:0002:c903:0000:cad1
>> > ib0: Reap connection for gid fe80:0000:0000:0000:0002:c903:0000:cad1
>> > ib0: Destroy active connection 0x2f0406 head 0x4 tail 0x4
>>
>>
>> I see you're working in connected mode. Can you please do the
>> following:
>>
>> 1. clear dmesg: dmesg -c
>> 2. run again, then send all the output of dmesg
>>
>> Do this for both connected and datagram modes.
>>
>> Thanks.
>>
>>
>
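
The capture procedure requested above (clear dmesg, rerun, collect the output, once per mode) can be written down as a command list. Below is a minimal Python sketch that only builds and prints the commands rather than running them. Several things in it are assumptions, not taken from the thread: the helper name `capture_steps`, the output file names, the placeholder test command and address, and the use of `/sys/class/net/<iface>/mode` to switch between connected and datagram mode (the usual IPoIB sysfs knob, though whether it can be changed while the interface is up may depend on the kernel version). The commands would need to run as root.

```python
def capture_steps(iface, test_cmd, modes=("connected", "datagram")):
    """Return the shell commands for one clean dmesg capture per IPoIB mode."""
    steps = []
    for mode in modes:
        # Assumed way to select the IPoIB mode for this interface.
        steps.append(f"echo {mode} > /sys/class/net/{iface}/mode")
        # Step 1: clear the kernel ring buffer.
        steps.append("dmesg -c > /dev/null")
        # Step 2: run the failing test again (placeholder command).
        steps.append(test_cmd)
        # Save everything logged since the clear (hypothetical file name).
        steps.append(f"dmesg > dmesg-{iface}-{mode}.txt")
    return steps


# 192.0.2.10 is a documentation address standing in for the real peer.
for cmd in capture_steps("ib0", "ping -c 5 192.0.2.10"):
    print(cmd)
```

Keeping one output file per mode makes it easy to post the connected-mode and datagram-mode traces separately.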