[ofa-general] IPoIB arps disappearing

Michael Di Domenico mdidomenico4 at gmail.com
Thu Jul 10 06:21:34 PDT 2008


These tests use IPoIB in datagram (non-connected) mode.
dmesg from the compute node
cfd-cnsl-0001:~ # dmesg
ib0: Send unicast ARP to 0518
ib0: Send unicast ARP to 0518
ib0: Send unicast ARP to 0518
ib0: Send unicast ARP to 0518
ib0: Start path record lookup for fe80:0000:0000:0000:00e0:8111:0100:007d MTU > 1024
ib0: PathRec LID 0x0161 for GID fe80:0000:0000:0000:00e0:8111:0100:007d
ib0: Created ah ffff81042063dc80
ib0: created address handle ffff8102206144c0 for LID 0x0161, SL 0
ib0: Send unicast ARP to 0161
ib0: Send unicast ARP to 0161
ib0: Send unicast ARP to 0161
ib0: Send unicast ARP to 0518
ib0: Send unicast ARP to 0518
ib0: Send unicast ARP to 0518
ib0: Send unicast ARP to 0161
ib0: neigh_destructor for bonding device: ib0
ib0: neigh_cleanup for 000404 fe80:0000:0000:0000:00e0:8111:0100:0091


dmesg from the IO node

[root@cfd-io-0001 ~]# dmesg
ib0: Send unicast ARP to 0384
ib0: Send unicast ARP to 0384
ib_mthca 0000:07:00.0: too many gathers
ib0: post_send failed
ib_mthca 0000:07:00.0: opcode invalid
ib0: post_send failed
ib_mthca 0000:07:00.0: opcode invalid
ib0: post_send failed
ib_mthca 0000:07:00.0: opcode invalid
ib0: post_send failed
ib_mthca 0000:07:00.0: opcode invalid
ib0: post_send failed
ib_mthca 0000:07:00.0: opcode invalid
ib0: post_send failed
ib_mthca 0000:07:00.0: opcode invalid
ib0: post_send failed
ib_mthca 0000:07:00.0: opcode invalid
ib0: post_send failed
ib0: Send unicast ARP to 0384
ib_mthca 0000:07:00.0: opcode invalid
ib0: post_send failed
ib0: Send unicast ARP to 045e
ib0: REQ arrived
ib0: Send unicast ARP to 0384
ib0: Send unicast ARP to 045e
ib_mthca 0000:07:00.0: opcode invalid
ib0: post_send failed
ib0: Send unicast ARP to 0384
ib0: neigh_destructor for bonding device: ib0
ib0: neigh_cleanup for 000048 fe80:0000:0000:0000:0002:c903:0000:cad1
ib0: neigh_destructor for bonding device: ib0
ib0: neigh_cleanup for 000048 fe80:0000:0000:0000:0002:c903:0000:c36d
ib0: neigh_destructor for bonding device: ib0
ib0: neigh_cleanup for 000048 fe80:0000:0000:0000:0002:c903:0000:cad1
ib0: REQ arrived
ib0: neigh_destructor for bonding device: ib0
ib0: neigh_cleanup for 000048 fe80:0000:0000:0000:0002:c903:0000:cad1
ib_mthca 0000:07:00.0: opcode invalid
ib0: post_send failed
ib0: neigh_destructor for bonding device: ib0
ib0: neigh_cleanup for 000048 fe80:0000:0000:0000:0002:c903:0000:c36d
ib_mthca 0000:07:00.0: opcode invalid
ib0: post_send failed
ib_mthca 0000:07:00.0: opcode invalid
ib0: post_send failed
ib0: Send unicast ARP to 0384
ib0: neigh_destructor for bonding device: ib0
ib0: neigh_cleanup for 000048 fe80:0000:0000:0000:0002:c903:0000:c36d
ib0: Send unicast ARP to 045e
ib0: neigh_destructor for bonding device: ib0
ib0: neigh_cleanup for 000048 fe80:0000:0000:0000:0002:c903:0000:cad1
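
To compare runs, the two logs above can be boiled down to a few counters. The sketch below is only an illustration, not something that was run for this report. It assumes the message shapes seen in these logs ("Send unicast ARP to <LID>", "neigh_cleanup for <field> <GID>", "post_send failed"), and the name summarize is made up for the example.

```python
import re
from collections import Counter


def summarize(log_text, iface="ib0"):
    """Count ARP sends per LID, neigh_cleanup events per GID, and post_send failures.

    The patterns are assumptions based on the log lines shown in this thread.
    """
    prefix = re.escape(iface)
    arp_re = re.compile(rf"^{prefix}: Send unicast ARP to ([0-9a-fA-F]+)$")
    # The first field after "for" looks like the QPN; only the GID is kept here.
    cleanup_re = re.compile(rf"^{prefix}: neigh_cleanup for \S+ (\S+)$")
    failure_line = f"{iface}: post_send failed"

    arps = Counter()
    cleanups = Counter()
    failures = 0
    for raw in log_text.splitlines():
        line = raw.strip()
        match = arp_re.match(line)
        if match:
            arps[match.group(1)] += 1
            continue
        match = cleanup_re.match(line)
        if match:
            cleanups[match.group(1)] += 1
            continue
        if line == failure_line:
            failures += 1
    return arps, cleanups, failures
```

Feeding it the saved dmesg text from each node gives the number of ARPs sent to each LID next to the number of times each neighbour was torn down, which makes it easier to see whether the cleanups line up with the post_send failures.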


On Thu, Jul 10, 2008 at 8:33 AM, Eli Cohen <eli at dev.mellanox.co.il> wrote:

> On Thu, Jul 10, 2008 at 07:57:30AM -0400, Michael Di Domenico wrote:
> > Maybe I spoke too soon; more output came after I thought it was done.
> > ib0: mtu > 2044 will cause multicast packet drops.
> > eth5: no IPv6 routers present
> > ib0: Send unicast ARP to 0384
> > ib0: REQ arrived
> > ib0: Request connection 0x2c0406 for gid fe80:0000:0000:0000:0002:c903:0000:c36d qpn 0x48
> > ib0: REP received.
> > ib0: Send unicast ARP to 0384
> > ib0: Send unicast ARP to 045e
> > ib0: REQ arrived
> > ib0: Send unicast ARP to 0384
> > ib0: Send unicast ARP to 0384
> > ib0: Send unicast ARP to 0384
> > ib0: Send unicast ARP to 045e
> > ib0: REQ arrived
> > ib0: neigh_destructor for bonding device: ib0
> > ib0: neigh_cleanup for 000048 fe80:0000:0000:0000:0002:c903:0000:c36d
> > ib0: Reap connection for gid fe80:0000:0000:0000:0002:c903:0000:c36d
> > ib0: Destroy active connection 0x2c0406 head 0x22644 tail 0x22644
> > ib0: Request connection 0x2f0406 for gid fe80:0000:0000:0000:0002:c903:0000:cad1 qpn 0x48
> > ib0: REP received.
> > ib0: neigh_destructor for bonding device: ib0
> > ib0: neigh_cleanup for 000048 fe80:0000:0000:0000:0002:c903:0000:cad1
> > ib0: Reap connection for gid fe80:0000:0000:0000:0002:c903:0000:cad1
> > ib0: Destroy active connection 0x2f0406 head 0x4 tail 0x4
>
>
> I see you're working in connected mode. Can you please do the
> following:
>
> 1. clear dmesg: dmesg -c
> 2. run again, then send all the output of dmesg
>
> Do this for both connected and datagram modes.
>
> Thanks.
>
>
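
The steps requested above (clear dmesg, rerun, save the output, once per mode) could be scripted roughly as follows. This is a minimal sketch under stated assumptions: it must run as root, it assumes the IPoIB mode is switched through /sys/class/net/<iface>/mode with the values "datagram" and "connected", and the helper names mode_path and capture, the run_test callback, and the output file naming are all hypothetical.

```python
import subprocess
from pathlib import Path


def mode_path(iface):
    """Return the sysfs file assumed to hold the IPoIB transport mode."""
    return Path("/sys/class/net") / iface / "mode"


def capture(iface, mode, run_test, out_dir="."):
    """Switch mode, clear the kernel log, run the test, and save dmesg output.

    run_test is a caller-supplied function that reproduces the problem.
    """
    # Select "datagram" or "connected" (assumption: sysfs write is accepted).
    mode_path(iface).write_text(mode + "\n")
    # Step 1: "dmesg -c" prints and then clears the ring buffer; discard the old text.
    subprocess.run(["dmesg", "-c"], check=True, stdout=subprocess.DEVNULL)
    # Step 2: rerun the failing test, then collect everything logged since the clear.
    run_test()
    log = subprocess.run(
        ["dmesg"], check=True, capture_output=True, text=True
    ).stdout
    out_file = Path(out_dir) / f"dmesg-{iface}-{mode}.txt"
    out_file.write_text(log)
    return out_file
```

Calling capture("ib0", "datagram", run_test) and then capture("ib0", "connected", run_test) on each node would leave one file per mode, which keeps the two sets of output from being mixed together.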