[ofa-general] RE: bug 400: ipoib error messages
Scott Weitzenkamp (sweitzen)
sweitzen at cisco.com
Wed Mar 14 00:26:04 PDT 2007
Sure, let me give you more detail.
I'm looping a script that does this:
shutdown ib0 port of host #1 (via switch CLI)
sleep 10
bring up ib0 port of host #1
sleep 10
shutdown ib1 port of host #1
sleep 10
bring up ib1 port of host #1
sleep 10
shutdown ib0 port of host #2
sleep 10
bring up ib0 port of host #2
sleep 10
shutdown ib1 port of host #2
sleep 10
bring up ib1 port of host #2
sleep 10
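In shell terms, the loop is roughly the following sketch; switch_port
here is a hypothetical stand-in for the switch CLI command that
disables/enables a host's port, since the real command is
switch-specific:

    while true; do
        for host in 1 2; do
            for iface in ib0 ib1; do
                # switch_port is a hypothetical wrapper around the
                # switch CLI; the real command varies by vendor
                switch_port disable "$host" "$iface"
                sleep 10
                switch_port enable "$host" "$iface"
                sleep 10
            done
        done
    done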
While this port failover script is running, I'm running netperf over
IPoIB between the 2 hosts.
Because of bug 455 (https://bugs.openfabrics.org/show_bug.cgi?id=455),
there is output in dmesg every time there is an IPoIB HA failover, so
that gives a rough sense of the failover rate.
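For example, assuming each failover logs at least one of the bug 455
messages, something like this gives a rough event count:

    # count bug 455 messages as a proxy for failover events
    dmesg | grep -c 'enabling connected mode'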
Right now I am having a hard time getting failures to happen; I'll keep
trying.
Here's an example of several minutes of dmesg output:
ib0: enabling connected mode will cause multicast packet drops
ib1: enabling connected mode will cause multicast packet drops
ib1: dev_queue_xmit failed to requeue packet
ib1: dev_queue_xmit failed to requeue packet
ib1: enabling connected mode will cause multicast packet drops
ib0: enabling connected mode will cause multicast packet drops
ib0: enabling connected mode will cause multicast packet drops
ib1: enabling connected mode will cause multicast packet drops
ib1: dev_queue_xmit failed to requeue packet
ib1: dev_queue_xmit failed to requeue packet
ib1: enabling connected mode will cause multicast packet drops
ib0: enabling connected mode will cause multicast packet drops
ib0: dev_queue_xmit failed to requeue packet
ib0: dev_queue_xmit failed to requeue packet
ib0: enabling connected mode will cause multicast packet drops
ib1: enabling connected mode will cause multicast packet drops
ib1: enabling connected mode will cause multicast packet drops
ib0: enabling connected mode will cause multicast packet drops
ib0: enabling connected mode will cause multicast packet drops
ib1: enabling connected mode will cause multicast packet drops
ib1: enabling connected mode will cause multicast packet drops
ib0: enabling connected mode will cause multicast packet drops
ib0: dev_queue_xmit failed to requeue packet
ib0: dev_queue_xmit failed to requeue packet
ib0: dev_queue_xmit failed to requeue packet
ib0: enabling connected mode will cause multicast packet drops
ib1: enabling connected mode will cause multicast packet drops
ib1: enabling connected mode will cause multicast packet drops
ib0: enabling connected mode will cause multicast packet drops
ib0: enabling connected mode will cause multicast packet drops
ib1: enabling connected mode will cause multicast packet drops
ib1: enabling connected mode will cause multicast packet drops
ib0: enabling connected mode will cause multicast packet drops
ib0: enabling connected mode will cause multicast packet drops
ib1: enabling connected mode will cause multicast packet drops
ib1: enabling connected mode will cause multicast packet drops
ib0: enabling connected mode will cause multicast packet drops
ib0: enabling connected mode will cause multicast packet drops
ib1: enabling connected mode will cause multicast packet drops
ib1: enabling connected mode will cause multicast packet drops
ib0: enabling connected mode will cause multicast packet drops
ib0: enabling connected mode will cause multicast packet drops
ib1: enabling connected mode will cause multicast packet drops
ib1: enabling connected mode will cause multicast packet drops
The IPoIB failover is sometimes very slow, as the netperf -D output
below shows. Failover should ideally take only a second or two; I'll
be filing a bug for that.
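The output below comes from a netperf invocation along these lines
(the peer address and test length here are illustrative):

    # -D 1 prints interim results every second (demo mode);
    # the peer address and 3600 s length are illustrative
    netperf -H 192.168.0.2 -l 3600 -D 1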
Interim result: 4355.09 10^6bits/s over 1.00 seconds
Interim result: 4371.07 10^6bits/s over 1.00 seconds
Interim result: 4370.95 10^6bits/s over 1.00 seconds
Interim result: 162.41 10^6bits/s over 26.91 seconds
Interim result: 4360.14 10^6bits/s over 1.00 seconds
Interim result: 4354.94 10^6bits/s over 1.00 seconds
Interim result: 4353.08 10^6bits/s over 1.00 seconds
Interim result: 4343.94 10^6bits/s over 1.00 seconds
Interim result: 4356.98 10^6bits/s over 1.00 seconds
Interim result: 4357.00 10^6bits/s over 1.00 seconds
Interim result: 1735.68 10^6bits/s over 2.51 seconds
Interim result: 4357.86 10^6bits/s over 1.00 seconds
Interim result: 4358.63 10^6bits/s over 1.00 seconds
Interim result: 4352.05 10^6bits/s over 1.00 seconds
Interim result: 4355.14 10^6bits/s over 1.00 seconds
Interim result: 4350.74 10^6bits/s over 1.00 seconds
Interim result: 4363.25 10^6bits/s over 1.00 seconds
Interim result: 41.46 10^6bits/s over 105.24 seconds
Interim result: 297.83 10^6bits/s over 14.50 seconds
Interim result: 4332.43 10^6bits/s over 1.00 seconds
Interim result: 4345.48 10^6bits/s over 1.00 seconds
Interim result: 4365.19 10^6bits/s over 1.00 seconds
Interim result: 4354.96 10^6bits/s over 1.00 seconds
Interim result: 4346.54 10^6bits/s over 1.00 seconds
Interim result: 4339.78 10^6bits/s over 1.00 seconds
Interim result: 1730.77 10^6bits/s over 2.51 seconds
Interim result: 4346.55 10^6bits/s over 1.00 seconds
Interim result: 4358.37 10^6bits/s over 1.00 seconds
Interim result: 4357.15 10^6bits/s over 1.00 seconds
Interim result: 4362.43 10^6bits/s over 1.00 seconds
Interim result: 4342.37 10^6bits/s over 1.00 seconds
Interim result: 4339.25 10^6bits/s over 1.00 seconds
Interim result: 4337.89 10^6bits/s over 1.00 seconds
Interim result: 4328.02 10^6bits/s over 1.00 seconds
Interim result: 4352.09 10^6bits/s over 1.00 seconds
Interim result: 4344.81 10^6bits/s over 1.00 seconds
Interim result: 4354.92 10^6bits/s over 1.00 seconds
Interim result: 4354.71 10^6bits/s over 1.00 seconds
Interim result: 1732.11 10^6bits/s over 2.51 seconds
Interim result: 4334.02 10^6bits/s over 1.00 seconds
Interim result: 4340.94 10^6bits/s over 1.00 seconds
Scott
> -----Original Message-----
> From: Sean Hefty [mailto:sean.hefty at intel.com]
> Sent: Tuesday, March 13, 2007 5:06 PM
> To: Scott Weitzenkamp (sweitzen); general at lists.openfabrics.org
> Subject: bug 400: ipoib error messages
>
> {snippet from bug 400 report because I don't want to try to have a
> discussion on this inside a bug report...}
>
> IPoIB CM HA is working much better in OFED-1.2-20070311-0600. I have
> been running for a few hours flipping an IB port every 10 seconds.
>
> I do still see some junk in dmesg; let me know if I should open a new
> bug or reopen this bug.
>
> ib1: dev_queue_xmit failed to requeue packet
> ib_mthca 0000:04:00.0: QP 000404 not found in MGM
> ib0: ib_detach_mcast failed (result = -22)
> ib0: ipoib_mcast_detach failed (result = -22)
> ib1: dev_queue_xmit failed to requeue packet
> ib1: dev_queue_xmit failed to requeue packet
> ib1: dev_queue_xmit failed to requeue packet
> ib0: dev_queue_xmit failed to requeue packet
> ib0: dev_queue_xmit failed to requeue packet
> ib0: dev_queue_xmit failed to requeue packet
> ib0: dev_queue_xmit failed to requeue packet
> ib0: dev_queue_xmit failed to requeue packet
> ib0: dev_queue_xmit failed to requeue packet
>
> Scott, is this the start of the message log, or just a snapshot?
> Specifically, do you see ib_detach_mcast failures for ib1? Is the
> dev_queue_xmit the first error?
>
> - Sean
>