[ofa-general] RE: bug 400: ipoib error messages

Scott Weitzenkamp (sweitzen) sweitzen at cisco.com
Wed Mar 14 00:26:04 PDT 2007


Sure, let me give you more detail.

I'm looping a script that does this:

  shutdown ib0 port of host #1 (via switch CLI)
  sleep 10
  bring up ib0 port of host #1 
  sleep 10
  shutdown ib1 port of host #1
  sleep 10
  bring up ib1 port of host #1 
  sleep 10
  shutdown ib0 port of host #2
  sleep 10
  bring up ib0 port of host #2
  sleep 10
  shutdown ib1 port of host #2
  sleep 10
  bring up ib1 port of host #2
  sleep 10
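
In shell terms, one pass of the loop looks roughly like this (sketch only:
the "port disable"/"port enable" commands, the SWITCH address, and the port
names are placeholders for whatever the actual switch CLI uses):

  #!/bin/sh
  # One iteration of the failover loop.  SWITCH and the port names are
  # placeholders; substitute the real switch address and CLI syntax.
  for port in host1-ib0 host1-ib1 host2-ib0 host2-ib1; do
      ssh admin@$SWITCH "port disable $port"   # take the port down
      sleep 10
      ssh admin@$SWITCH "port enable $port"    # bring it back up
      sleep 10
  done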

While this port failover script is running, I'm running netperf over
IPoIB between the 2 hosts.

Because of bug 455 (https://bugs.openfabrics.org/show_bug.cgi?id=455),
dmesg gets output every time an IPoIB HA failover happens, which gives a
rough sense of the failover rate.

Right now I am having a hard time getting failures to happen; I'll keep
trying.

Here's an example of several minutes of dmesg output:

ib0: enabling connected mode will cause multicast packet drops
ib1: enabling connected mode will cause multicast packet drops
ib1: dev_queue_xmit failed to requeue packet
ib1: dev_queue_xmit failed to requeue packet
ib1: enabling connected mode will cause multicast packet drops
ib0: enabling connected mode will cause multicast packet drops
ib0: enabling connected mode will cause multicast packet drops
ib1: enabling connected mode will cause multicast packet drops
ib1: dev_queue_xmit failed to requeue packet
ib1: dev_queue_xmit failed to requeue packet
ib1: enabling connected mode will cause multicast packet drops
ib0: enabling connected mode will cause multicast packet drops
ib0: dev_queue_xmit failed to requeue packet
ib0: dev_queue_xmit failed to requeue packet
ib0: enabling connected mode will cause multicast packet drops
ib1: enabling connected mode will cause multicast packet drops
ib1: enabling connected mode will cause multicast packet drops
ib0: enabling connected mode will cause multicast packet drops
ib0: enabling connected mode will cause multicast packet drops
ib1: enabling connected mode will cause multicast packet drops
ib1: enabling connected mode will cause multicast packet drops
ib0: enabling connected mode will cause multicast packet drops
ib0: dev_queue_xmit failed to requeue packet
ib0: dev_queue_xmit failed to requeue packet
ib0: dev_queue_xmit failed to requeue packet
ib0: enabling connected mode will cause multicast packet drops
ib1: enabling connected mode will cause multicast packet drops
ib1: enabling connected mode will cause multicast packet drops
ib0: enabling connected mode will cause multicast packet drops
ib0: enabling connected mode will cause multicast packet drops
ib1: enabling connected mode will cause multicast packet drops
ib1: enabling connected mode will cause multicast packet drops
ib0: enabling connected mode will cause multicast packet drops
ib0: enabling connected mode will cause multicast packet drops
ib1: enabling connected mode will cause multicast packet drops
ib1: enabling connected mode will cause multicast packet drops
ib0: enabling connected mode will cause multicast packet drops
ib0: enabling connected mode will cause multicast packet drops
ib1: enabling connected mode will cause multicast packet drops
ib1: enabling connected mode will cause multicast packet drops
ib0: enabling connected mode will cause multicast packet drops
ib0: enabling connected mode will cause multicast packet drops
ib1: enabling connected mode will cause multicast packet drops
ib1: enabling connected mode will cause multicast packet drops

The IPoIB failover is sometimes very slow, as the netperf -D output below
shows.  Failover should ideally take only a second or two; I'll file a
separate bug for that.
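
For reference, the netperf invocation was along these lines (remote host
and duration are illustrative; -D is what produces the per-interval
"Interim result" lines below):

  # -H: remote host, -l 3600: run the stream test for an hour,
  # -D 1: print interim throughput roughly once per second
  netperf -H 192.168.1.2 -l 3600 -D 1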

Interim result: 4355.09 10^6bits/s over 1.00 seconds
Interim result: 4371.07 10^6bits/s over 1.00 seconds
Interim result: 4370.95 10^6bits/s over 1.00 seconds
Interim result:  162.41 10^6bits/s over 26.91 seconds
Interim result: 4360.14 10^6bits/s over 1.00 seconds
Interim result: 4354.94 10^6bits/s over 1.00 seconds
Interim result: 4353.08 10^6bits/s over 1.00 seconds
Interim result: 4343.94 10^6bits/s over 1.00 seconds
Interim result: 4356.98 10^6bits/s over 1.00 seconds
Interim result: 4357.00 10^6bits/s over 1.00 seconds
Interim result: 1735.68 10^6bits/s over 2.51 seconds
Interim result: 4357.86 10^6bits/s over 1.00 seconds
Interim result: 4358.63 10^6bits/s over 1.00 seconds
Interim result: 4352.05 10^6bits/s over 1.00 seconds
Interim result: 4355.14 10^6bits/s over 1.00 seconds
Interim result: 4350.74 10^6bits/s over 1.00 seconds
Interim result: 4363.25 10^6bits/s over 1.00 seconds
Interim result:   41.46 10^6bits/s over 105.24 seconds
Interim result:  297.83 10^6bits/s over 14.50 seconds
Interim result: 4332.43 10^6bits/s over 1.00 seconds
Interim result: 4345.48 10^6bits/s over 1.00 seconds
Interim result: 4365.19 10^6bits/s over 1.00 seconds
Interim result: 4354.96 10^6bits/s over 1.00 seconds
Interim result: 4346.54 10^6bits/s over 1.00 seconds
Interim result: 4339.78 10^6bits/s over 1.00 seconds
Interim result: 1730.77 10^6bits/s over 2.51 seconds
Interim result: 4346.55 10^6bits/s over 1.00 seconds
Interim result: 4358.37 10^6bits/s over 1.00 seconds
Interim result: 4357.15 10^6bits/s over 1.00 seconds
Interim result: 4362.43 10^6bits/s over 1.00 seconds
Interim result: 4342.37 10^6bits/s over 1.00 seconds
Interim result: 4339.25 10^6bits/s over 1.00 seconds
Interim result: 4337.89 10^6bits/s over 1.00 seconds
Interim result: 4328.02 10^6bits/s over 1.00 seconds
Interim result: 4352.09 10^6bits/s over 1.00 seconds
Interim result: 4344.81 10^6bits/s over 1.00 seconds
Interim result: 4354.92 10^6bits/s over 1.00 seconds
Interim result: 4354.71 10^6bits/s over 1.00 seconds
Interim result: 1732.11 10^6bits/s over 2.51 seconds
Interim result: 4334.02 10^6bits/s over 1.00 seconds
Interim result: 4340.94 10^6bits/s over 1.00 seconds

Scott 

> -----Original Message-----
> From: Sean Hefty [mailto:sean.hefty at intel.com] 
> Sent: Tuesday, March 13, 2007 5:06 PM
> To: Scott Weitzenkamp (sweitzen); general at lists.openfabrics.org
> Subject: bug 400: ipoib error messages
> 
> {snippet from bug 400 report because I don't want to try to 
> have a discussion on
> this inside a bug report...}
> 
> IPoIB CM HA is working much better in OFED-1.2-20070311-0600. 
>  I have been
> running for a few hours flipping an IB port every 10 seconds.
> 
> I do still see some junk in dmesg, let me know if I should 
> open a new bug or
> reopen this bug.
> 
> ib1: dev_queue_xmit failed to requeue packet
> ib_mthca 0000:04:00.0: QP 000404 not found in MGM
> ib0: ib_detach_mcast failed (result = -22)
> ib0: ipoib_mcast_detach failed (result = -22)
> ib1: dev_queue_xmit failed to requeue packet
> ib1: dev_queue_xmit failed to requeue packet
> ib1: dev_queue_xmit failed to requeue packet
> ib0: dev_queue_xmit failed to requeue packet
> ib0: dev_queue_xmit failed to requeue packet
> ib0: dev_queue_xmit failed to requeue packet
> ib0: dev_queue_xmit failed to requeue packet
> ib0: dev_queue_xmit failed to requeue packet
> ib0: dev_queue_xmit failed to requeue packet
> 
> Scott, is this the start of the message log, or just a 
> snapshot?  Specifically,
> do you see ib_detach_mcast failures for ib1?  Is the 
> dev_queue_xmit the first
> error?
> 
> - Sean
> 
