[openib-general] getting LOC_QP_OP_ERR with IPoIB

Or Gerlitz ogerlitz at voltaire.com
Mon Sep 4 23:43:15 PDT 2006


Hi,

While working on getting the Linux bonding driver to run on top of IPoIB,
I have run into LOC_QP_OP_ERR with vendor error 62 (Mellanox PCI-X HCA).

	ib0: failed send event (status=2, wrid=52 vend_err 62)

What does this vendor error mean? It's the same system on which I saw the QP modify error.
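For reference, the status=2 in that print is the numeric work completion status, and 2 is IB_WC_LOC_QP_OP_ERR in the kernel's enum ib_wc_status. A minimal sketch for decoding such a line follows. The regular expression is written against the exact print shown above, the helper name decode_send_error is hypothetical, and only the first few enum entries are listed. The vendor error is kept as an opaque string, since its meaning is HCA specific and is the open question here.

```python
import re

# First entries of enum ib_wc_status (include/rdma/ib_verbs.h), in
# declaration order, so the list index equals the numeric status.
IB_WC_STATUS = [
    "IB_WC_SUCCESS",
    "IB_WC_LOC_LEN_ERR",
    "IB_WC_LOC_QP_OP_ERR",
    "IB_WC_LOC_EEC_OP_ERR",
    "IB_WC_LOC_PROT_ERR",
    "IB_WC_WR_FLUSH_ERR",
]

# Pattern for the IPoIB "failed send event" print as it appears in this log.
SEND_ERR = re.compile(
    r"(ib\d+): failed send event \(status=(\d+), wrid=(\d+) vend_err (\w+)\)"
)


def decode_send_error(line):
    """Return the fields of a 'failed send event' line, or None if no match."""
    m = SEND_ERR.search(line)
    if m is None:
        return None
    status = int(m.group(2))
    if status < len(IB_WC_STATUS):
        name = IB_WC_STATUS[status]
    else:
        name = "unknown(%d)" % status
    return {
        "dev": m.group(1),
        "status": name,
        "wrid": int(m.group(3)),
        # Left uninterpreted: the vendor error code is device specific.
        "vend_err": m.group(4),
    }


print(decode_send_error("ib0: failed send event (status=2, wrid=52 vend_err 62)"))
```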

There are some more problematic prints here whose meaning I would be happy
to get some idea about:

 ib1: dev_queue_xmit failed to requeue packet
 ib1: dev_queue_xmit failed to requeue packet

 ???

 ib1: timing out; will leak address handles
 ib1: ib_dealloc_pd failed

(The PD dealloc failure is a result of the AH leak), but what causes the leak?

Below is a more detailed snapshot from the time the problems occurred. I was
playing with this HCA's two IB links, taking one of them down for about 45 seconds
(by some instrumentation of the SM), then the other, and so on.

The IPoIB code is unchanged (other than adding the "ipoib_set_mcast_list called" print).

The bonding code was changed so that it does not set the slave MAC address but rather uses
the MAC address of the active slave, and also overrides the ether_setup() settings with those
of the active slave.

One thing I think I see is that IPoIB attempts to join the IPv4 broadcast group
even when the port's IB link is down. Am I correct? If so, would it be easy to fix this?
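One way to check this against the log below is to tally the "join completion" prints per interface. This is only a sketch: the helper name summarize_joins is hypothetical, and it assumes the status values are negated Linux errno codes, where 4 is EINTR and 110 is ETIMEDOUT. Repeated ETIMEDOUT completions on an interface whose link was taken down would be consistent with join attempts being made while the port is down.

```python
import re
from collections import Counter

# Linux errno values that appear in this log (assumption: the printed status
# is a negated errno, with 0 meaning success).
LINUX_ERRNO = {0: "OK", 4: "EINTR", 110: "ETIMEDOUT"}

# Matches e.g. "ib0: join completion for ff12:...:ffff (status -110)".
JOIN_DONE = re.compile(
    r"(ib\d+): join completion for ([0-9a-f:]+) \(status (-?\d+)\)"
)


def summarize_joins(lines):
    """Count join completions per (interface, decoded status)."""
    counts = Counter()
    for line in lines:
        m = JOIN_DONE.search(line)
        if m is None:
            continue
        dev = m.group(1)
        code = -int(m.group(3))
        counts[(dev, LINUX_ERRNO.get(code, "errno %d" % code))] += 1
    return counts


sample = [
    "ib0: join completion for ff12:401b:ffff:0000:0000:0000:ffff:ffff (status -4)",
    "ib0: join completion for ff12:401b:ffff:0000:0000:0000:ffff:ffff (status -110)",
    "ib0: join completion for ff12:401b:ffff:0000:0000:0000:ffff:ffff (status -110)",
    "ib1: join completion for ff12:401b:ffff:0000:0000:0000:0000:0001 (status 0)",
]
for key, n in sorted(summarize_joins(sample).items()):
    print(key, n)
```

The search is unanchored, so it works on log lines with or without a leading line number.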

Or.

ib0: leaving MGID ff12:401b:ffff:0000:0000:0000:ffff:ffff
ib0: deleting multicast group ff12:401b:ffff:0000:0000:0000:ffff:ffff
ib0: starting multicast thread
ib1: stopping multicast thread
ib1: waiting for MGID ff12:401b:ffff:0000:0000:0000:ffff:ffff
ib1: flushing multicast list
ib1: leaving MGID ff12:401b:ffff:0000:0000:0000:ffff:ffff
ib1: deleting multicast group ff12:401b:ffff:0000:0000:0000:ffff:ffff
ib1: starting multicast thread
ib0: joining MGID ff12:401b:ffff:0000:0000:0000:ffff:ffff
ib1: joining MGID ff12:401b:ffff:0000:0000:0000:ffff:ffff
ib1: join completion for ff12:401b:ffff:0000:0000:0000:ffff:ffff (status 0)
ib1: MGID ff12:401b:ffff:0000:0000:0000:ffff:ffff AV ffff810033c103c0, LID 0xc000, SL 0
ib1: successfully joined all multicast groups
bonding: bond0: link status definitely down for interface ib0, disabling it
bonding: bond0: making interface ib1 the new active one.
ib0: ipoib_set_mcast_list called
ib1: ipoib_set_mcast_list called
ib0: restarting multicast task
ib0: stopping multicast thread
ib0: waiting for MGID ff12:401b:ffff:0000:0000:0000:ffff:ffff
ib0: join completion for ff12:401b:ffff:0000:0000:0000:ffff:ffff (status -4)
ib0: starting multicast thread
ib1: restarting multicast task
ib1: stopping multicast thread
ib1: waiting for MGID ff12:401b:ffff:0000:0000:0000:ffff:ffff
ib1: adding multicast entry for mgid ff12:401b:ffff:0000:0000:0000:0000:0001
ib1: starting multicast thread
ib0: joining MGID ff12:401b:ffff:0000:0000:0000:ffff:ffff
ib1: joining MGID ff12:401b:ffff:0000:0000:0000:0000:0001
ib1: join completion for ff12:401b:ffff:0000:0000:0000:0000:0001 (status 0)
ib1: MGID ff12:401b:ffff:0000:0000:0000:0000:0001 AV ffff810037f91d00, LID 0xc001, SL 0
ib1: successfully joined all multicast groups
ib0: join completion for ff12:401b:ffff:0000:0000:0000:ffff:ffff (status -110)
ib0: multicast join failed for ff12:401b:ffff:0000:0000:0000:ffff:ffff, status -110
ib0: joining MGID ff12:401b:ffff:0000:0000:0000:ffff:ffff
ib0: join completion for ff12:401b:ffff:0000:0000:0000:ffff:ffff (status -110)
ib0: multicast join failed for ff12:401b:ffff:0000:0000:0000:ffff:ffff, status -110
ib0: joining MGID ff12:401b:ffff:0000:0000:0000:ffff:ffff
ib0: join completion for ff12:401b:ffff:0000:0000:0000:ffff:ffff (status -110)
ib0: multicast join failed for ff12:401b:ffff:0000:0000:0000:ffff:ffff, status -110
ib0: joining MGID ff12:401b:ffff:0000:0000:0000:ffff:ffff
ib0: stopping multicast thread
ib0: waiting for MGID ff12:401b:ffff:0000:0000:0000:ffff:ffff
ib0: join completion for ff12:401b:ffff:0000:0000:0000:ffff:ffff (status -4)
ib0: flushing multicast list
ib0: deleting multicast group ff12:401b:ffff:0000:0000:0000:ffff:ffff
ib0: starting multicast thread
ib1: stopping multicast thread
ib1: waiting for MGID ff12:401b:ffff:0000:0000:0000:0000:0001
ib1: flushing multicast list
ib1: leaving MGID ff12:401b:ffff:0000:0000:0000:0000:0001
ib1: deleting multicast group ff12:401b:ffff:0000:0000:0000:0000:0001
ib1: leaving MGID ff12:401b:ffff:0000:0000:0000:ffff:ffff
ib1: deleting multicast group ff12:401b:ffff:0000:0000:0000:ffff:ffff
ib1: starting multicast thread
ib0: stopping multicast thread
ib0: flushing multicast list
ib0: starting multicast thread
ib0: joining MGID ff12:401b:ffff:0000:0000:0000:ffff:ffff
ib1: joining MGID ff12:401b:ffff:0000:0000:0000:ffff:ffff
bonding: bond0: link status definitely down for interface ib1, disabling it
ib1: ipoib_set_mcast_list called
bonding: bond0: now running without any active interface !
ib1: restarting multicast task
ib1: stopping multicast thread
ib1: waiting for MGID ff12:401b:ffff:0000:0000:0000:ffff:ffff
ib1: join completion for ff12:401b:ffff:0000:0000:0000:ffff:ffff (status -4)
ib1: starting multicast thread
ib1: joining MGID ff12:401b:ffff:0000:0000:0000:ffff:ffff
ib0: join completion for ff12:401b:ffff:0000:0000:0000:ffff:ffff (status 0)
ib0: MGID ff12:401b:ffff:0000:0000:0000:ffff:ffff AV ffff810033c10d80, LID 0xc000, SL 0
ib0: successfully joined all multicast groups
ib1: join completion for ff12:401b:ffff:0000:0000:0000:ffff:ffff (status 0)
ib1: MGID ff12:401b:ffff:0000:0000:0000:ffff:ffff AV ffff81000b8453c0, LID 0xc000, SL 0
ib1: successfully joined all multicast groups
ib1: dev_queue_xmit failed to requeue packet
ib1: dev_queue_xmit failed to requeue packet
bonding: bond0: link status definitely up for interface ib0.
bonding: bond0: link status definitely up for interface ib1.
bonding: bond0: making interface ib0 the new active one.
ib0: ipoib_set_mcast_list called
bonding: bond0: first active interface up!
ib0: restarting multicast task
ib0: stopping multicast thread
ib0: waiting for MGID ff12:401b:ffff:0000:0000:0000:ffff:ffff
ib0: adding multicast entry for mgid ff12:401b:ffff:0000:0000:0000:0000:0001
ib0: starting multicast thread
ib0: joining MGID ff12:401b:ffff:0000:0000:0000:0000:0001
ib0: join completion for ff12:401b:ffff:0000:0000:0000:0000:0001 (status 0)
ib0: MGID ff12:401b:ffff:0000:0000:0000:0000:0001 AV ffff81000099c340, LID 0xc001, SL 0
ib0: successfully joined all multicast groups
ib0: failed send event (status=2, wrid=52 vend_err 62)
ib0: ipoib_set_mcast_list called
ib0: restarting multicast task
ib0: stopping multicast thread
ib0: waiting for MGID ff12:401b:ffff:0000:0000:0000:0000:0001
ib0: deleting multicast group ff12:401b:ffff:0000:0000:0000:0000:0001
ib0: leaving MGID ff12:401b:ffff:0000:0000:0000:0000:0001
ib0: deleting multicast group ff12:401b:ffff:0000:0000:0000:0000:0001
ib0: starting multicast thread
ib0: successfully joined all multicast groups
ib0: stopping multicast thread
ib0: flushing multicast list
ib0: leaving MGID ff12:401b:ffff:0000:0000:0000:ffff:ffff
ib0: deleting multicast group ff12:401b:ffff:0000:0000:0000:ffff:ffff
ib1: stopping multicast thread
ib1: waiting for MGID ff12:401b:ffff:0000:0000:0000:ffff:ffff
ib1: flushing multicast list
ib1: leaving MGID ff12:401b:ffff:0000:0000:0000:ffff:ffff
ib1: deleting multicast group ff12:401b:ffff:0000:0000:0000:ffff:ffff
ib1: timing out; will leak address handles
bonding: bond0: released all slaves
ib0: stopping multicast thread
ib0: flushing multicast list
ib1: stopping multicast thread
ib1: flushing multicast list
ib1: ib_dealloc_pd failed



