[openib-general] getting LOC_QP_OP_ERR with IPoIB
Or Gerlitz
ogerlitz at voltaire.com
Mon Sep 4 23:43:15 PDT 2006
Hi,
While working on getting the Linux bonding driver to work on top of IPoIB,
I have run into LOC_QP_OP_ERR with vendor error 62 (Mellanox PCI-X HCA):
ib0: failed send event (status=2, wrid=52 vend_err 62)
What does this vendor error mean? It is the same system on which I saw the QP modify error.
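As a reading aid for the print above, here is a minimal sketch that decodes the status field of such a line. It assumes the status is the numeric value of enum ib_wc_status, with the order taken from the kernel's include/rdma/ib_verbs.h (only the first few values are listed). The names IB_WC_STATUS, SEND_ERR and decode_send_error are made up for this illustration. The vendor error is kept as the raw printed token, since its meaning is vendor-specific and not decoded here.

```python
import re

# First few values of enum ib_wc_status, in the order assumed from
# include/rdma/ib_verbs.h (assumption: the order matches this kernel tree).
IB_WC_STATUS = [
    "IB_WC_SUCCESS",
    "IB_WC_LOC_LEN_ERR",
    "IB_WC_LOC_QP_OP_ERR",
    "IB_WC_LOC_EEC_OP_ERR",
    "IB_WC_LOC_PROT_ERR",
    "IB_WC_WR_FLUSH_ERR",
]

# Matches the IPoIB print format seen in this post.
SEND_ERR = re.compile(
    r"(?P<dev>ib\d+): failed send event "
    r"\(status=(?P<status>\d+), wrid=(?P<wrid>\d+) "
    r"vend_err (?P<vend>[0-9a-fA-F]+)\)"
)

def decode_send_error(line):
    """Return the parsed fields of a 'failed send event' line, or None."""
    m = SEND_ERR.search(line)
    if m is None:
        return None
    status = int(m.group("status"))
    # Values beyond the short table above are reported as "unknown".
    name = IB_WC_STATUS[status] if status < len(IB_WC_STATUS) else "unknown"
    return {
        "dev": m.group("dev"),
        "status": status,
        "status_name": name,
        "wrid": int(m.group("wrid")),
        "vend_err": m.group("vend"),  # raw token, vendor-specific
    }

if __name__ == "__main__":
    print(decode_send_error(
        "ib0: failed send event (status=2, wrid=52 vend_err 62)"))
```

Under that assumption, status=2 corresponds to IB_WC_LOC_QP_OP_ERR, which matches the subject of this mail.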
There are some more problematic prints here, and I would be happy
to get some idea of their meaning:
ib1: dev_queue_xmit failed to requeue packet
ib1: dev_queue_xmit failed to requeue packet
???
ib1: timing out; will leak address handles
ib1: ib_dealloc_pd failed
(the PD dealloc failure is a result of the AH leak), but what is the cause of the leak?
Below is a more detailed snapshot from the time the problems occurred. I was
playing with this HCA's two IB links, taking one of them down for about 45 seconds
(by some instrumentation of the SM), then the other, etc.
The IPoIB code is unchanged (other than adding the "ipoib_set_mcast_list called" print).
The bonding code was changed not to set the slave MAC address but rather to use the MAC address
of the active slave, and also to override the ether_setup() settings with those of the active slave.
One thing I think I see is that IPoIB attempts to join the IPv4 broadcast group
even when the port's IB link is down. Am I correct? If yes, would it be easy to fix this?
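One way to check this observation against a log like the one below is sketched here. It assumes that the bonding "link status definitely up/down" prints are a usable proxy for the port link state, which may lag the real IB port state. The function name joins_while_down is made up for this illustration.

```python
import re

# Bonding link-state prints and IPoIB join attempts, as they appear in the log.
LINK = re.compile(r"link status definitely (up|down) for interface (ib\d+)")
JOIN = re.compile(r"(ib\d+): joining MGID (\S+)")

def joins_while_down(lines):
    """Return (device, mgid) for each join attempted while bonding
    considers that device's link down."""
    down = set()   # devices currently reported down by bonding
    hits = []
    for line in lines:
        m = LINK.search(line)
        if m:
            state, dev = m.groups()
            if state == "down":
                down.add(dev)
            else:
                down.discard(dev)
            continue
        m = JOIN.search(line)
        if m and m.group(1) in down:
            hits.append((m.group(1), m.group(2)))
    return hits
```

Feeding the saved log to it, for example with joins_while_down(open("log.txt")), where log.txt is a hypothetical file name, lists the join attempts made after bonding declared the interface down.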
Or.
1 ib0: leaving MGID ff12:401b:ffff:0000:0000:0000:ffff:ffff
2 ib0: deleting multicast group ff12:401b:ffff:0000:0000:0000:ffff:ffff
3 ib0: starting multicast thread
4 ib1: stopping multicast thread
5 ib1: waiting for MGID ff12:401b:ffff:0000:0000:0000:ffff:ffff
6 ib1: flushing multicast list
7 ib1: leaving MGID ff12:401b:ffff:0000:0000:0000:ffff:ffff
8 ib1: deleting multicast group ff12:401b:ffff:0000:0000:0000:ffff:ffff
9 ib1: starting multicast thread
10 ib0: joining MGID ff12:401b:ffff:0000:0000:0000:ffff:ffff
11 ib1: joining MGID ff12:401b:ffff:0000:0000:0000:ffff:ffff
12 ib1: join completion for ff12:401b:ffff:0000:0000:0000:ffff:ffff (status 0)
13 ib1: MGID ff12:401b:ffff:0000:0000:0000:ffff:ffff AV ffff810033c103c0, LID 0xc000, SL 0
14 ib1: successfully joined all multicast groups
15 bonding: bond0: link status definitely down for interface ib0, disabling it
16 bonding: bond0: making interface ib1 the new active one.
17 ib0: ipoib_set_mcast_list called
18 ib1: ipoib_set_mcast_list called
19 ib0: restarting multicast task
20 ib0: stopping multicast thread
21 ib0: waiting for MGID ff12:401b:ffff:0000:0000:0000:ffff:ffff
22 ib0: join completion for ff12:401b:ffff:0000:0000:0000:ffff:ffff (status -4)
23 ib0: starting multicast thread
24 ib1: restarting multicast task
25 ib1: stopping multicast thread
26 ib1: waiting for MGID ff12:401b:ffff:0000:0000:0000:ffff:ffff
27 ib1: adding multicast entry for mgid ff12:401b:ffff:0000:0000:0000:0000:0001
28 ib1: starting multicast thread
29 ib0: joining MGID ff12:401b:ffff:0000:0000:0000:ffff:ffff
30 ib1: joining MGID ff12:401b:ffff:0000:0000:0000:0000:0001
31 ib1: join completion for ff12:401b:ffff:0000:0000:0000:0000:0001 (status 0)
32 ib1: MGID ff12:401b:ffff:0000:0000:0000:0000:0001 AV ffff810037f91d00, LID 0xc001, SL 0
33 ib1: successfully joined all multicast groups
34 ib0: join completion for ff12:401b:ffff:0000:0000:0000:ffff:ffff (status -110)
35 ib0: multicast join failed for ff12:401b:ffff:0000:0000:0000:ffff:ffff, status -110
36 ib0: joining MGID ff12:401b:ffff:0000:0000:0000:ffff:ffff
37 ib0: join completion for ff12:401b:ffff:0000:0000:0000:ffff:ffff (status -110)
38 ib0: multicast join failed for ff12:401b:ffff:0000:0000:0000:ffff:ffff, status -110
39 ib0: joining MGID ff12:401b:ffff:0000:0000:0000:ffff:ffff
40 ib0: join completion for ff12:401b:ffff:0000:0000:0000:ffff:ffff (status -110)
41 ib0: multicast join failed for ff12:401b:ffff:0000:0000:0000:ffff:ffff, status -110
42 ib0: joining MGID ff12:401b:ffff:0000:0000:0000:ffff:ffff
43 ib0: stopping multicast thread
44 ib0: waiting for MGID ff12:401b:ffff:0000:0000:0000:ffff:ffff
45 ib0: join completion for ff12:401b:ffff:0000:0000:0000:ffff:ffff (status -4)
46 ib0: flushing multicast list
47 ib0: deleting multicast group ff12:401b:ffff:0000:0000:0000:ffff:ffff
48 ib0: starting multicast thread
49 ib1: stopping multicast thread
50 ib1: waiting for MGID ff12:401b:ffff:0000:0000:0000:0000:0001
51 ib1: flushing multicast list
52 ib1: leaving MGID ff12:401b:ffff:0000:0000:0000:0000:0001
53 ib1: deleting multicast group ff12:401b:ffff:0000:0000:0000:0000:0001
54 ib1: leaving MGID ff12:401b:ffff:0000:0000:0000:ffff:ffff
55 ib1: deleting multicast group ff12:401b:ffff:0000:0000:0000:ffff:ffff
56 ib1: starting multicast thread
57 ib0: stopping multicast thread
58 ib0: flushing multicast list
59 ib0: starting multicast thread
60 ib0: joining MGID ff12:401b:ffff:0000:0000:0000:ffff:ffff
61 ib1: joining MGID ff12:401b:ffff:0000:0000:0000:ffff:ffff
62 bonding: bond0: link status definitely down for interface ib1, disabling it
63 ib1: ipoib_set_mcast_list called
64 bonding: bond0: now running without any active interface !
65 ib1: restarting multicast task
66 ib1: stopping multicast thread
67 ib1: waiting for MGID ff12:401b:ffff:0000:0000:0000:ffff:ffff
68 ib1: join completion for ff12:401b:ffff:0000:0000:0000:ffff:ffff (status -4)
69 ib1: starting multicast thread
70 ib1: joining MGID ff12:401b:ffff:0000:0000:0000:ffff:ffff
71 ib0: join completion for ff12:401b:ffff:0000:0000:0000:ffff:ffff (status 0)
72 ib0: MGID ff12:401b:ffff:0000:0000:0000:ffff:ffff AV ffff810033c10d80, LID 0xc000, SL 0
73 ib0: successfully joined all multicast groups
74 ib1: join completion for ff12:401b:ffff:0000:0000:0000:ffff:ffff (status 0)
75 ib1: MGID ff12:401b:ffff:0000:0000:0000:ffff:ffff AV ffff81000b8453c0, LID 0xc000, SL 0
76 ib1: successfully joined all multicast groups
77 ib1: dev_queue_xmit failed to requeue packet
78 ib1: dev_queue_xmit failed to requeue packet
79 bonding: bond0: link status definitely up for interface ib0.
80 bonding: bond0: link status definitely up for interface ib1.
81 bonding: bond0: making interface ib0 the new active one.
82 ib0: ipoib_set_mcast_list called
83 bonding: bond0: first active interface up!
84 ib0: restarting multicast task
85 ib0: stopping multicast thread
86 ib0: waiting for MGID ff12:401b:ffff:0000:0000:0000:ffff:ffff
87 ib0: adding multicast entry for mgid ff12:401b:ffff:0000:0000:0000:0000:0001
88 ib0: starting multicast thread
89 ib0: joining MGID ff12:401b:ffff:0000:0000:0000:0000:0001
90 ib0: join completion for ff12:401b:ffff:0000:0000:0000:0000:0001 (status 0)
91 ib0: MGID ff12:401b:ffff:0000:0000:0000:0000:0001 AV ffff81000099c340, LID 0xc001, SL 0
92 ib0: successfully joined all multicast groups
93 ib0: failed send event (status=2, wrid=52 vend_err 62)
94 ib0: ipoib_set_mcast_list called
95 ib0: restarting multicast task
96 ib0: stopping multicast thread
97 ib0: waiting for MGID ff12:401b:ffff:0000:0000:0000:0000:0001
98 ib0: deleting multicast group ff12:401b:ffff:0000:0000:0000:0000:0001
99 ib0: leaving MGID ff12:401b:ffff:0000:0000:0000:0000:0001
100 ib0: deleting multicast group ff12:401b:ffff:0000:0000:0000:0000:0001
101 ib0: starting multicast thread
102 ib0: successfully joined all multicast groups
103 ib0: stopping multicast thread
104 ib0: flushing multicast list
105 ib0: leaving MGID ff12:401b:ffff:0000:0000:0000:ffff:ffff
106 ib0: deleting multicast group ff12:401b:ffff:0000:0000:0000:ffff:ffff
107 ib1: stopping multicast thread
108 ib1: waiting for MGID ff12:401b:ffff:0000:0000:0000:ffff:ffff
109 ib1: flushing multicast list
110 ib1: leaving MGID ff12:401b:ffff:0000:0000:0000:ffff:ffff
111 ib1: deleting multicast group ff12:401b:ffff:0000:0000:0000:ffff:ffff
112 ib1: timing out; will leak address handles
113 bonding: bond0: released all slaves
114 ib0: stopping multicast thread
115 ib0: flushing multicast list
116 ib1: stopping multicast thread
117 ib1: flushing multicast list
118 ib1: ib_dealloc_pd failed
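To summarize the join results in the log above, here is a minimal sketch that tallies the "join completion" statuses per interface. It assumes the negative statuses are negated Linux errno values (4 is EINTR and 110 is ETIMEDOUT on Linux). The table LINUX_ERRNO and the function tally_join_completions are made up for this illustration.

```python
import re
from collections import Counter

# Linux errno names for the statuses seen in this log (hardcoded so the
# result does not depend on the platform the script runs on).
LINUX_ERRNO = {0: "OK", 4: "EINTR", 110: "ETIMEDOUT"}

JOIN_DONE = re.compile(
    r"(ib\d+): join completion for (\S+) \(status (-?\d+)\)")

def tally_join_completions(lines):
    """Count join completions per (device, status name)."""
    counts = Counter()
    for line in lines:
        m = JOIN_DONE.search(line)
        if m is None:
            continue
        dev = m.group(1)
        status = int(m.group(3))
        # Statuses not in the table are reported as the raw number.
        name = LINUX_ERRNO.get(abs(status), str(status))
        counts[(dev, name)] += 1
    return counts
```

The tally separates joins that timed out (status -110) from joins that were interrupted when the multicast thread was stopped (status -4).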