[Openib-windows] Win IBhost stop receive broadcast packets
Anatoly Lisenko
anatolyl at voltaire.com
Tue Jan 2 03:56:50 PST 2007
Hi,
I ran into a problem with the Windows IB host stack: rebooting the InfiniBand
switch can cause ping loss that persists even after the switch comes back up.
I started investigating this anomaly and found the following:
1. The IB stack doesn't receive broadcast ARP packets.
2. All other packets (unicast and multicast) are received.
3. The HCA port's rx packets counter increases each time a broadcast packet
arrives.
4. It seems that the firmware drops these packets (I don't see any
completions).
I examined the logs and saw that we somehow end up in a state where:
1. The HCA's port is joined to the bcast group.
2. The IPoIB QP is detached from the bcast group.
This is the stack backtrace of the mlnx_detach_mcast function:
f7125d10 f68ae6ab mthca!mlnx_detach_mcast+0x13
[n:\win-ibhost\trunk\hw\mthca\kernel\hca_mcast.c @ 142]
f7125d38 f68c9c30 ibbus!__cleanup_mcast+0x24b
[n:\win-ibhost\trunk\core\al\al_mcast.c @ 304]
f7125d70 f6820212 ibbus!async_destroy_cb+0x420
[n:\win-ibhost\trunk\core\al\al_common.c @ 665]
f7125d8c f6825dc2 ibbus!__cl_async_proc_worker+0x92
[n:\win-ibhost\trunk\core\complib\cl_async_proc.c @ 153]
f7125da0 f6827c3a ibbus!__cl_thread_pool_routine+0x52
[n:\win-ibhost\trunk\core\complib\cl_threadpool.c @ 67]
f7125dac 80948bb2 ibbus!__thread_callback+0x2a
[n:\win-ibhost\trunk\core\complib\kernel\cl_thread.c @ 49]
f7125ddc 8088d4d2 nt!PspSystemThreadStartup+0x2e
00000000 00000000 nt!KiThreadStartup+0x16
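The backtrace above shows the detach running from an asynchronous destroy callback. One hypothetical way the "port joined, QP detached" state could then arise is sketched below as a toy model. It is not the mthca or ibbus logic: the McastTable class and both of its rules (a repeated attach of the same QP is a no-op, and a single detach removes the entry) are assumptions made only for illustration. The MGID and qp_p strings are copied from the mthca log in this message.

```python
class McastTable:
    """Toy multicast table: one membership entry per (mgid, qp) pair."""

    def __init__(self):
        self.members = set()

    def attach(self, mgid, qp):
        # Assumption: attaching a QP that is already a member changes nothing.
        self.members.add((mgid, qp))

    def detach(self, mgid, qp):
        # Assumption: one detach removes the entry, whoever attached it.
        self.members.discard((mgid, qp))

    def receives(self, mgid, qp):
        return (mgid, qp) in self.members


BCAST = "ffff1b4012ff`ffffffff00000000"  # MGID of the first attach in the log
QP = "88A56E78"                          # qp_p value printed in the log

table = McastTable()
table.attach(BCAST, QP)   # original join, before the switch reboot
table.attach(BCAST, QP)   # re-join after link up (new mcast handle)
table.detach(BCAST, QP)   # deferred cleanup of the old handle runs last
print(table.receives(BCAST, QP))
```

Under these assumptions the QP ends up outside the group even though the port-level join is still in place, which would match the symptoms listed above.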
Mthca wpp log:
00000662 kernel 1236 600 2 312
01\02\2007-13:28:02:781 mlnx_query_ca()===>
00000663 kernel 1236 600 2 321
01\02\2007-13:28:02:781 mlnx_query_ca() :port 0 gid0:
00000664 kernel 1236 600 2 322
01\02\2007-13:28:02:781 mlnx_query_ca() :
0xfe80000000-0x08f14398095
00000665 kernel 1236 600 2 323
01\02\2007-13:28:02:781 mlnx_query_ca() :port 1 gid0:
00000666 kernel 1236 600 2 324
01\02\2007-13:28:02:781 mlnx_query_ca() :
0xfe80000000-0x08f14398096
00000667 kernel 1236 600 2 325
01\02\2007-13:28:02:781 mlnx_query_ca() :Space required 1898
used 1898
00000668 kernel 1236 600 2 326
01\02\2007-13:28:02:781 mlnx_conv_hca_cap() :Port 1 port_guid
0x8f10403980095
00000669 kernel 1236 600 2 327
01\02\2007-13:28:02:781 mlnx_conv_hca_cap() :Port 2 port_guid
0x8f10403980096
00000670 kernel 1236 600 2 328
01\02\2007-13:28:02:781 mlnx_query_ca()<===
00000671 kernel 4 276 2 339
01\02\2007-13:28:02:859 mlnx_attach_mcast()===>
00000672 kernel 4 276 2 340
01\02\2007-13:28:02:859 mlnx_attach_mcast() :mcasth 89930EA8,
qp_p 88A56E78, mlid c0, mgid ffff1b4012ff`ffffffff00000000
00000678 kernel 4 276 2 346
01\02\2007-13:28:02:859 completes with ERROR status
IB_SUCCESS
00000681 kernel 4 276 2 349
01\02\2007-13:28:02:859 mlnx_enable_cq_notify()===>
00000682 kernel 4 276 2 350
01\02\2007-13:28:02:859 completes with ERROR status
IB_SUCCESS
00000683 kernel 4 276 2 357
01\02\2007-13:28:02:859 mlnx_attach_mcast()===>
00000684 kernel 4 276 2 358
01\02\2007-13:28:02:859 mlnx_attach_mcast() :mcasth 898F1D68,
qp_p 88A56E78, mlid 1c0, mgid ffff1b4012ff`100000000000000
00000685 kernel 4 276 2 359
01\02\2007-13:28:02:859 completes with ERROR status
IB_SUCCESS
00000686 kernel 4 276 2 362
01\02\2007-13:28:02:859 mlnx_attach_mcast()===>
00000687 kernel 4 276 2 363
01\02\2007-13:28:02:859 mlnx_attach_mcast() :mcasth 884352D8,
qp_p 88A56E78, mlid 2c0, mgid ffff051412ff`30000c280010000
00000688 kernel 4 276 2 364
01\02\2007-13:28:02:859 completes with ERROR status
IB_SUCCESS
00000689 kernel 0 0 3 129
01\02\2007-13:28:02:750 mlnx_enable_cq_notify()===>
00000690 kernel 0 0 3 130
01\02\2007-13:28:02:750 completes with ERROR status
IB_SUCCESS
...
00000776 kernel 0 0 3 373
01\02\2007-13:28:03:109 mlnx_enable_cq_notify()===>
00000777 kernel 0 0 3 374
01\02\2007-13:28:03:109 completes with ERROR status
IB_SUCCESS
00000778 kernel 4 272 3 375
01\02\2007-13:28:03:296 mlnx_detach_mcast() :mcasth 89918F40,
qp_p 88A56E78, mlid 2c0, mgid ffff051412ff`30000c280010000
00000779 kernel 4 272 3 376
01\02\2007-13:28:03:296 completes with ERROR status
IB_SUCCESS
00000780 kernel 4 272 3 377
01\02\2007-13:28:03:296 mlnx_detach_mcast() :mcasth 88DB3F00,
qp_p 88A56E78, mlid 1c0, mgid ffff1b4012ff`100000000000000
00000781 kernel 4 272 3 378
01\02\2007-13:28:03:296 completes with ERROR status
IB_SUCCESS
00000782 kernel 4 272 3 379
01\02\2007-13:28:03:296 mlnx_detach_mcast() :mcasth 88A48DA0,
qp_p 88A56E78, mlid 6c0, mgid ffff051412ff`ffffa8ff00ff0000
00000783 kernel 4 272 3 380
01\02\2007-13:28:03:296 completes with ERROR status
IB_SUCCESS
00000784 kernel 4 272 3 381
01\02\2007-13:28:03:296 mlnx_detach_mcast() :mcasth 88A94DD0,
qp_p 88A56E78, mlid c0, mgid ffff1b4012ff`ffffffff00000000
00000785 kernel 4 272 3 382
01\02\2007-13:28:03:296 completes with ERROR status
IB_SUCCESS
00000786 kernel 4 280 1 383
01\02\2007-13:28:22:781 mlnx_query_ca()===>
00000787 kernel 4 280 1 384
01\02\2007-13:28:22:781 mlnx_query_ca() :port 0 gid0:
00000788 kernel 4 280 1 385
01\02\2007-13:28:22:781 mlnx_query_ca() :
0xfe80000000-0x08f14398095
00000789 kernel 4 280 1 386
01\02\2007-13:28:22:781 mlnx_query_ca() :port 1 gid0:
00000790 kernel 4 280 1 387
01\02\2007-13:28:22:781 mlnx_query_ca() :
0xfe80000000-0x08f14398096
00000791 kernel 4 280 1 388
01\02\2007-13:28:22:781 mlnx_query_ca() :Space required 1898
used 1898
00000792 kernel 4 280 1 389
01\02\2007-13:28:22:781 mlnx_conv_hca_cap() :Port 1 port_guid
0x8f10403980095
00000793 kernel 4 280 1 390
01\02\2007-13:28:22:781 mlnx_conv_hca_cap() :Port 2 port_guid
0x8f10403980096
00000794 kernel 4 280 1 391
01\02\2007-13:28:22:781 mlnx_query_ca()<===
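The attach/detach records in the mthca log above can be replayed to see which groups the QP is still attached to at the end. The sketch below is a minimal helper for that, assuming the record layout seen in this excerpt (timestamp, function name, mcasth, qp_p, mlid, mgid); the names EVENT_RE and replay are illustrative and not part of any driver or tool. Note that in the log the detach records carry different mcasth handles than the attach records, and they are stamped later (13:28:03:296 versus 13:28:02:859).

```python
import re

# One attach/detach record after whitespace normalisation. The field layout
# is inferred from the log excerpt, not from the driver's trace definitions.
EVENT_RE = re.compile(
    r"(?P<ts>\d\d\\\d\d\\\d{4}-\d\d:\d\d:\d\d:\d{3}) "
    r"mlnx_(?P<op>attach|detach)_mcast\(\) :mcasth (?P<mcasth>\w+), "
    r"qp_p (?P<qp>\w+), mlid (?P<mlid>\w+), mgid (?P<mgid>\S+)"
)


def replay(log_text):
    """Return the set of (qp, mgid) pairs still attached after all events."""
    flat = " ".join(log_text.split())        # undo the mail line wrapping
    events = [m.groupdict() for m in EVENT_RE.finditer(flat)]
    events.sort(key=lambda e: e["ts"])       # same-day stamps sort as text
    attached = set()
    for e in events:
        key = (e["qp"], e["mgid"])
        if e["op"] == "attach":
            attached.add(key)
        else:
            attached.discard(key)
    return attached


# Two records copied from the log above: attach, then the later detach.
SAMPLE = r"""
00000672 kernel 4 276 2 340
01\02\2007-13:28:02:859 mlnx_attach_mcast() :mcasth 89930EA8,
qp_p 88A56E78, mlid c0, mgid ffff1b4012ff`ffffffff00000000
00000784 kernel 4 272 3 381
01\02\2007-13:28:03:296 mlnx_detach_mcast() :mcasth 88A94DD0,
qp_p 88A56E78, mlid c0, mgid ffff1b4012ff`ffffffff00000000
"""

print(replay(SAMPLE))
```

For the two sample records the result is an empty set, i.e. the QP is no longer attached to that group once the late detach has been applied.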
Ipoib wpp log:
00000130 kernel 0 0 0 130
01\02\2007-13:28:01:468 [IPoIB] :ipoib_check_for_hang():]
00000131 kernel 4 280 0 133
01\02\2007-13:28:02:781 [IPoIB] :__endpt_mgr_reset_all():[
00000132 kernel 4 280 0 134
01\02\2007-13:28:02:781 [IPoIB] :__endpt_destroying():[
00000133 kernel 4 280 0 135
01\02\2007-13:28:02:781 [IPoIB] :__endpt_destroying():]
...
00000140 kernel 4 280 0 150
01\02\2007-13:28:02:781 [IPoIB] :__endpt_destroying():[
00000141 kernel 4 280 0 151
01\02\2007-13:28:02:781 [IPoIB] :__endpt_destroying():]
00000142 kernel 4 280 0 152
01\02\2007-13:28:02:781 [IPoIB] :__endpt_mgr_reset_all():]
00000143 kernel 4 280 0 160
01\02\2007-13:28:02:781 [IPoIB] :ipoib_port_down():]
00000144 kernel 4 280 1 131
01\02\2007-13:28:02:781 [IPoIB] :__ipoib_pnp_cb() :Link DOWN!
00000145 kernel 4 280 1 132
01\02\2007-13:28:02:781 [IPoIB] :ipoib_port_down():[
00000146 kernel 4 312 1 153
01\02\2007-13:28:02:781 [IPoIB] :__endpt_cleanup():[
00000147 kernel 4 312 1 154
01\02\2007-13:28:02:781 [IPoIB] :__endpt_cleanup() :Leaving
MCast group
00000148 kernel 4 312 1 164
01\02\2007-13:28:02:781 [IPoIB] :__endpt_cleanup():]
00000149 kernel 4 312 1 165
01\02\2007-13:28:02:781 [IPoIB] :__endpt_free():[
00000150 kernel 4 312 1 166
01\02\2007-13:28:02:781 [IPoIB] :__endpt_free():]
00000151 kernel 4 280 1 170
01\02\2007-13:28:02:781 [IPoIB] :ipoib_port_up():[
00000152 kernel 4 280 1 171
01\02\2007-13:28:02:781 [IPoIB] :ipoib_port_up():]
00000153 kernel 4 308 2 145
01\02\2007-13:28:02:781 [IPoIB] :__endpt_cleanup():[
00000154 kernel 4 308 2 147
01\02\2007-13:28:02:781 [IPoIB] :__endpt_cleanup() :Leaving
MCast group
00000155 kernel 4 308 2 161
01\02\2007-13:28:02:781 [IPoIB] :__endpt_cleanup():]
00000156 kernel 4 308 2 162
01\02\2007-13:28:02:781 [IPoIB] :__endpt_free():[
00000157 kernel 4 308 2 163
01\02\2007-13:28:02:781 [IPoIB] :__endpt_free():]
00000158 kernel 4 276 2 191
01\02\2007-13:28:02:859 [IPoIB] :__bcast_cb():[
00000159 kernel 4 276 2 192
01\02\2007-13:28:02:859 [IPoIB] :__endpt_mgr_add_bcast():[
00000160 kernel 4 276 2 193
01\02\2007-13:28:02:859 [IPoIB] :ipoib_endpt_create():[
00000161 kernel 4 276 2 194
01\02\2007-13:28:02:859 [IPoIB] :ipoib_endpt_create():]
00000162 kernel 4 276 2 195
01\02\2007-13:28:02:859 [IPoIB] :ipoib_endpt_set_mcast():[
00000163 kernel 4 276 2 196
01\02\2007-13:28:02:859 [IPoIB] :ipoib_endpt_set_mcast()
:Create av for MAC: 00-00-00-00-00-00
00000164 kernel 4 276 2 197
01\02\2007-13:28:02:859 [IPoIB] :__create_mcast_av():[
00000165 kernel 4 276 2 198
01\02\2007-13:28:02:859 [IPoIB] :__create_mcast_av():]
00000166 kernel 4 276 2 199
01\02\2007-13:28:02:859 [IPoIB] :ipoib_endpt_set_mcast():]
00000167 kernel 4 276 2 200
01\02\2007-13:28:02:859 [IPoIB]
:__endpt_mgr_insert_locked():[
00000168 kernel 4 276 2 201
01\02\2007-13:28:02:859 [IPoIB] :__endpt_mgr_insert_locked()
:insert : MAC: FF-FF-FF-FF-FF-FF
00000169 kernel 4 276 2 202
01\02\2007-13:28:02:859 [IPoIB] :__endpt_mgr_insert():[
00000170 kernel 4 276 2 203
01\02\2007-13:28:02:859 [IPoIB] :__endpt_mgr_insert():]
00000171 kernel 4 276 2 204
01\02\2007-13:28:02:859 [IPoIB] :__endpt_mgr_add_bcast():]
00000172 kernel 4 276 2 205
01\02\2007-13:28:02:859 [IPoIB] :__ib_mgr_activate():[
00000173 kernel 4 276 2 206
01\02\2007-13:28:02:859 [IPoIB] :__ib_mgr_activate():]
00000174 kernel 4 276 2 207
01\02\2007-13:28:02:859 [IPoIB] :ipoib_set_active():[
00000175 kernel 4 276 2 208
01\02\2007-13:28:02:859 [IPoIB] :ipoib_port_join_mcast():[
00000176 kernel 4 276 2 209
01\02\2007-13:28:02:859 [IPoIB] :__endpt_mgr_ref():[
00000177 kernel 4 276 2 210
01\02\2007-13:28:02:859 [IPoIB] :__endpt_mgr_ref() :Look for
: MAC: 01-00-5E-00-00-01
00000178 kernel 4 276 2 211
01\02\2007-13:28:02:859 [IPoIB] :__endpt_mgr_ref() :Failed
endpoint lookup.[IpoIB] :__endpt_mgr_ref():]
00000179 kernel 4 276 2 212
01\02\2007-13:28:02:859 [IPoIB] :ipoib_endpt_create():[
00000180 kernel 4 276 2 213
01\02\2007-13:28:02:859 [IPoIB] :ipoib_endpt_create():]
00000181 kernel 4 276 2 214
01\02\2007-13:28:02:859 [IPoIB]
:__endpt_mgr_insert_locked():[
00000182 kernel 4 276 2 215
01\02\2007-13:28:02:859 [IPoIB] :__endpt_mgr_insert_locked()
:insert : MAC: 01-00-5E-00-00-01
00000183 kernel 4 276 2 216
01\02\2007-13:28:02:859 [IPoIB] :__endpt_mgr_insert():[
00000184 kernel 4 276 2 217
01\02\2007-13:28:02:859 [IPoIB] :__endpt_mgr_insert():]
00000185 kernel 4 276 2 218
01\02\2007-13:28:02:859 [IPoIB] :ipoib_port_join_mcast():]
00000186 kernel 4 276 2 219
01\02\2007-13:28:02:859 [IPoIB] :ipoib_port_join_mcast():[
00000187 kernel 4 276 2 220
01\02\2007-13:28:02:859 [IPoIB] :__endpt_mgr_ref():[
00000188 kernel 4 276 2 221
01\02\2007-13:28:02:859 [IPoIB] :__endpt_mgr_ref() :Look for
: MAC: 01-80-C2-00-00-03
00000189 kernel 4 276 2 222
01\02\2007-13:28:02:859 [IPoIB] :__endpt_mgr_ref() :Failed
endpoint lookup.[IpoIB] :__endpt_mgr_ref():]
00000190 kernel 4 276 2 223
01\02\2007-13:28:02:859 [IPoIB] :ipoib_endpt_create():[
00000191 kernel 4 276 2 224
01\02\2007-13:28:02:859 [IPoIB] :ipoib_endpt_create():]
00000192 kernel 4 276 2 225
01\02\2007-13:28:02:859 [IPoIB]
:__endpt_mgr_insert_locked():[
00000193 kernel 4 276 2 226
01\02\2007-13:28:02:859 [IPoIB] :__endpt_mgr_insert_locked()
:insert : MAC: 01-80-C2-00-00-03
00000194 kernel 4 276 2 227
01\02\2007-13:28:02:859 [IPoIB] :__endpt_mgr_insert():[
00000195 kernel 4 276 2 228
01\02\2007-13:28:02:859 [IPoIB] :__endpt_mgr_insert():]
00000196 kernel 4 276 2 229
01\02\2007-13:28:02:859 [IPoIB] :ipoib_port_join_mcast():]
00000197 kernel 4 276 2 230
01\02\2007-13:28:02:859 [IPoIB] :ipoib_resume_oids():[
00000198 kernel 4 276 2 231
01\02\2007-13:28:02:859 [IPoIB] :ipoib_resume_oids():]
00000199 kernel 4 276 2 232
01\02\2007-13:28:02:859 [IPoIB] :ipoib_set_active() :Link UP!
00000200 kernel 4 276 2 233
01\02\2007-13:28:02:859 [IPoIB] :ipoib_set_active():]
00000201 kernel 4 276 2 234
01\02\2007-13:28:02:859 [IPoIB] :__bcast_cb():]
00000202 kernel 4 276 2 235
01\02\2007-13:28:02:859 [IPoIB] :__mcast_cb():[
00000203 kernel 4 276 2 236
01\02\2007-13:28:02:859 [IPoIB] :ipoib_endpt_set_mcast():[
00000204 kernel 4 276 2 237
01\02\2007-13:28:02:859 [IPoIB] :ipoib_endpt_set_mcast()
:Create av for MAC: 01-00-5E-00-00-01
00000205 kernel 4 276 2 238
01\02\2007-13:28:02:859 [IPoIB] :__create_mcast_av():[
00000206 kernel 4 276 2 239
01\02\2007-13:28:02:859 [IPoIB] :__create_mcast_av():]
00000207 kernel 4 276 2 240
01\02\2007-13:28:02:859 [IPoIB] :ipoib_endpt_set_mcast():]
00000208 kernel 4 276 2 241
01\02\2007-13:28:02:859 [IPoIB] :__mcast_cb():]
00000209 kernel 4 276 2 242
01\02\2007-13:28:02:859 [IPoIB] :__mcast_cb():[
00000210 kernel 4 276 2 243
01\02\2007-13:28:02:859 [IPoIB] :ipoib_endpt_set_mcast():[
00000211 kernel 4 276 2 244
01\02\2007-13:28:02:859 [IPoIB] :ipoib_endpt_set_mcast()
:Create av for MAC: 01-80-C2-00-00-03
00000212 kernel 4 276 2 245
01\02\2007-13:28:02:859 [IPoIB] :__create_mcast_av():[
00000213 kernel 4 276 2 246
01\02\2007-13:28:02:859 [IPoIB] :__create_mcast_av():]
00000214 kernel 4 276 2 247
01\02\2007-13:28:02:859 [IPoIB] :ipoib_endpt_set_mcast():]
00000215 kernel 4 276 2 248
01\02\2007-13:28:02:859 [IPoIB] :__mcast_cb():]
00000216 kernel 4 320 3 136
01\02\2007-13:28:02:781 [IPoIB] :__endpt_cleanup():[
00000217 kernel 4 320 3 139
01\02\2007-13:28:02:781 [IPoIB] :__endpt_cleanup():]
00000218 kernel 4 320 3 141
01\02\2007-13:28:02:781 [IPoIB] :__endpt_free():[
00000219 kernel 4 320 3 143
01\02\2007-13:28:02:781 [IPoIB] :__endpt_free():]
00000220 kernel 4 320 3 144
01\02\2007-13:28:02:781 [IPoIB] :__endpt_cleanup():[
00000221 kernel 4 320 3 146
01\02\2007-13:28:02:781 [IPoIB] :__endpt_cleanup() :Leaving
MCast group
00000222 kernel 4 320 3 155
01\02\2007-13:28:02:781 [IPoIB] :__endpt_cleanup():]
00000223 kernel 4 320 3 156
01\02\2007-13:28:02:781 [IPoIB] :__endpt_free():[
00000224 kernel 4 320 3 157
01\02\2007-13:28:02:781 [IPoIB] :__endpt_free():]
00000225 kernel 4 320 3 158
01\02\2007-13:28:02:781 [IPoIB] :__endpt_cleanup():[
00000226 kernel 4 320 3 159
01\02\2007-13:28:02:781 [IPoIB] :__endpt_cleanup() :Leaving
MCast group
00000227 kernel 4 320 3 167
01\02\2007-13:28:02:781 [IPoIB] :__endpt_cleanup():]
00000228 kernel 4 320 3 168
01\02\2007-13:28:02:781 [IPoIB] :__endpt_free():[
00000229 kernel 4 320 3 169
01\02\2007-13:28:02:781 [IPoIB] :__endpt_free():]
00000230 kernel 0 0 3 172
01\02\2007-13:28:02:781 [IPoIB] :__port_info_cb():[
00000231 kernel 0 0 3 173
01\02\2007-13:28:02:781 [IPoIB] :__endpt_mgr_add_local():[
00000232 kernel 0 0 3 174
01\02\2007-13:28:02:781 [IPoIB] :ipoib_endpt_create():[
00000233 kernel 0 0 3 175
01\02\2007-13:28:02:781 [IPoIB] :ipoib_endpt_create():]
00000234 kernel 0 0 3 176
01\02\2007-13:28:02:781 [IPoIB] :__endpt_mgr_insert():[
00000235 kernel 0 0 3 177
01\02\2007-13:28:02:781 [IPoIB] :__endpt_mgr_insert():]
00000236 kernel 0 0 3 178
01\02\2007-13:28:02:781 [IPoIB] :__endpt_mgr_add_local():]
00000237 kernel 0 0 3 179
01\02\2007-13:28:02:781 [IPoIB] :__port_info_cb() :Received
port info: link width = 2.
00000238 kernel 0 0 3 180
01\02\2007-13:28:02:781 [IPoIB] :ipoib_set_rate():[
00000239 kernel 0 0 3 181
01\02\2007-13:28:02:781 [IPoIB] :ipoib_set_rate() :Link speed
is 2.5Gs
00000240 kernel 0 0 3 182
01\02\2007-13:28:02:781 [IPoIB] :ipoib_set_rate() :Link width
is 4X
00000241 kernel 0 0 3 183
01\02\2007-13:28:02:781 [IPoIB] :ipoib_set_rate():]
00000242 kernel 0 0 3 184
01\02\2007-13:28:02:781 [IPoIB] :__port_get_bcast():[
00000243 kernel 0 0 3 185
01\02\2007-13:28:02:781 [IPoIB] :__port_get_bcast():]
00000244 kernel 0 0 3 186
01\02\2007-13:28:02:781 [IPoIB] :__port_info_cb():]
00000245 kernel 2624 2732 3 187
01\02\2007-13:28:02:781 [IPoIB] :__bcast_get_cb():[
00000246 kernel 2624 2732 3 188
01\02\2007-13:28:02:781 [IPoIB] :__port_join_bcast():[
00000247 kernel 2624 2732 3 189
01\02\2007-13:28:02:781 [IPoIB] :__port_join_bcast():]
00000248 kernel 2624 2732 3 190
01\02\2007-13:28:02:781 [IPoIB] :__bcast_get_cb():]
00000249 kernel 0 0 3 249
01\02\2007-13:28:03:109 [IPoIB] :__endpt_mgr_get_by_gid():[
00000250 kernel 0 0 3 250
01\02\2007-13:28:03:109 [IPoIB] :__endpt_mgr_get_by_gid():]
...
00000339 kernel 0 0 3 339
01\02\2007-13:28:03:109 [IPoIB] :__endpt_mgr_get_by_gid():[
00000340 kernel 0 0 3 340
01\02\2007-13:28:03:109 [IPoIB] :__endpt_mgr_get_by_gid():]
00000341 kernel 0 0 0 341
01\02\2007-13:28:03:468 [IPoIB] :ipoib_check_for_hang():[
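The records above appear to be grouped by the fifth header field (0, 1, 2, 3) rather than listed chronologically, while the last header field seems to count events across all groups (for example, "Link DOWN!" carries 131 and the closing ipoib_port_down() carries 160). Assuming that reading is right, which this message does not confirm, a small sketch like the following can put the records back in order. HEADER_RE and records_in_sequence are hypothetical names used only here.

```python
import re

# Header line as seen in the excerpt: index, "kernel", two ids, a group
# number, and a final counter. Treating the group number as a CPU index and
# the final counter as a global sequence number is an inference.
HEADER_RE = re.compile(r"^(\d{8}) kernel (\d+) (\d+) (\d+) (\d+)$")


def records_in_sequence(log_text):
    """Group each header with its wrapped message lines, sort by last field."""
    records = []
    for line in log_text.splitlines():
        line = line.strip()
        if not line or line == "...":
            continue                          # skip blanks and elisions
        m = HEADER_RE.match(line)
        if m:
            records.append(
                {"seq": int(m.group(5)), "cpu": int(m.group(4)), "msg": ""}
            )
        elif records:
            # Continuation of the current record (mail wrapping).
            records[-1]["msg"] = (records[-1]["msg"] + " " + line).strip()
    return sorted(records, key=lambda r: r["seq"])


# Three records copied from the IPoIB log above, in file order.
SAMPLE = r"""
00000143 kernel 4 280 0 160
01\02\2007-13:28:02:781 [IPoIB] :ipoib_port_down():]
00000144 kernel 4 280 1 131
01\02\2007-13:28:02:781 [IPoIB] :__ipoib_pnp_cb() :Link DOWN!
00000145 kernel 4 280 1 132
01\02\2007-13:28:02:781 [IPoIB] :ipoib_port_down():[
"""

for rec in records_in_sequence(SAMPLE):
    print(rec["seq"], rec["cpu"], rec["msg"])
```

Sorted this way, the sample reads as link down, entry into ipoib_port_down(), then its exit, which is easier to follow than the file order.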
Thanks,
Anatoly
-------------- next part --------------
A non-text attachment was scrubbed...
Name: mthca_wpp_flags_0x400.log
Type: application/octet-stream
Size: 29023 bytes
Desc: mthca_wpp_flags_0x400.log
URL: <http://lists.openfabrics.org/pipermail/ofw/attachments/20070102/9e5263c3/attachment.obj>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: ipoib_wpp_flag_0x122.log
Type: application/octet-stream
Size: 23777 bytes
Desc: ipoib_wpp_flag_0x122.log
URL: <http://lists.openfabrics.org/pipermail/ofw/attachments/20070102/9e5263c3/attachment-0001.obj>