[Openib-windows] [PATCH] IPoIB response to 1x links
Yossi Leybovich
sleybo at mellanox.co.il
Mon Sep 4 08:28:23 PDT 2006
Fab
One of our verification engineers generated the following scenario:
-phy link down
-link speed set to 1x
-phy link up
-phy link down
-link speed set to 4x
-phy link up
and he got a blue screen in IPoIB.
I debugged it with a checked build and got the following assert (with the
PNP/INIT/ENDPT prints enabled):
===================================
0:[IPoIB]:__ipoib_pnp_cb() [
~0:[IPoIB]:__ipoib_pnp_cb(): p_pnp_rec->pnp_event = 0x40002
(IB_PNP_PORT_DOWN)
~0:[IPoIB]:__ipoib_pnp_cb(): Link DOWN!
~0:[IPoIB]:ipoib_port_down() [
~0:[IPoIB]:__endpt_mgr_reset_all() [
~0:[IPoIB]:__endpt_destroying() [
~0:[IPoIB]:__endpt_destroying() ]
~0:[IPoIB]:__endpt_destroying() [
~1:[IPoIB]:__endpt_cleanup() [
~0:[IPoIB]:__endpt_destroying() ]
~1:[IPoIB]:__endpt_cleanup() ]
~1:[IPoIB]:__endpt_free() [
~0:[IPoIB]:__endpt_destroying() [
~0:[IPoIB]:__endpt_destroying() ]
~0:[IPoIB]:__endpt_cleanup() [
~0:[IPoIB]:__endpt_cleanup(): ~0:[IPoIB]:__endpt_destroying() [
~0:[IPoIB]:__endpt_destroying() ]
Leaving MCast group
~0:[IPoIB]:__endpt_cleanup() ]
~0:[IPoIB]:__endpt_destroying() [
~0:[IPoIB]:__endpt_destroying() ]
~0:[IPoIB]:__endpt_free() [
~0:[IPoIB]:__endpt_free() ]
~0:[IPoIB]:__endpt_destroying() [
~0:[IPoIB]:__endpt_destroying() ]
~0:[IPoIB]:__endpt_cleanup() [
~0:[IPoIB]:__endpt_cleanup(): ~0:[IPoIB]:__endpt_mgr_reset_all() ]
~0:[IPoIB]:ipoib_port_down() ]
Leaving MCast group
~0:[IPoIB]:__endpt_cleanup() ]
~0:[IPoIB]:__ipoib_pnp_cb() ]
~0:[IPoIB]:__endpt_free() [
~0:[IPoIB]:__endpt_free() ]
~0:[IPoIB]:__endpt_cleanup() [
~0:[IPoIB]:__endpt_cleanup(): Leaving MCast group
~0:[IPoIB]:__endpt_cleanup() ]
~0:[IPoIB]:__endpt_free() [
~0:[IPoIB]:__endpt_free() ]
~0:[IPoIB]:__endpt_cleanup() [
~0:[IPoIB]:__endpt_cleanup(): Leaving MCast group
~0:[IPoIB]:__endpt_cleanup() ]
~0:[IPoIB]:__endpt_free() [
~0:[IPoIB]:__endpt_free() ]
~0:[IPoIB]:__endpt_cleanup() [
~0:[IPoIB]:__endpt_cleanup(): Leaving MCast group
~0:[IPoIB]:__endpt_cleanup() ]
~0:sa_req_send_comp_cb() !ERROR!: request failed - notifying user
~0:sa_req_send_comp_cb() !ERROR!: request failed - notifying user
~0:[IPoIB]:__endpt_free() [
~0:[IPoIB]:__endpt_free() ]
~1:[IPoIB]:__endpt_free() ]
~0:[IPoIB]:__ipoib_pnp_cb() [
~0:[IPoIB]:__ipoib_pnp_cb(): p_pnp_rec->pnp_event = 0x8002
(IB_PNP_PORT_INIT)
~0:[IPoIB]:__ipoib_pnp_cb() ]
~0:[IPoIB]:__ipoib_pnp_cb() [
~0:[IPoIB]:__ipoib_pnp_cb(): p_pnp_rec->pnp_event = 0x100002
(IB_PNP_SM_CHANGE)
~0:[IPoIB]:__ipoib_pnp_cb() ]
~0:[IPoIB]:__ipoib_pnp_cb() [
~0:[IPoIB]:__ipoib_pnp_cb(): p_pnp_rec->pnp_event = 0x20002
(IB_PNP_PORT_ACTIVE)
~0:[IPoIB]:ipoib_port_up() [
~0:[IPoIB]:ipoib_port_up() ]
~0:[IPoIB]:__ipoib_pnp_cb() ]
~1:[IPoIB]:__port_info_cb() [
~1:[IPoIB]:__endpt_mgr_add_local() [
~1:[IPoIB]:ipoib_endpt_create() [
~1:[IPoIB]:ipoib_endpt_create() ]
~1:[IPoIB]:__endpt_mgr_insert() [
~1:[IPoIB]:__endpt_mgr_insert() ]
~1:[IPoIB]:__endpt_mgr_add_local() ]
~1:[IPoIB]:__port_info_cb(): Received port info: link width = 1.
~1:[IPoIB]:ipoib_set_rate() [
~1:[IPoIB]:ipoib_set_rate(): Link speed is 2.5Gs
~1:[IPoIB]:ipoib_set_rate(): Link width is 1X
~1:[IPoIB]:ipoib_set_rate() ]
~1:[IPoIB]:__port_get_bcast() [
~0:sa_req_send_comp_cb() !ERROR!: request failed - notifying user
~1:[IPoIB]:__port_get_bcast() ]
~1:[IPoIB]:__port_info_cb() ]
~1:[IPoIB]:__bcast_get_cb() [
~1:[IPoIB]:__port_join_bcast() [
~1:[IPoIB]:__port_join_bcast(): Unrealizable join due to rate mismatch.
~1:[IPoIB]:ipoib_set_inactive() [
~1:[IPoIB]:ipoib_resume_oids() [
~1:[IPoIB]:ipoib_resume_oids() ]
~1:[IPoIB]:ipoib_set_inactive() ]
~1:[IPoIB]:__bcast_get_cb() ]
~0:[IPoIB]:ipoib_check_for_hang() [
~0:[IPoIB]:ipoib_check_for_hang() ]
~0:[IPoIB]:ipoib_check_for_hang() [
~0:[IPoIB]:ipoib_check_for_hang() ]
~0:[IPoIB]:ipoib_check_for_hang() [
~0:[IPoIB]:ipoib_check_for_hang() ]
~0:[IPoIB]:ipoib_check_for_hang() [
~0:[IPoIB]:ipoib_check_for_hang() ]
~0:[IPoIB]:ipoib_check_for_hang() [
~0:[IPoIB]:ipoib_check_for_hang() ]
~0:[IPoIB]:ipoib_check_for_hang() [
~0:[IPoIB]:ipoib_check_for_hang() ]
~0:[IPoIB]:ipoib_check_for_hang() [
~0:[IPoIB]:ipoib_check_for_hang() ]
~0:[IPoIB]:ipoib_check_for_hang() [
~0:[IPoIB]:ipoib_check_for_hang() ]
~0:[IPoIB]:ipoib_check_for_hang() [
~0:[IPoIB]:ipoib_check_for_hang() ]
~0:[IPoIB]:ipoib_check_for_hang() [
~0:[IPoIB]:ipoib_check_for_hang() ]
~0:[IPoIB]:ipoib_check_for_hang() [
~0:[IPoIB]:ipoib_check_for_hang() ]
~0:[IPoIB]:ipoib_check_for_hang() [
~0:[IPoIB]:ipoib_check_for_hang() ]
~0:[IPoIB]:ipoib_check_for_hang() [
~0:[IPoIB]:ipoib_check_for_hang() ]
~0:[IPoIB]:ipoib_check_for_hang() [
~0:[IPoIB]:ipoib_check_for_hang() ]
~0:[IPoIB]:__ipoib_pnp_cb() [
~0:[IPoIB]:__ipoib_pnp_cb(): p_pnp_rec->pnp_event = 0x40002
(IB_PNP_PORT_DOWN)
~0:[IPoIB]:__ipoib_pnp_cb() ]
~0:[IPoIB]:__ipoib_pnp_cb() [
~0:[IPoIB]:__ipoib_pnp_cb(): p_pnp_rec->pnp_event = 0x10002
(IB_PNP_PORT_ARMED)
~0:[IPoIB]:__ipoib_pnp_cb() ]
~0:[IPoIB]:__ipoib_pnp_cb() [
~0:[IPoIB]:__ipoib_pnp_cb(): p_pnp_rec->pnp_event = 0x20002
(IB_PNP_PORT_ACTIVE)
~0:[IPoIB]:ipoib_port_up() [
~0:[IPoIB]:ipoib_port_up() ]
~0:[IPoIB]:__ipoib_pnp_cb() ]
~1:[IPoIB]:__port_info_cb() [
~1:[IPoIB]:__endpt_mgr_add_local() [
~1:[IPoIB]:ipoib_endpt_create() [
~1:[IPoIB]:ipoib_endpt_create() ]
~1:[IPoIB]:__endpt_mgr_insert() [
*** Assertion failed: p_qitem == &p_endpt->mac_item
*** Source File: w:\work\latest\ulp\ipoib\kernel\ipoib_port.c, line
4394
Break repeatedly, break Once, Ignore, terminate Process, or terminate
Thread (boipt)? i
i
================================================================
As you can see from the ASSERT, we insert the local endpoint twice.
The reason is that in ipoib_set_inactive we change the adapter state to
down (and resume the OIDs) but we don't clear the endpoints.
The following patch adds a call to __endpt_mgr_reset_all every time we
call ipoib_set_inactive.
The patch fixes the problem.
(I thought about calling ipoib_port_down from within ipoib_set_inactive,
but we call it at DPC level and ipoib_port_down waits on the sa_event.)
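To make the sequence easier to follow, here is a minimal user-mode sketch of
the failure. It is a toy model, not the driver code: every toy_* name is
hypothetical, and it assumes (as the log suggests, since the second
IB_PNP_PORT_DOWN returns without calling ipoib_port_down) that the port-down
teardown is skipped once the adapter is already marked down.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only: every toy_* name is hypothetical, not IPoIB source. */
typedef struct {
    bool adapter_up;    /* adapter state as seen by the PnP callback */
    bool local_in_map;  /* is the local endpoint currently inserted? */
} toy_port_t;

/* Stand-in for __endpt_mgr_add_local: returning false marks the
 * duplicate insert that trips the assertion in the checked build. */
static bool toy_add_local(toy_port_t *p)
{
    if (p->local_in_map)
        return false;
    p->local_in_map = true;
    return true;
}

/* Stand-in for __endpt_mgr_reset_all: drops all endpoints. */
static void toy_reset_all(toy_port_t *p)
{
    p->local_in_map = false;
}

/* PORT_DOWN handling: assumed to tear down only if the adapter is up. */
static void toy_port_down(toy_port_t *p)
{
    if (!p->adapter_up)
        return;
    p->adapter_up = false;
    toy_reset_all(p);
}

/* Failed broadcast join (rate mismatch): stand-in for the
 * ipoib_set_inactive call, plus the reset added by the patch. */
static void toy_join_failed(toy_port_t *p, bool with_fix)
{
    p->adapter_up = false;
    if (with_fix)
        toy_reset_all(p);
}

/* Replays the scenario; returns true if the second insert succeeds. */
static bool toy_replay(bool with_fix)
{
    toy_port_t port = { true, true };      /* link initially up */
    toy_port_down(&port);                  /* phy link down */
    if (!toy_add_local(&port))             /* link up at 1x */
        return false;
    toy_join_failed(&port, with_fix);      /* join fails at 1x */
    toy_port_down(&port);                  /* phy link down: no-op */
    return toy_add_local(&port);           /* link up at 4x */
}
```

In this model toy_replay(false) hits the duplicate insert on the second link
up, while toy_replay(true) inserts the local endpoint cleanly.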
10x
Yossi
Signed-off-by: Yossi Leybovich (sleybo at mellanox.co.il)
Index: ipoib_port.c
===================================================================
--- ipoib_port.c (revision 1647)
+++ ipoib_port.c (working copy)
@@ -4777,6 +4777,7 @@
if( status != IB_CANCELED )
ipoib_set_inactive( p_port->p_adapter );
+ __endpt_mgr_reset_all(p_port);
KeSetEvent( &p_port->sa_event, EVENT_INCREMENT, FALSE );
}
@@ -4898,7 +4899,7 @@
{
if( status != IB_CANCELED )
ipoib_set_inactive( p_port->p_adapter );
-
+ __endpt_mgr_reset_all(p_port);
KeSetEvent( &p_port->sa_event, EVENT_INCREMENT, FALSE );
}
@@ -5165,6 +5166,7 @@
if( status != IB_SUCCESS )
{
ipoib_set_inactive( p_port->p_adapter );
+ __endpt_mgr_reset_all(p_port);
KeSetEvent( &p_port->sa_event, EVENT_INCREMENT, FALSE );
}
ipoib_port_deref( p_port, ref_bcast_req_failed );
-------------- next part --------------
A non-text attachment was scrubbed...
Name: ipoib_1x_bug.patch
Type: application/octet-stream
Size: 1295 bytes
Desc: ipoib_1x_bug.patch
URL: <http://lists.openfabrics.org/pipermail/ofw/attachments/20060904/c2dcf98a/attachment.obj>