[Openib-windows] [PATCH] IPoIB response to 1x links

Yossi Leybovich sleybo at mellanox.co.il
Mon Sep 4 08:28:23 PDT 2006


Fab
 
One of our verification engineers ran the following scenario:
 
-phy link down 
-link speed set to 1x
-phy link up
 
-phy link down 
-link speed set to 4x
-phy link up
 
and he got a blue screen in IPoIB.
 
I debugged it with a checked build and got the following assert (with the
PNP/INIT/ENDPT prints enabled):
 
===================================
0:[IPoIB]:__ipoib_pnp_cb() [
~0:[IPoIB]:__ipoib_pnp_cb(): p_pnp_rec->pnp_event = 0x40002
(IB_PNP_PORT_DOWN)
~0:[IPoIB]:__ipoib_pnp_cb(): Link DOWN!
~0:[IPoIB]:ipoib_port_down() [
~0:[IPoIB]:__endpt_mgr_reset_all() [
~0:[IPoIB]:__endpt_destroying() [
~0:[IPoIB]:__endpt_destroying() ]
~0:[IPoIB]:__endpt_destroying() [
~1:[IPoIB]:__endpt_cleanup() [
~0:[IPoIB]:__endpt_destroying() ]
~1:[IPoIB]:__endpt_cleanup() ]
~1:[IPoIB]:__endpt_free() [
~0:[IPoIB]:__endpt_destroying() [
~0:[IPoIB]:__endpt_destroying() ]
~0:[IPoIB]:__endpt_cleanup() [
~0:[IPoIB]:__endpt_cleanup(): ~0:[IPoIB]:__endpt_destroying() [
~0:[IPoIB]:__endpt_destroying() ]
Leaving MCast group
~0:[IPoIB]:__endpt_cleanup() ]
~0:[IPoIB]:__endpt_destroying() [
~0:[IPoIB]:__endpt_destroying() ]
~0:[IPoIB]:__endpt_free() [
~0:[IPoIB]:__endpt_free() ]
~0:[IPoIB]:__endpt_destroying() [
~0:[IPoIB]:__endpt_destroying() ]
~0:[IPoIB]:__endpt_cleanup() [
~0:[IPoIB]:__endpt_cleanup(): ~0:[IPoIB]:__endpt_mgr_reset_all() ]
~0:[IPoIB]:ipoib_port_down() ]
Leaving MCast group
~0:[IPoIB]:__endpt_cleanup() ]
~0:[IPoIB]:__ipoib_pnp_cb() ]
~0:[IPoIB]:__endpt_free() [
~0:[IPoIB]:__endpt_free() ]
~0:[IPoIB]:__endpt_cleanup() [
~0:[IPoIB]:__endpt_cleanup(): Leaving MCast group
~0:[IPoIB]:__endpt_cleanup() ]
~0:[IPoIB]:__endpt_free() [
~0:[IPoIB]:__endpt_free() ]
~0:[IPoIB]:__endpt_cleanup() [
~0:[IPoIB]:__endpt_cleanup(): Leaving MCast group
~0:[IPoIB]:__endpt_cleanup() ]
~0:[IPoIB]:__endpt_free() [
~0:[IPoIB]:__endpt_free() ]
~0:[IPoIB]:__endpt_cleanup() [
~0:[IPoIB]:__endpt_cleanup(): Leaving MCast group
~0:[IPoIB]:__endpt_cleanup() ]
~0:sa_req_send_comp_cb() !ERROR!: request failed - notifying user
~0:sa_req_send_comp_cb() !ERROR!: request failed - notifying user
~0:[IPoIB]:__endpt_free() [
~0:[IPoIB]:__endpt_free() ]
~1:[IPoIB]:__endpt_free() ]
~0:[IPoIB]:__ipoib_pnp_cb() [
~0:[IPoIB]:__ipoib_pnp_cb(): p_pnp_rec->pnp_event = 0x8002
(IB_PNP_PORT_INIT)
~0:[IPoIB]:__ipoib_pnp_cb() ]
~0:[IPoIB]:__ipoib_pnp_cb() [
~0:[IPoIB]:__ipoib_pnp_cb(): p_pnp_rec->pnp_event = 0x100002
(IB_PNP_SM_CHANGE)
~0:[IPoIB]:__ipoib_pnp_cb() ]
~0:[IPoIB]:__ipoib_pnp_cb() [
~0:[IPoIB]:__ipoib_pnp_cb(): p_pnp_rec->pnp_event = 0x20002
(IB_PNP_PORT_ACTIVE)
~0:[IPoIB]:ipoib_port_up() [
~0:[IPoIB]:ipoib_port_up() ]
~0:[IPoIB]:__ipoib_pnp_cb() ]
~1:[IPoIB]:__port_info_cb() [
~1:[IPoIB]:__endpt_mgr_add_local() [
~1:[IPoIB]:ipoib_endpt_create() [
~1:[IPoIB]:ipoib_endpt_create() ]
~1:[IPoIB]:__endpt_mgr_insert() [
~1:[IPoIB]:__endpt_mgr_insert() ]
~1:[IPoIB]:__endpt_mgr_add_local() ]
~1:[IPoIB]:__port_info_cb(): Received port info: link width = 1.
~1:[IPoIB]:ipoib_set_rate() [
~1:[IPoIB]:ipoib_set_rate(): Link speed is 2.5Gs
~1:[IPoIB]:ipoib_set_rate(): Link width is 1X
~1:[IPoIB]:ipoib_set_rate() ]
~1:[IPoIB]:__port_get_bcast() [
~0:sa_req_send_comp_cb() !ERROR!: request failed - notifying user
~1:[IPoIB]:__port_get_bcast() ]
~1:[IPoIB]:__port_info_cb() ]
~1:[IPoIB]:__bcast_get_cb() [
~1:[IPoIB]:__port_join_bcast() [
~1:[IPoIB]:__port_join_bcast(): Unrealizable join due to rate mismatch.
~1:[IPoIB]:ipoib_set_inactive() [
~1:[IPoIB]:ipoib_resume_oids() [
~1:[IPoIB]:ipoib_resume_oids() ]
~1:[IPoIB]:ipoib_set_inactive() ]
~1:[IPoIB]:__bcast_get_cb() ]
~0:[IPoIB]:ipoib_check_for_hang() [
~0:[IPoIB]:ipoib_check_for_hang() ]
~0:[IPoIB]:ipoib_check_for_hang() [
~0:[IPoIB]:ipoib_check_for_hang() ]
~0:[IPoIB]:ipoib_check_for_hang() [
~0:[IPoIB]:ipoib_check_for_hang() ]
~0:[IPoIB]:ipoib_check_for_hang() [
~0:[IPoIB]:ipoib_check_for_hang() ]
~0:[IPoIB]:ipoib_check_for_hang() [
~0:[IPoIB]:ipoib_check_for_hang() ]
~0:[IPoIB]:ipoib_check_for_hang() [
~0:[IPoIB]:ipoib_check_for_hang() ]
~0:[IPoIB]:ipoib_check_for_hang() [
~0:[IPoIB]:ipoib_check_for_hang() ]
~0:[IPoIB]:ipoib_check_for_hang() [
~0:[IPoIB]:ipoib_check_for_hang() ]
~0:[IPoIB]:ipoib_check_for_hang() [
~0:[IPoIB]:ipoib_check_for_hang() ]
~0:[IPoIB]:ipoib_check_for_hang() [
~0:[IPoIB]:ipoib_check_for_hang() ]
~0:[IPoIB]:ipoib_check_for_hang() [
~0:[IPoIB]:ipoib_check_for_hang() ]
~0:[IPoIB]:ipoib_check_for_hang() [
~0:[IPoIB]:ipoib_check_for_hang() ]
~0:[IPoIB]:ipoib_check_for_hang() [
~0:[IPoIB]:ipoib_check_for_hang() ]
~0:[IPoIB]:ipoib_check_for_hang() [
~0:[IPoIB]:ipoib_check_for_hang() ]
~0:[IPoIB]:__ipoib_pnp_cb() [
~0:[IPoIB]:__ipoib_pnp_cb(): p_pnp_rec->pnp_event = 0x40002
(IB_PNP_PORT_DOWN)
~0:[IPoIB]:__ipoib_pnp_cb() ]
~0:[IPoIB]:__ipoib_pnp_cb() [
~0:[IPoIB]:__ipoib_pnp_cb(): p_pnp_rec->pnp_event = 0x10002
(IB_PNP_PORT_ARMED)
~0:[IPoIB]:__ipoib_pnp_cb() ]
~0:[IPoIB]:__ipoib_pnp_cb() [
~0:[IPoIB]:__ipoib_pnp_cb(): p_pnp_rec->pnp_event = 0x20002
(IB_PNP_PORT_ACTIVE)
~0:[IPoIB]:ipoib_port_up() [
~0:[IPoIB]:ipoib_port_up() ]
~0:[IPoIB]:__ipoib_pnp_cb() ]
~1:[IPoIB]:__port_info_cb() [
~1:[IPoIB]:__endpt_mgr_add_local() [
~1:[IPoIB]:ipoib_endpt_create() [
~1:[IPoIB]:ipoib_endpt_create() ]
~1:[IPoIB]:__endpt_mgr_insert() [
 
*** Assertion failed: p_qitem == &p_endpt->mac_item
***   Source File: w:\work\latest\ulp\ipoib\kernel\ipoib_port.c, line 4394
 
Break repeatedly, break Once, Ignore, terminate Process, or terminate Thread (boipt)? i
i
 
================================================================


As you can see from the ASSERT, we insert the local port endpoint twice.
The reason is that ipoib_set_inactive changes the adapter state to
down (and resumes OIDs) but does not clear the endpoints, so the next
port-up inserts the local endpoint again while the old one is still there.
The following patch adds a call to __endpt_mgr_reset_all wherever we call
ipoib_set_inactive.
The patch fixes the problem.
(I thought about calling ipoib_port_down from within ipoib_set_inactive,
but ipoib_set_inactive is called at DPC level and ipoib_port_down waits on
the sa_event.)
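
To make the failure mode concrete, here is a minimal user-mode sketch of the
sequence in the trace. It is not the driver code: endpt_mgr_t,
endpt_mgr_insert, endpt_mgr_reset_all, join_failed and replay_1x_then_4x are
hypothetical stand-ins, the MAC value is made up, and the real endpoint
manager is not a flat array. It only models the point that a failed
broadcast join leaves the local endpoint behind unless something resets the
table, so the second port-up inserts a duplicate.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MAX_ENDPTS 8
#define MAC_LEN    6

/* Hypothetical stand-in for the endpoint manager: a flat table keyed by MAC. */
typedef struct {
    unsigned char mac[MAX_ENDPTS][MAC_LEN];
    size_t        count;
} endpt_mgr_t;

/* Returns true if the MAC was inserted, false if it is already present or
 * the table is full. The real driver asserts on a duplicate instead. */
static bool endpt_mgr_insert(endpt_mgr_t *mgr, const unsigned char mac[MAC_LEN])
{
    for (size_t i = 0; i < mgr->count; i++) {
        if (memcmp(mgr->mac[i], mac, MAC_LEN) == 0)
            return false;
    }
    if (mgr->count == MAX_ENDPTS)
        return false;
    memcpy(mgr->mac[mgr->count++], mac, MAC_LEN);
    return true;
}

/* Drops every endpoint. Non-blocking in this model. */
static void endpt_mgr_reset_all(endpt_mgr_t *mgr)
{
    mgr->count = 0;
}

/* Models the failed broadcast join on the 1x link: the adapter goes
 * inactive, and with_fix decides whether the endpoints are cleared too. */
static void join_failed(endpt_mgr_t *mgr, bool with_fix)
{
    if (with_fix)
        endpt_mgr_reset_all(mgr);
}

/* Replays: port up at 1x (local endpoint added, join fails), then port up
 * at 4x (local endpoint added again). Returns the second insert's result. */
static bool replay_1x_then_4x(bool with_fix)
{
    /* Made-up MAC, for illustration only. */
    static const unsigned char local_mac[MAC_LEN] =
        { 0x00, 0x02, 0xc9, 0x00, 0x00, 0x01 };
    endpt_mgr_t mgr = { .count = 0 };

    bool first = endpt_mgr_insert(&mgr, local_mac);
    assert(first);
    (void)first;

    join_failed(&mgr, with_fix);
    return endpt_mgr_insert(&mgr, local_mac);
}
```

In this model replay_1x_then_4x(false) reports the duplicate and
replay_1x_then_4x(true) does not. Note that in the trace the second
IB_PNP_PORT_DOWN is not followed by ipoib_port_down(), which is consistent
with the stale endpoint never being cleaned up on that path.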
 

10x
Yossi 


Signed-off-by: Yossi Leybovich (sleybo at mellanox.co.il)
 
Index: ipoib_port.c
===================================================================
--- ipoib_port.c (revision 1647)
+++ ipoib_port.c (working copy)
@@ -4777,6 +4777,7 @@
   if( status != IB_CANCELED )
    ipoib_set_inactive( p_port->p_adapter );
 
+  __endpt_mgr_reset_all(p_port);
   KeSetEvent( &p_port->sa_event, EVENT_INCREMENT, FALSE );
  }
 
@@ -4898,7 +4899,7 @@
  {
   if( status != IB_CANCELED )
    ipoib_set_inactive( p_port->p_adapter );
-
+  __endpt_mgr_reset_all(p_port);
   KeSetEvent( &p_port->sa_event, EVENT_INCREMENT, FALSE );
  }
 
@@ -5165,6 +5166,7 @@
   if( status != IB_SUCCESS )
   {
    ipoib_set_inactive( p_port->p_adapter );
+   __endpt_mgr_reset_all(p_port);
    KeSetEvent( &p_port->sa_event, EVENT_INCREMENT, FALSE );
   }
   ipoib_port_deref( p_port, ref_bcast_req_failed );

-------------- next part --------------
A non-text attachment was scrubbed...
Name: ipoib_1x_bug.patch
Type: application/octet-stream
Size: 1295 bytes
Desc: ipoib_1x_bug.patch
URL: <http://lists.openfabrics.org/pipermail/ofw/attachments/20060904/c2dcf98a/attachment.obj>

