[ofw] [patch] [ibbus] Hibernation bugfix 1/2

Leonid Keller leonid at mellanox.co.il
Wed Oct 29 05:56:30 PDT 2008


> + //Setting h_ca to be NULL forces IPoIB to start only after h_ca will be recreated
 
I'd suggest a more detailed comment:
// Setting h_ca to NULL forces IPoIB to start only after a new CA object
// has been re-acquired. That happens in __port_was_hibernated or
// port_mgr_port_add, after the IB_PNP_PORT_ADD event arrives from IBAL.
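
To illustrate why the NULL store matters, here is a minimal single-threaded sketch of the "clear after deref, then wait for re-acquire" pattern. It is not the driver code: ca_obj_t, port_ext_t, acquire_ca, release_ca and wait_for_ca are hypothetical names, the arrives_at parameter merely simulates the moment registration completes, and the max_polls bound is my addition (the loop quoted below polls without a limit).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the driver types; not the real definitions. */
typedef struct ca_obj   { int ref_cnt; } ca_obj_t;
typedef struct port_ext { ca_obj_t *h_ca; } port_ext_t;

/* Take a reference and publish the handle (stands in for the re-acquire
 * that follows IB_PNP_PORT_ADD). */
static void acquire_ca(port_ext_t *ext, ca_obj_t *ca)
{
    ca->ref_cnt++;
    ext->h_ca = ca;
}

/* Drop the reference and clear the handle, mirroring the patched hca_deref
 * path. Without the NULL store, a later wait would see the stale pointer. */
static void release_ca(port_ext_t *ext)
{
    assert(ext->h_ca != NULL);
    ext->h_ca->ref_cnt--;
    ext->h_ca = NULL;
}

/* Poll until h_ca is set, giving up after max_polls. arrives_at simulates
 * the poll on which registration completes; real code would sleep between
 * polls instead. Returns the number of polls spent waiting, or -1 if the
 * handle never appeared. */
static int wait_for_ca(port_ext_t *ext, ca_obj_t *new_ca,
                       int arrives_at, int max_polls)
{
    for (int polls = 0; polls <= max_polls; polls++) {
        if (polls == arrives_at)
            acquire_ca(ext, new_ca);
        if (ext->h_ca != NULL)
            return polls;
    }
    return -1;
}
```

If release_ca did not clear the handle, wait_for_ca would return on the first poll with the old pointer still in place, which is the situation the patch is meant to prevent.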


________________________________

	From: ofw-bounces at lists.openfabrics.org
[mailto:ofw-bounces at lists.openfabrics.org] On Behalf Of Alex Naslednikov
	Sent: Wednesday, October 29, 2008 11:14 AM
	To: ofw at lists.openfabrics.org; Smith, Stan
	Subject: [ofw] [patch] [ibbus] Hibernation bugfix 1/2
	
	
	Explanation:
	The hibernation mechanism has been broken since the filter driver check-ins.
	Currently, IPoIB adapters cannot be recreated when the system resumes from hibernation.
	The cause is the following race:
	1. During initialization, IPoIB sends an IRP_MN_QUERY_INTERFACE IRP.
	2. This request is handled in port_query_interface, which calls cl_fwd_query_ifc with the following argument:
	p_ext->h_ca->obj.p_ci_ca->verbs.p_hca_dev
	We found that after the system resumes from hibernation, p_hca_dev is almost always NULL or contains an invalid pointer,
	because the h_ca object is still uninitialized when this Query Interface request arrives.
	Why?
	3. We have the following check to avoid this race (i.e. to keep IPoIB from querying the interface before h_ca has been created):
	In _HibernateUpWorkItem(
	 IN    DEVICE_OBJECT*    p_dev_obj,
	 IN    void*      context )
	{
	 .....
	 while (!p_ext->h_ca) {
	  BUS_TRACE( BUS_DBG_PNP, ("Waiting for the end of HCA registration ... \n"));
	  cl_thread_suspend( 200 ); /* suspend for 200 ms */
	 }
	 
	4. Given the code above, something must clear the h_ca field after dereferencing it.
	Such code used to exist, but it was accidentally removed later.
	 
	5. Below is the patch:
	Index: core/bus/kernel/bus_port_mgr.c
	===================================================================
	Fixing a bug in hibernation mechanism
	Signed-off by: leonid at mellanox.com
	                     xalex at mellanox.com
	--- core/bus/kernel/bus_port_mgr.c (revision 3373)
	+++ core/bus/kernel/bus_port_mgr.c (working copy)
	@@ -1258,6 +1258,8 @@
	 
	 hca_deref:
	  deref_al_obj( &p_ext->h_ca->obj );
	+ //Setting h_ca to be NULL forces IPoIB to start only after h_ca will be recreated
	+ p_ext->h_ca = NULL;
	  cl_mutex_release( &gp_port_mgr->pdo_mutex );
	 
	  BUS_EXIT( BUS_DBG_PNP );
	@@ -1849,8 +1851,7 @@
	 
	  p_ext = p_dev_obj->DeviceExtension;
	  if  (p_ext->pdo.b_hibernating) {
	-  // Can't continue within hibernation stage
	-  return STATUS_UNSUCCESSFUL;
	+  BUS_TRACE( BUS_DBG_PNP, ("Restoring from the hibernation: Device query received.\n") );
	  }
	  BUS_TRACE( BUS_DBG_PNP, ("Query i/f for %s: PDO %p (=%p),ext %p, present %d, missing %d .\n",
	   p_ext->pdo.cl_ext.vfptr_pnp_po->identity, p_ext->pdo.cl_ext.p_self_do, 
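
As a side note on step 2, the pointer chain handed to cl_fwd_query_ifc can be walked defensively. The sketch below is only an illustration under assumptions: the struct layouts are simplified, hypothetical mirrors of p_ext->h_ca->obj.p_ci_ca->verbs.p_hca_dev, and lookup_hca_dev is an invented helper, not part of the patch or of IBAL.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified mirror of the chain
 * p_ext->h_ca->obj.p_ci_ca->verbs.p_hca_dev; not the real IBAL layouts. */
typedef struct hca_dev { int id; } hca_dev_t;
typedef struct ci_ca   { struct { hca_dev_t *p_hca_dev; } verbs; } ci_ca_t;
typedef struct al_ca   { struct { ci_ca_t *p_ci_ca; } obj; } al_ca_t;
typedef struct bus_ext { al_ca_t *h_ca; } bus_ext_t;

/* Return the HCA device pointer, or NULL if any link in the chain is
 * missing. This only detects NULL links: it cannot detect a stale,
 * non-NULL h_ca left behind after the object was dereferenced. */
static hca_dev_t *lookup_hca_dev(const bus_ext_t *p_ext)
{
    if (p_ext == NULL || p_ext->h_ca == NULL)
        return NULL;
    if (p_ext->h_ca->obj.p_ci_ca == NULL)
        return NULL;
    return p_ext->h_ca->obj.p_ci_ca->verbs.p_hca_dev;
}
```

A check like this is only useful once h_ca is reliably reset to NULL after deref_al_obj, since a dangling pointer passes every NULL test.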
	
	 
