[ofw] [patch] [ibbus] Hibernation bugfix 1/2

Leonid Keller leonid at mellanox.co.il
Sun Nov 2 03:38:13 PST 2008


Applied in 1721,1722.
Thank you, Alex.


________________________________

	From: ofw-bounces at lists.openfabrics.org
[mailto:ofw-bounces at lists.openfabrics.org] On Behalf Of Leonid Keller
	Sent: Wednesday, October 29, 2008 2:57 PM
	To: Alex Naslednikov; ofw at lists.openfabrics.org; Smith, Stan
	Subject: RE: [ofw] [patch] [ibbus] Hibernation bugfix 1/2
	
	
	> + //Setting h_ca to be NULL forces IPoIB to start only after
h_ca will be recreated
	 
	I'd suggest a more detailed comment:
	// Setting h_ca to NULL forces IPoIB to start only after re-acquiring a new CA object.
	// The latter happens in the __port_was_hibernated or port_mgr_port_add functions,
	// after the IB_PNP_PORT_ADD event arrives from IBAL.


________________________________

		From: ofw-bounces at lists.openfabrics.org
[mailto:ofw-bounces at lists.openfabrics.org] On Behalf Of Alex Naslednikov
		Sent: Wednesday, October 29, 2008 11:14 AM
		To: ofw at lists.openfabrics.org; Smith, Stan
		Subject: [ofw] [patch] [ibbus] Hibernation bugfix 1/2
		
		
		Explanation:
		The hibernation mechanism has been broken since the filter driver check-ins.
		Currently, when the system resumes from hibernation, the IPoIB adapters cannot be recreated.
		This is caused by the following race:
		1. During initialization, IPoIB sends an IRP_MN_QUERY_INTERFACE IRP.
		2. This request is handled in port_query_interface, which calls cl_fwd_query_ifc with the following argument:
		p_ext->h_ca->obj.p_ci_ca->verbs.p_hca_dev
		We found that after resuming from hibernation, p_hca_dev is almost always NULL or contains an invalid pointer.
		That is because the h_ca object was still uninitialized when this Query Interface request arrived.
		Why?
		3. We have the following check, which is meant to avoid this race (i.e. to keep IPoIB from querying the interface before h_ca has been created):
		In _HibernateUpWorkItem(
		 IN    DEVICE_OBJECT*    p_dev_obj,
		 IN    void*      context )
		{
		 .....
		 while (!p_ext->h_ca) {
		  BUS_TRACE( BUS_DBG_PNP, ("Waiting for the end of HCA registration ... \n"));
		  cl_thread_suspend( 200 ); /* suspend for 200 ms */
		 }
		 
		4. According to the above code, somebody has to clear the h_ca field after dereferencing it.
		That code used to exist but was later removed by accident.
		 
		5. Below is the patch:
		Index: core/bus/kernel/bus_port_mgr.c
	
===================================================================
		Fixing a bug in hibernation mechanism
		Signed-off by: leonid at mellanox.com
		                     xalex at mellanox.com
		--- core/bus/kernel/bus_port_mgr.c (revision 3373)
		+++ core/bus/kernel/bus_port_mgr.c (working copy)
		@@ -1258,6 +1258,8 @@
		 
		 hca_deref:
		  deref_al_obj( &p_ext->h_ca->obj );
		+ //Setting h_ca to be NULL forces IPoIB to start only
after h_ca will be recreated
		+ p_ext->h_ca = NULL;
		  cl_mutex_release( &gp_port_mgr->pdo_mutex );
		 
		  BUS_EXIT( BUS_DBG_PNP );
		@@ -1849,8 +1851,7 @@
		 
		  p_ext = p_dev_obj->DeviceExtension;
		  if  (p_ext->pdo.b_hibernating) {
		-  // Can't continue within hibernation stage
		-  return STATUS_UNSUCCESSFUL;
		+  BUS_TRACE( BUS_DBG_PNP, ("Restoring from the
hibernation: Device query received.\n") );
		  }
		  BUS_TRACE( BUS_DBG_PNP, ("Query i/f for %s: PDO %p
(=%p),ext %p, present %d, missing %d .\n",
		   p_ext->pdo.cl_ext.vfptr_pnp_po->identity,
p_ext->pdo.cl_ext.p_self_do, 
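
To make the reasoning in steps 3 and 4 concrete, here is a minimal user-space sketch of the idea. It is not the driver code: ca_obj_t, port_ext_t, acquire_ca, release_ca, wait_for_ca and recreate_on_third_poll are hypothetical names invented for illustration, and the tick callback merely stands in for cl_thread_suspend( 200 ) plus whatever happens concurrently on the IB_PNP_PORT_ADD path. It only shows why the wait loop is useless unless the handle is cleared after the dereference.

```c
#include <stddef.h>

/* Hypothetical stand-ins for the bus driver's CA object and PDO extension. */
typedef struct ca_obj { int ref_cnt; } ca_obj_t;
typedef struct port_ext { ca_obj_t *h_ca; } port_ext_t;

/* CA object that "appears" after resume; models the port-add path. */
static ca_obj_t g_new_ca = { 0 };

/* Take a reference and publish the handle. */
static void acquire_ca(port_ext_t *ext, ca_obj_t *ca)
{
    ca->ref_cnt++;
    ext->h_ca = ca;
}

/* Drop the reference AND clear the handle, mirroring the two added patch lines. */
static void release_ca(port_ext_t *ext)
{
    if (ext->h_ca == NULL)
        return;
    ext->h_ca->ref_cnt--;
    ext->h_ca = NULL;
}

/* Callback that models one sleep interval of the wait loop. */
typedef void (*tick_fn)(port_ext_t *ext, int poll_no);

/* Assumed scenario: the CA is re-created during the third sleep. */
static void recreate_on_third_poll(port_ext_t *ext, int poll_no)
{
    if (poll_no == 2)
        acquire_ca(ext, &g_new_ca);
}

/*
 * Bounded variant of the "while (!p_ext->h_ca)" loop.
 * Returns the number of polls performed, or -1 if the handle never appeared.
 */
static int wait_for_ca(port_ext_t *ext, int max_polls, tick_fn tick)
{
    int polls = 0;
    while (ext->h_ca == NULL && polls < max_polls) {
        tick(ext, polls);
        polls++;
    }
    return ext->h_ca != NULL ? polls : -1;
}
```

In this model, calling wait_for_ca while the extension still holds the old handle returns 0 at once, so the caller proceeds with a stale pointer. After release_ca has cleared the handle, the same call keeps polling until the new CA object has been published.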
		
		 
