[ofw] [patch] [ibbus] Hibernation bugfix 1/2

Smith, Stan stan.smith at intel.com
Wed Oct 29 09:51:45 PDT 2008


Hello,
  Some history: I removed 'p_ext->h_ca = NULL' because of a BSOD caused later, during port removal, by a dereference of 'p_ext->h_ca'. Now that the code has changed and is protected against dereferencing h_ca, adding 'p_ext->h_ca = NULL' back and waiting makes sense.

stan.

________________________________
From: Alex Naslednikov [mailto:xalex at mellanox.co.il]
Sent: Wednesday, October 29, 2008 2:14 AM
To: ofw at lists.openfabrics.org; Smith, Stan
Subject: [ofw] [patch] [ibbus] Hibernation bugfix 1/2

Explanations:
The hibernation mechanism has been broken since the filter driver check-ins.
Currently, when the system is restored from hibernation, the IPoIB adapters cannot be recreated.
This is caused by the following race:
1. During initialization, IPoIB sends an IRP_MN_QUERY_INTERFACE IRP.
2. This request is handled in port_query_interface, which calls cl_fwd_query_ifc with the following argument:
p_ext->h_ca->obj.p_ci_ca->verbs.p_hca_dev
We found that after the system is restored from hibernation, p_hca_dev is almost always NULL or contains an invalid pointer.
That is because the h_ca object has not yet been initialized when this Query Interface request arrives.
Why?
3. We have the following check to avoid this race (i.e. to keep IPoIB from querying the interface before h_ca has been created):
In _HibernateUpWorkItem(
 IN    DEVICE_OBJECT*    p_dev_obj,
 IN    void*      context )
{
 .....
 while (!p_ext->h_ca) {
  BUS_TRACE( BUS_DBG_PNP, ("Waiting for the end of HCA registration ... \n"));
  cl_thread_suspend( 200 ); /* suspend for 200 ms */
 }

4. For the above check to work, something must clear the h_ca field after dereferencing it.
That code used to exist but was later removed.

5. Below is the patch:
Index: core/bus/kernel/bus_port_mgr.c
===================================================================
Fixing a bug in hibernation mechanism
Signed-off by: leonid at mellanox.com
                     xalex at mellanox.com
--- core/bus/kernel/bus_port_mgr.c (revision 3373)
+++ core/bus/kernel/bus_port_mgr.c (working copy)
@@ -1258,6 +1258,8 @@

 hca_deref:
  deref_al_obj( &p_ext->h_ca->obj );
+ //Setting h_ca to be NULL forces IPoIB to start only after h_ca will be recreated
+ p_ext->h_ca = NULL;
  cl_mutex_release( &gp_port_mgr->pdo_mutex );

  BUS_EXIT( BUS_DBG_PNP );
@@ -1849,8 +1851,7 @@

  p_ext = p_dev_obj->DeviceExtension;
  if  (p_ext->pdo.b_hibernating) {
-  // Can't continue within hibernation stage
-  return STATUS_UNSUCCESSFUL;
+  BUS_TRACE( BUS_DBG_PNP, ("Restoring from the hibernation: Device query received.\n") );
  }
  BUS_TRACE( BUS_DBG_PNP, ("Query i/f for %s: PDO %p (=%p),ext %p, present %d, missing %d .\n",
   p_ext->pdo.cl_ext.vfptr_pnp_po->identity, p_ext->pdo.cl_ext.p_self_do,
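
For illustration only, the interaction between the two changes can be sketched in user-space C. This is not the ibbus code: hca_t, port_ext_t, port_release_hca, wait_for_hca and simulate_registration are hypothetical stand-ins, the callback replaces cl_thread_suspend( 200 ), and the max_polls bound is an assumption added so the sketch terminates (the loop in _HibernateUpWorkItem shown above is unbounded). The point it shows is that the wait on h_ca is only meaningful if the removal path clears the field. Otherwise a stale handle satisfies the wait immediately.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the driver's types; not the real ibbus definitions. */
typedef struct hca { int ref_cnt; } hca_t;
typedef struct port_ext { hca_t *h_ca; } port_ext_t;

/* Removal path: drop the reference, then clear the handle so that a later
 * waiter cannot mistake the stale pointer for a freshly registered HCA. */
static void port_release_hca(port_ext_t *ext)
{
    assert(ext != NULL);
    if (ext->h_ca != NULL) {
        ext->h_ca->ref_cnt--;
        ext->h_ca = NULL;
    }
}

/* One polling step; stands in for cl_thread_suspend( 200 ) plus whatever
 * else happens in the system during that interval. */
typedef void (*poll_step_fn)(port_ext_t *ext, void *ctx);

/* Resume path: poll until h_ca is set again.
 * Returns the number of polling steps taken, or -1 if max_polls is reached. */
static int wait_for_hca(port_ext_t *ext, int max_polls,
                        poll_step_fn step, void *ctx)
{
    int polls = 0;
    while (ext->h_ca == NULL) {
        if (polls == max_polls)
            return -1;
        step(ext, ctx);
        polls++;
    }
    return polls;
}

/* Simulated HCA registration that completes on the Nth polling step. */
typedef struct registration {
    hca_t *new_hca;
    int polls_until_ready;
} registration_t;

static void simulate_registration(port_ext_t *ext, void *ctx)
{
    registration_t *reg = ctx;
    if (reg->polls_until_ready > 0 && --reg->polls_until_ready == 0)
        ext->h_ca = reg->new_hca;
}
```

With a stale handle left in place, wait_for_hca returns 0 right away and the caller goes on to use the old pointer. After port_release_hca the same call blocks until the simulated registration installs the new handle.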
