[ofw] [patch] [ibbus] Hibernation bugfix 1/2
Smith, Stan
stan.smith at intel.com
Wed Oct 29 09:51:45 PDT 2008
Hello,
Some history: I removed 'p_ext->h_ca = NULL' because of a BSOD that occurred later, during port removal, when 'p_ext->h_ca' was dereferenced. Now that the code guards against that dereference of h_ca, restoring 'p_ext->h_ca = NULL' and waiting makes sense.
stan.
________________________________
From: Alex Naslednikov [mailto:xalex at mellanox.co.il]
Sent: Wednesday, October 29, 2008 2:14 AM
To: ofw at lists.openfabrics.org; Smith, Stan
Subject: [ofw] [patch] [ibbus] Hibernation bugfix 1/2
Explanations:
The hibernation mechanism was broken by the filter driver check-ins.
Currently, when the system resumes from hibernation, the IPoIB adapters can't be recreated.
This is caused by the following race:
1. During initialization, IPoIB sends an IRP_MN_QUERY_INTERFACE IRP.
2. This request is handled in port_query_interface, which calls cl_fwd_query_ifc with the following argument:
p_ext->h_ca->obj.p_ci_ca->verbs.p_hca_dev
We found that p_hca_dev is almost always NULL or contains an invalid pointer after the system is restored from hibernation.
That is because the h_ca object had not yet been re-initialized when this Query Interface request arrived.
Why?
3. We have the following check to avoid this race (i.e. to keep IPoIB from querying the interface before h_ca has been created):
In _HibernateUpWorkItem(
IN DEVICE_OBJECT* p_dev_obj,
IN void* context )
{
.....
while (!p_ext->h_ca) {
BUS_TRACE( BUS_DBG_PNP, ("Waiting for the end of HCA registration ... \n"));
cl_thread_suspend( 200 ); /* suspend for 200 ms */
}
4. For the above check to work, something must clear the h_ca field after the h_ca object is dereferenced.
This code used to exist but was later removed.
5. Below is the patch:
Index: core/bus/kernel/bus_port_mgr.c
===================================================================
Fixing a bug in hibernation mechanism
Signed-off by: leonid at mellanox.com
xalex at mellanox.com
--- core/bus/kernel/bus_port_mgr.c (revision 3373)
+++ core/bus/kernel/bus_port_mgr.c (working copy)
@@ -1258,6 +1258,8 @@
hca_deref:
deref_al_obj( &p_ext->h_ca->obj );
+ //Setting h_ca to be NULL forces IPoIB to start only after h_ca will be recreated
+ p_ext->h_ca = NULL;
cl_mutex_release( &gp_port_mgr->pdo_mutex );
BUS_EXIT( BUS_DBG_PNP );
@@ -1849,8 +1851,7 @@
p_ext = p_dev_obj->DeviceExtension;
if (p_ext->pdo.b_hibernating) {
- // Can't continue within hibernation stage
- return STATUS_UNSUCCESSFUL;
+ BUS_TRACE( BUS_DBG_PNP, ("Restoring from the hibernation: Device query received.\n") );
}
BUS_TRACE( BUS_DBG_PNP, ("Query i/f for %s: PDO %p (=%p),ext %p, present %d, missing %d .\n",
p_ext->pdo.cl_ext.vfptr_pnp_po->identity, p_ext->pdo.cl_ext.p_self_do,
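
To illustrate why clearing h_ca matters for the wait loop in step 3, here is a minimal user-space sketch of the handshake. It is not the ibbus code: ca_handle_t, port_ext_t, port_attach_ca, port_release_ca, wait_for_ca and attach_on_third_poll are hypothetical stand-ins, and the poll hook replaces the real 200 ms cl_thread_suspend() so the example runs deterministically in a single thread.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the driver structures (not the real ibbus types). */
typedef struct ca_handle { int ref_cnt; } ca_handle_t;
typedef struct port_ext  { ca_handle_t *h_ca; } port_ext_t;

/* Called on every poll; simulates time passing while the waiter sleeps. */
typedef void (*poll_hook_t)(port_ext_t *p_ext, int attempt);

/* Registration path: take a reference and publish the handle. */
void port_attach_ca(port_ext_t *p_ext, ca_handle_t *h_ca)
{
    assert(h_ca != NULL);
    h_ca->ref_cnt++;
    p_ext->h_ca = h_ca;
}

/* Release path, mirroring the patched hca_deref label: drop the
 * reference, then clear the pointer so a later waiter cannot see it. */
void port_release_ca(port_ext_t *p_ext)
{
    if (p_ext->h_ca != NULL) {
        p_ext->h_ca->ref_cnt--;
        p_ext->h_ca = NULL;
    }
}

/* Resume path: poll until a handle is published.
 * Returns the number of polls performed, or -1 once max_polls polls
 * have been made without a handle appearing. */
int wait_for_ca(port_ext_t *p_ext, int max_polls, poll_hook_t hook)
{
    int polls = 0;
    while (p_ext->h_ca == NULL) {
        if (polls == max_polls)
            return -1;
        polls++;
        if (hook != NULL)
            hook(p_ext, polls);   /* the driver would sleep here instead */
    }
    return polls;
}

/* Simulated HCA re-registration that completes on the third poll. */
static ca_handle_t g_new_ca;

void attach_on_third_poll(port_ext_t *p_ext, int attempt)
{
    if (attempt == 3)
        port_attach_ca(p_ext, &g_new_ca);
}
```

If the release path leaves the old pointer in place, wait_for_ca returns after zero polls and the caller goes on to use a stale handle, which corresponds to the failure described above. With the pointer cleared, the waiter keeps polling until the simulated re-registration publishes a new handle.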