[ofw] [patch] [ibbus] Hibernation bugfix 1/2
Leonid Keller
leonid at mellanox.co.il
Sun Nov 2 03:38:13 PST 2008
Applied in 1721,1722.
Thank you, Alex.
________________________________
From: ofw-bounces at lists.openfabrics.org
[mailto:ofw-bounces at lists.openfabrics.org] On Behalf Of Leonid Keller
Sent: Wednesday, October 29, 2008 2:57 PM
To: Alex Naslednikov; ofw at lists.openfabrics.org; Smith, Stan
Subject: RE: [ofw] [patch] [ibbus] Hibernation bugfix 1/2
> + //Setting h_ca to be NULL forces IPoIB to start only after h_ca will be recreated
I'd suggest a more detailed comment:
// Setting h_ca to NULL forces IPoIB to start only after a new CA object
// has been re-acquired.
// This happens in the __port_was_hibernated or port_mgr_port_add functions
// after the IB_PNP_PORT_ADD event arrives from IBAL.
________________________________
From: ofw-bounces at lists.openfabrics.org
[mailto:ofw-bounces at lists.openfabrics.org] On Behalf Of Alex Naslednikov
Sent: Wednesday, October 29, 2008 11:14 AM
To: ofw at lists.openfabrics.org; Smith, Stan
Subject: [ofw] [patch] [ibbus] Hibernation bugfix 1/2
Explanations:
The hibernation mechanism was broken by the filter driver check-ins.
Currently, when the system resumes from hibernation, IPoIB adapters cannot be recreated.
This is caused by the following race:
1. During initialization, IPoIB sends an IRP_MN_QUERY_INTERFACE request.
2. This request is handled in port_query_interface, which calls cl_fwd_query_ifc with the following argument:
p_ext->h_ca->obj.p_ci_ca->verbs.p_hca_dev
We found that after the system is restored from hibernation, p_hca_dev is almost always NULL or contains an invalid pointer.
This is because the h_ca object is still uninitialized when the Query Interface request arrives.
Why is that?
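The pointer chain dereferenced in step 2 can be sketched with simplified stand-in types to show what a defensive check can and cannot catch. The types below (port_ext_t, ca_t, al_obj_t, ci_ca_t, verbs_t) are hypothetical mirrors of the real bus-driver and IBAL structures, reduced to the fields named in the chain, and get_hca_dev is a hypothetical helper, not code from the patch. A NULL test catches an h_ca that was never set, but no check can tell a stale pointer to an already-dereferenced CA from a live one, so the field has to be cleared at the point where the reference is dropped.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the real driver structures,
 * keeping only the fields used in
 * p_ext->h_ca->obj.p_ci_ca->verbs.p_hca_dev. */
typedef struct { void *p_hca_dev; } verbs_t;
typedef struct { verbs_t verbs; } ci_ca_t;
typedef struct { ci_ca_t *p_ci_ca; } al_obj_t;
typedef struct { al_obj_t obj; } ca_t;
typedef struct { ca_t *h_ca; } port_ext_t;

/* Hypothetical helper: walk the chain and return the HCA device pointer,
 * or NULL if h_ca or p_ci_ca is not set. A non-NULL but stale h_ca is
 * NOT detected here; dereferencing it is still undefined behavior. */
static void *get_hca_dev(const port_ext_t *p_ext)
{
    assert(p_ext != NULL); /* assumed: callers always pass a valid extension */
    if (p_ext->h_ca == NULL || p_ext->h_ca->obj.p_ci_ca == NULL)
        return NULL;
    return p_ext->h_ca->obj.p_ci_ca->verbs.p_hca_dev;
}
```

In this sketch, a cleared h_ca makes the query fail cleanly instead of following a dangling pointer.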
3. We have the following check to avoid this race (i.e., to prevent IPoIB from querying the interface before h_ca has been recreated):
The check is in _HibernateUpWorkItem:
_HibernateUpWorkItem(
    IN DEVICE_OBJECT* p_dev_obj,
    IN void* context )
{
    .....
    /* Poll until the HCA has been registered again and h_ca is set */
    while (!p_ext->h_ca) {
        BUS_TRACE( BUS_DBG_PNP, ("Waiting for the end of HCA registration ... \n"));
        cl_thread_suspend( 200 ); /* suspend for 200 ms */
    }
4. For the above loop to work, something must clear the h_ca field after dereferencing it.
Such code did exist, but it was accidentally removed later.
5. Below is the patch:
Index: core/bus/kernel/bus_port_mgr.c
===================================================================
Fixing a bug in hibernation mechanism
Signed-off by: leonid at mellanox.com
xalex at mellanox.com
--- core/bus/kernel/bus_port_mgr.c (revision 3373)
+++ core/bus/kernel/bus_port_mgr.c (working copy)
@@ -1258,6 +1258,8 @@
hca_deref:
deref_al_obj( &p_ext->h_ca->obj );
+ //Setting h_ca to be NULL forces IPoIB to start only after h_ca will be recreated
+ p_ext->h_ca = NULL;
cl_mutex_release( &gp_port_mgr->pdo_mutex );
BUS_EXIT( BUS_DBG_PNP );
@@ -1849,8 +1851,7 @@
p_ext = p_dev_obj->DeviceExtension;
if (p_ext->pdo.b_hibernating) {
- // Can't continue within hibernation stage
- return STATUS_UNSUCCESSFUL;
+ BUS_TRACE( BUS_DBG_PNP, ("Restoring from the hibernation: Device query received.\n") );
}
BUS_TRACE( BUS_DBG_PNP, ("Query i/f for %s: PDO %p (=%p),ext %p, present %d, missing %d .\n",
p_ext->pdo.cl_ext.vfptr_pnp_po->identity,
p_ext->pdo.cl_ext.p_self_do,
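The protocol behind the fix (clear h_ca when the reference is dropped, wait until it is set again, and only then let IPoIB query the interface) can be modeled in a few lines of portable C. This is a single-threaded sketch under stated assumptions: ca_model_t, pdo_ext_model_t, port_release_ca, hca_ready and port_add_ca are hypothetical stand-ins for the real CA object, the port PDO extension, the hca_deref path, one iteration of the wait loop in _HibernateUpWorkItem, and the re-acquisition in __port_was_hibernated or port_mgr_port_add. Reference counting is reduced to a plain integer, and the real driver polls with cl_thread_suspend( 200 ) from its work item rather than running sequentially.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model: a port PDO extension caching a referenced CA. */
typedef struct { int ref_cnt; } ca_model_t;
typedef struct { ca_model_t *h_ca; } pdo_ext_model_t;

/* Models the hca_deref path: drop the reference and, as the patch adds,
 * optionally clear the cached pointer so it cannot be reused later. */
static void port_release_ca(pdo_ext_model_t *p_ext, bool clear_after_deref)
{
    assert(p_ext->h_ca != NULL);
    p_ext->h_ca->ref_cnt--;
    if (clear_after_deref)
        p_ext->h_ca = NULL;
}

/* Models one iteration of the wait loop: true means the work item
 * may proceed because h_ca is set. */
static bool hca_ready(const pdo_ext_model_t *p_ext)
{
    return p_ext->h_ca != NULL;
}

/* Models the IB_PNP_PORT_ADD path re-acquiring a new CA object. */
static void port_add_ca(pdo_ext_model_t *p_ext, ca_model_t *new_ca)
{
    new_ca->ref_cnt++;
    p_ext->h_ca = new_ca;
}
```

Without the clear (clear_after_deref set to false), hca_ready() reports true right after the reference is dropped, so the wait loop would fall through with a pointer to the old CA, matching the failure described above. With the clear, it reports false until port_add_ca() installs the new object.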