[ofw] Add Assert to debug BS while deleting WDF device

Uri Habusha urih at mellanox.co.il
Thu Dec 2 13:21:26 PST 2010


See my comments below

-----Original Message-----
From: Hefty, Sean [mailto:sean.hefty at intel.com] 
Sent: Thursday, December 02, 2010 6:22 PM
To: Uri Habusha; ofw at lists.openfabrics.org; Smith, Stan; Fab Tillier
Subject: RE: Add Assert to debug BS while deleting WDF device

copying your patch inline for comments.  See '>>' below:

Index: winmad/kernel/wm_driver.c
===================================================================
--- winmad/kernel/wm_driver.c	(revision 3009)
+++ winmad/kernel/wm_driver.c	(working copy)
@@ -41,7 +41,7 @@
 WDF_DECLARE_CONTEXT_TYPE_WITH_NAME(WM_IB_DEVICE, WmIbDeviceGetContext)
 WDF_DECLARE_CONTEXT_TYPE_WITH_NAME(WM_PROVIDER, WmProviderGetContext)
 
-WDFDEVICE				ControlDevice;
+WDFDEVICE				ControlDevice = NULL;


>> Globals are initialized to 0 by default.  This assignment isn't needed.
[Uri Habusha] I think it's a good practice to initialize variable also global. It also much better documented the code.



 static LIST_ENTRY		DevList;
 static LIST_ENTRY		ProvList;
 static KGUARDED_MUTEX	Lock;
@@ -195,11 +195,13 @@
 	WdfDeviceInitSetFileObjectConfig(pinit, &fileconfig, &attr);
 
 	WDF_OBJECT_ATTRIBUTES_INIT(&attr);
+	ASSERT(ControlDevice == NULL);
 	status = WdfDeviceCreate(&pinit, &attr, &ControlDevice);
 	if (!NT_SUCCESS(status)) {
 		goto err1;
 	}
 
+	ASSERT(ControlDevice != NULL);
 	status = WdfDeviceCreateSymbolicLink(ControlDevice, &symlink);
 	if (!NT_SUCCESS(status)) {
 		goto err2;
@@ -220,6 +222,7 @@
 
 err2:
 	WdfObjectDelete(ControlDevice);
+	ControlDevice = NULL;
 	return;
 err1:
 	WdfDeviceInitFree(pinit);
@@ -299,7 +302,8 @@
 
 	status = WdfFdoQueryForInterface(Device, &GUID_RDMA_INTERFACE_VERBS,
 									 (PINTERFACE) &dev->VerbsInterface,
-									 sizeof(dev->VerbsInterface), VerbsVersion(2, 0),
+ 									 sizeof(dev->VerbsInterface), 
+ 									 VerbsVersion(VERBS_MAJOR_VER, VERBS_MINOR_VER), 
 									 NULL);
 	if (!NT_SUCCESS(status)) {
 		return status;
@@ -340,14 +344,12 @@
 	WM_REGISTRATION		*reg;
 	LIST_ENTRY			*entry;
 	BOOLEAN				destroy;
-	WDFDEVICE			ctrldev;
 
 	pdev = WmIbDeviceGetContext(Device);
 
 	KeAcquireGuardedMutex(&Lock);
 	RemoveEntryList(&pdev->Entry);
 	destroy = IsListEmpty(&DevList);
-	ctrldev = ControlDevice;
 
 	for (entry = ProvList.Flink; entry != &ProvList; entry = entry->Flink) {
 		prov = CONTAINING_RECORD(entry, WM_PROVIDER, Entry);
@@ -364,8 +366,9 @@
 		ExFreePoolWithTag(pdev->pPortArray, 'pimw');
 	}
 
-	if (destroy) {
-		WdfObjectDelete(ctrldev);
+	if (destroy && (ControlDevice != NULL)) {
+		WdfObjectDelete(ControlDevice);
+		ControlDevice = NULL;

>> This change breaks the locking that was there.  We set ctrldev under lock and use that rather than ControlDevice directly outside of the lock.  If there are multiple HCAs in a system, then we could be processing power enter on one, while getting power exit for the other.  The locking needs to account for this, and this is why the code is structured as it is.  Because of this, I don't know that the assertions that you added will be true.

[Uri Habusha] It will not resolve the problem. Let's assume 2 instances of the function is called. The first one is set the ctrldev under the lock than come second process and set the ctrldev. Now you try to delete the object twice. If you really think that multiple call of the function you should take the ctrldev and then set the ControlDevice to NULL under the lock.

>> If all device power callbacks are serialized, we can simplify the code, but I couldn't find any documentation that clarifies whether this is the case.  The same issue exists in winverbs with your patches.  Can you please try running only with the changes that I submitted?

- Sean
 



More information about the ofw mailing list