[ofw] Add Assert to debug BS while deleting WDF device

Uri Habusha urih at mellanox.co.il
Tue Nov 16 23:06:25 PST 2010


SB

-----Original Message-----
From: Hefty, Sean [mailto:sean.hefty at intel.com] 
Sent: Tuesday, November 16, 2010 10:37 PM
To: Uri Habusha; ofw at lists.openfabrics.org; Smith, Stan
Subject: RE: Add Assert to debug BS while deleting WDF device

> I got a BS while disabling the IPoIB driver, the reason for that is trying
> to delete WDF device that is invalid. I guess that the object already
> deleted but I'm not sure.

We need to understand what exactly is happening.  The winmad and winverbs drivers to not interact with ipoib, so I don't understand the relation.
[Uri Habusha] you correct by OpenSm use winmad and it also disabled

We need to be sure that any patch is fixing the underlying issue.  What WdfOjbectDelete() call caused the crash?  Is it coming from one of the error handling sections(e.g. WmcreateControlDevice)?
[Uri Habusha] the BS came from calling WmPowerD0Exit. 

Setting ControlDevice = NULL without any other checks should just result in calling the function with a NULL pointer, so I don't understand the fix here.
[Uri Habusha] It's not a fix but adding some assert that will help us to catch the problem. I saw that you don't use ASSERT in your code at all. I think that ASSERT can help us to catch and understand a problem before the system enter to BS. What I ask to do is to ensure that after freeing the device the handle will be NULL and add some ASSERT. If the ASSERT will happen I'll know for sure that the object already freed and that it's not memory leak or something like that

For both drivers, the number of devices on the DevList indicates whether we need to create or destroy the control device.

> Is WmPowerD0Exit  and WmPowerD0Entry functions are thread safe? In other
> words is only one instance of the function can be run in same time?

I assume they would have to be -- for a single device.  Otherwise, I don't know what a driver could do with them.  It wouldn't know what power state the device was left in.

If there are multiple devices in the system, I don't know that the power state callbacks are synchronized between them.  This is the reason for the synchronization that you see in the code.  If PowerD0Entry/PowerD0Exit are serialized, then the code can be simplified.  We wouldn't need the create/destroy variables at all in those cases.

> In order to catch the problem I added some ASSERT in the code and set the
> pointer to NULL after deleting  the object.

Clearing the pointer is fine if needed, but IMO the asserts just add noise to the code.  The code isn't doing anything complex enough to justify asserts everywhere.
[Uri Habusha] I don't agree about it. As I wrote before ASSERT is a debug mechanism that checks that assumptions you made during coding are correct

> Index: winmad/kernel/wm_driver.c
> 
> ===================================================================
> 
> --- winmad/kernel/wm_driver.c               (revision 6842)
> 
> +++ winmad/kernel/wm_driver.c            (working copy)
> 
> @@ -41,7 +41,7 @@
> 
>  WDF_DECLARE_CONTEXT_TYPE_WITH_NAME(WM_IB_DEVICE, WmIbDeviceGetContext)
> 
>  WDF_DECLARE_CONTEXT_TYPE_WITH_NAME(WM_PROVIDER, WmProviderGetContext)
> 
> 
> 
> -WDFDEVICE
> ControlDevice;
> 
> +WDFDEVICE
> ControlDevice = NULL;

global variables are initialized to NULL by default.

>  err2:
> 
>                 WdfObjectDelete(ControlDevice);
> 
> +             ControlDevice = NULL;
> 
>                 return;
> 
>  err1:
> 
>                 WdfDeviceInitFree(pinit);
> 
> @@ -340,14 +344,12 @@
> 
>                 WM_REGISTRATION                      *reg;
> 
>                 LIST_ENTRY                                        *entry;
> 
>                 BOOLEAN
> destroy;
> 
> -              WDFDEVICE                                        ctrldev;
> 
> 
> 
>                 pdev = WmIbDeviceGetContext(Device);
> 
> 
> 
>                 KeAcquireGuardedMutex(&Lock);
> 
>                 RemoveEntryList(&pdev->Entry);
> 
>                 destroy = IsListEmpty(&DevList);
> 
> -              ctrldev = ControlDevice;
> 
> 
> 
>                 for (entry = ProvList.Flink; entry != &ProvList; entry =
> entry->Flink) {
> 
>                                 prov = CONTAINING_RECORD(entry,
> WM_PROVIDER, Entry);
> 
> @@ -365,7 +367,8 @@
> 
>                 }
> 
> 
> 
>                 if (destroy) {
> 
> -                              WdfObjectDelete(ctrldev);
> 
> +                             ASSERT(ControlDevice != NULL);
> 
> +                             WdfObjectDelete(ControlDevice);

This accesses ControlDevice outside of the lock.  This assumes that no other thread will call into the PowerD0* handlers.  If this assumption is correct, we can simplify the code even more.

- Sean



More information about the ofw mailing list