[ofw] Opensm & WinMad: a race, causing BSOD722

Smith, Stan stan.smith at intel.com
Mon Jan 16 16:36:39 PST 2012


What are the conditions required to generate the BSOD?
>From the terse description it appears you are shutting down OpenSM. How is this accomplished?

Opensm - 10D, {b, 76157f00, 0, 8811d008}.           Not clear what this statement implies? Please explain.

Stan.


From: Leonid Keller [mailto:leonid at mellanox.com]
Sent: Sunday, January 15, 2012 3:09 AM
To: Smith, Stan; Hefty, Sean
Cc: Uri Habusha; Irena Gannon; ofw_list
Subject: Opensm & WinMad: a race, cauing BSOD722

Hi guys,

We got a BSOD in Opensm - 10D, {b, 76157f00, 0, 8811d008}.
Could you take a look ?

My analysis:

Seems like BSOD has been caused by a  race between the main and MAD reading threads of Opensm.

The main thread has already closed the port and is now found in osm_subn_destroy():

opensm_main
                ...
                osm_mad_pool_destroy(&p_osm->mad_pool);
                osm_vendor_delete(&p_osm->p_vendor);                        // port release
                osm_subn_destroy(&p_osm->subn);                                    // the thread is found here now

The reading thread is still in action:

opensm!umad_receiver
                libibumad!umad_recv
                ...
                winmad!WmIoRead
                                winmad!WmProviderRead
WdfObjectAcquireLock(pProvider->ReadQueue);            // BSOD

A try to ReadQueue with !wdfqueue fails.
Seems like pProvider is already released. But there is no any checks of its validity in WmProviderRead().

Possible solution:
Maybe WmIoRead() should check, that the Provider is not being released and take some reference, while WmProviderRemoveHandler() should wait to this reference to be removed ?


Technical info:

// Bugcheck
WDF_VIOLATION (10d)
The Kernel-Mode Driver Framework has detected that Windows detected an error
in a framework-based driver. In general, the dump file will yield additional
information about the driver that caused this bug check.
Arguments:
Arg1: 0000000b, An attempt to acquire or release a lock was invalid. In this
                case, Parameter 3 further specifies the error that has been
                made.
Arg2: 76157f00, The handle value.
Arg3: 00000000, A handle passed to either WdfObjectAcquireLock or
                WdfObjectReleaseLock represents an object that does not
                support synchronization locks.
Arg4: 8811d008, Reserved.

// crash line
void WmProviderRead(WM_PROVIDER *pProvider, WDFREQUEST Request)
{
                WM_REGISTRATION                      *reg;
                NTSTATUS                                           status;
                WM_IO_MAD                                   *wmad;
                size_t                                                    outlen, len = 0;

                status = WdfRequestRetrieveOutputBuffer(Request, sizeof(WM_IO_MAD), &wmad, &outlen);
                if (!NT_SUCCESS(status)) {
                                goto out;
                }

                WdfObjectAcquireLock(pProvider->ReadQueue);
                if (pProvider->MadHead == NULL) {

// Local parameters
3: kd> ?? pProvider
struct _WM_PROVIDER * 0x894761c0
   +0x000 Entry            : _LIST_ENTRY [ 0x8e3ea070 - 0x8e3ea070 ]
   +0x008 RegIndex         : _INDEX_LIST
   +0x014 ReadQueue        : 0x76157f00 WDFQUEUE__
   +0x018 MadHead          : (null)
   +0x01c MadTail          : 0xbc67ccc8 _ib_mad_element
   +0x020 Lock             : _KGUARDED_MUTEX
   +0x040 Ref              : 0
   +0x044 Event            : _KEVENT
   +0x054 Pending          : 0
   +0x058 Active           : 0
   +0x05c SharedEvent      : _KEVENT
   +0x06c Exclusive        : 0
   +0x070 ExclusiveEvent   : _KEVENT

// ReadQueue
3: kd> !wdfqueue 0x76157f00

Dumping WDFQUEUE 0x76157f00
=========================
Manualcouldn't read 00000044
, Deleted, Disposing, Not power-managed, PowerOn, Cannot accept, Can dispatch, Dispatching, ExecutionLevelDispatch, SynchronizationScopeNone
    Number of driver owned requests: 0
    Number of waiting requests: 0


// opensm

3: kd> !PROCESS a14ba6c0
PROCESS a14ba6c0  SessionId: 1  Cid: 13fc    Peb: 7ffd9000  ParentCid: 10fc
    DirBase: bd5feee0  ObjectTable: bf7fa7b8  HandleCount:  41.
    Image: opensm.exe
    VadRoot 89c8ca38 Vads 51 Clone 0 Private 271. Modified 3. Locked 0.
    DeviceMap 81f41008
    Token                             a6e81030
    ElapsedTime                       00:00:00.890
    UserTime                          00:00:00.015
    KernelTime                        00:00:00.062
    QuotaPoolUsage[PagedPool]         0
    QuotaPoolUsage[NonPagedPool]      0
    Working Set Sizes (now,min,max)  (913, 50, 345) (3652KB, 200KB, 1380KB)
    PeakWorkingSetSize                930
    VirtualSize                       76 Mb
    PeakVirtualSize                   78 Mb
    PageFaultCount                    1047
    MemoryPriority                    BACKGROUND
    BasePriority                      8
    CommitCharge                      8768
    Job                               89e04368

        THREAD 89e05d48  Cid 13fc.1758  Teb: 7ffdf000 Win32Thread: fe333b18 RUNNING on processor 2
        Not impersonating
        DeviceMap                 81f41008
        Owning Process            0       Image:         <Unknown>
        Attached Process          a14ba6c0       Image:         opensm.exe
        Wait Start TickCount      5900630        Ticks: 1 (0:00:00:00.015)
        Context Switch Count      454
        UserTime                  00:00:00.015
        KernelTime                00:00:00.578
        Win32 Start Address opensm!mainCRTStartup (0x0042aaf5)
        Stack Init adf03fd0 Current adf038d0 Base adf04000 Limit adf01000 Call 0
        Priority 11 BasePriority 8 PriorityDecrement 0 IoPriority 2 PagePriority 5
        ChildEBP RetAddr
        adf03cec 826be3a4 nt!KdPollBreakIn+0xea
        adf03cf0 826be381 nt!KdCheckForDebugBreak+0x17 (FPO: [0,0,0])
        adf03d24 8261f430 nt!KeUpdateRunTime+0x164
        adf03d24 003fd34a hal!HalpClockInterruptPn+0x158 (FPO: [0,2] TrapFrame @ adf03d34)
        000b2918 003d6174 opensm!osm_subn_destroy+0x1da (FPO: [Non-Fpo]) (CONV: stdcall) [a:\builds\9565\branches\mlnx_winof-3_0_0\ulp\opensm\user\opensm\osm_subnet.c @ 501]
        000b2924 003ba03e opensm!osm_opensm_destroy+0x104 (FPO: [Non-Fpo]) (CONV: stdcall) [a:\builds\9565\branches\mlnx_winof-3_0_0\ulp\opensm\user\opensm\osm_opensm.c @ 313]
        000cfda8 003bab21 opensm!opensm_main+0x1c3e (FPO: [Non-Fpo]) (CONV: stdcall) [a:\builds\9565\branches\mlnx_winof-3_0_0\ulp\opensm\user\opensm\main.c @ 1264]
        000cfdf4 0042a9c4 opensm!main+0x191 (FPO: [Non-Fpo]) (CONV: cdecl) [a:\builds\9565\branches\mlnx_winof-3_0_0\ulp\opensm\user\opensm\main.c @ 1305]
        000cfe38 771c1194 opensm!__mainCRTStartup+0x102 (FPO: [Non-Fpo]) (CONV: cdecl) [d:\5359\minkernel\crts\crtw32\dllstuff\crtexe.c @ 695]
        000cfe44 7753b3f5 kernel32!BaseThreadInitThunk+0xe (FPO: [Non-Fpo])
        000cfe84 7753b3c8 ntdll!__RtlUserThreadStart+0x70 (FPO: [Non-Fpo])
        000cfe9c 00000000 ntdll!_RtlUserThreadStart+0x1b (FPO: [Non-Fpo])

        THREAD c8974030  Cid 13fc.16f4  Teb: 7ffd6000 Win32Thread: 00000000 RUNNING on processor 3
        IRP List:
            89e505b0: (0006,0094) Flags: 00060970  Mdl: 00000000
        Not impersonating
        DeviceMap                 81f41008
        Owning Process            0       Image:         <Unknown>
        Attached Process          a14ba6c0       Image:         opensm.exe
        Wait Start TickCount      5900629        Ticks: 2 (0:00:00:00.031)
        Context Switch Count      24
        UserTime                  00:00:00.000
        KernelTime                00:00:00.015
        Win32 Start Address complib!cl_thread_callback (0x72f409f0)
        Stack Init adf57fd0 Current adf57b44 Base adf58000 Limit adf55000 Call 0
        Priority 10 BasePriority 8 PriorityDecrement 2 IoPriority 2 PagePriority 5
        ChildEBP RetAddr
        adf576d4 8272fe71 nt!RtlpBreakWithStatusInstruction (FPO: [1,0,0])
        adf57724 8273096d nt!KiBugCheckDebugBreak+0x1c
        adf57ae8 8272fd10 nt!KeBugCheck2+0x68b
        adf57b08 8d956f9e nt!KeBugCheckEx+0x1e
        adf57b24 8d955829 Wdf01000!FxVerifierBugCheck+0x24 (FPO: [Non-Fpo])
        adf57b4c 8e3e6626 Wdf01000!imp_WdfObjectAcquireLock+0x26 (FPO: [Non-Fpo])
        adf57b5c 8e3e6a3c winmad!WdfObjectAcquireLock+0x16 (FPO: [Non-Fpo]) (CONV: stdcall) [c:\winddk\7600.16385.1\inc\wdf\kmdf\1.9\wdfsync.h @ 61]
        adf57b7c 8e3e5cdd winmad!WmProviderRead+0x3c (FPO: [Non-Fpo]) (CONV: stdcall) [a:\builds\9565\branches\mlnx_winof-3_0_0\core\winmad\kernel\wm_provider.c @ 278]
        adf57b94 8d94502a winmad!WmIoRead+0x2d (FPO: [Non-Fpo]) (CONV: stdcall) [a:\builds\9565\branches\mlnx_winof-3_0_0\core\winmad\kernel\wm_driver.c @ 128]
        adf57bb0 8d946256 Wdf01000!FxIoQueueIoRead::Invoke+0x2a (FPO: [Non-Fpo])
        adf57bd8 8d9489ac Wdf01000!FxIoQueue::DispatchRequestToDriver+0x1a3 (FPO: [Non-Fpo])
        adf57bf4 8d949a36 Wdf01000!FxIoQueue::DispatchEvents+0x3be (FPO: [Non-Fpo])
        adf57c14 8d94b824 Wdf01000!FxIoQueue::QueueRequest+0x1ec (FPO: [Non-Fpo])
        adf57c38 8d93aa3f Wdf01000!FxPkgIo::Dispatch+0x27d (FPO: [Non-Fpo])
        adf57c44 8268f4bc Wdf01000!FxDevice::Dispatch+0x7f (FPO: [Non-Fpo])
        adf57c5c 82890f6e nt!IofCallDriver+0x63
        adf57c7c 828a2f32 nt!IopSynchronousServiceTail+0x1f8
        adf57d08 8269644a nt!NtReadFile+0x644
        adf57d08 775264f4 nt!KiFastCallEntry+0x12a (FPO: [0,3] TrapFrame @ adf57d34)
        007cfb84 7752570c ntdll!KiFastSystemCallRet (FPO: [0,0,0])
        007cfb88 757af249 ntdll!ZwReadFile+0xc (FPO: [9,0,0])
        007cfbec 771bdafd KERNELBASE!ReadFile+0xaa (FPO: [Non-Fpo])
        007cfc34 72fd1a25 kernel32!ReadFileImplementation+0xf0 (FPO: [Non-Fpo])
        007cfc58 74272910 winmad_72fd0000!CWMProvider::Receive+0xa5 (FPO: [Non-Fpo]) (CONV: stdcall) [a:\builds\9565\branches\mlnx_winof-3_0_0\core\winmad\user\wm_provider.cpp @ 227]
        007cfc7c 0042d76a libibumad!umad_recv+0x50 (FPO: [Non-Fpo]) (CONV: stdcall) [a:\builds\9565\branches\mlnx_winof-3_0_0\ulp\libibumad\src\umad.cpp @ 706]
        007cfd0c 72f40a04 opensm!umad_receiver+0xba (FPO: [Non-Fpo]) (CONV: stdcall) [a:\builds\9565\branches\mlnx_winof-3_0_0\ulp\opensm\user\libvendor\osm_vendor_ibumad.c @ 266]
        007cfd18 771c1194 complib!cl_thread_callback+0x14 (FPO: [Non-Fpo]) (CONV: stdcall) [a:\builds\9565\branches\mlnx_winof-3_0_0\core\complib\user\cl_thread.c @ 49]
        007cfd24 7753b3f5 kernel32!BaseThreadInitThunk+0xe (FPO: [Non-Fpo])
        007cfd64 7753b3c8 ntdll!__RtlUserThreadStart+0x70 (FPO: [Non-Fpo])
        007cfd7c 00000000 ntdll!_RtlUserThreadStart+0x1b (FPO: [Non-Fpo])
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.openfabrics.org/pipermail/ofw/attachments/20120117/948cb823/attachment.html>


More information about the ofw mailing list