[ofw] Opensm & WinMad: a race, causing BSOD722
Smith, Stan
stan.smith at intel.com
Mon Jan 16 16:36:39 PST 2012
What are the conditions required to generate the BSOD?
>From the terse description it appears you are shutting down OpenSM. How is this accomplished?
Opensm - 10D, {b, 76157f00, 0, 8811d008}. Not clear what this statement implies? Please explain.
Stan.
From: Leonid Keller [mailto:leonid at mellanox.com]
Sent: Sunday, January 15, 2012 3:09 AM
To: Smith, Stan; Hefty, Sean
Cc: Uri Habusha; Irena Gannon; ofw_list
Subject: Opensm & WinMad: a race, cauing BSOD722
Hi guys,
We got a BSOD in Opensm - 10D, {b, 76157f00, 0, 8811d008}.
Could you take a look ?
My analysis:
Seems like BSOD has been caused by a race between the main and MAD reading threads of Opensm.
The main thread has already closed the port and is now found in osm_subn_destroy():
opensm_main
...
osm_mad_pool_destroy(&p_osm->mad_pool);
osm_vendor_delete(&p_osm->p_vendor); // port release
osm_subn_destroy(&p_osm->subn); // the thread is found here now
The reading thread is still in action:
opensm!umad_receiver
libibumad!umad_recv
...
winmad!WmIoRead
winmad!WmProviderRead
WdfObjectAcquireLock(pProvider->ReadQueue); // BSOD
A try to ReadQueue with !wdfqueue fails.
Seems like pProvider is already released. But there is no any checks of its validity in WmProviderRead().
Possible solution:
Maybe WmIoRead() should check, that the Provider is not being released and take some reference, while WmProviderRemoveHandler() should wait to this reference to be removed ?
Technical info:
// Bugcheck
WDF_VIOLATION (10d)
The Kernel-Mode Driver Framework has detected that Windows detected an error
in a framework-based driver. In general, the dump file will yield additional
information about the driver that caused this bug check.
Arguments:
Arg1: 0000000b, An attempt to acquire or release a lock was invalid. In this
case, Parameter 3 further specifies the error that has been
made.
Arg2: 76157f00, The handle value.
Arg3: 00000000, A handle passed to either WdfObjectAcquireLock or
WdfObjectReleaseLock represents an object that does not
support synchronization locks.
Arg4: 8811d008, Reserved.
// crash line
void WmProviderRead(WM_PROVIDER *pProvider, WDFREQUEST Request)
{
WM_REGISTRATION *reg;
NTSTATUS status;
WM_IO_MAD *wmad;
size_t outlen, len = 0;
status = WdfRequestRetrieveOutputBuffer(Request, sizeof(WM_IO_MAD), &wmad, &outlen);
if (!NT_SUCCESS(status)) {
goto out;
}
WdfObjectAcquireLock(pProvider->ReadQueue);
if (pProvider->MadHead == NULL) {
// Local parameters
3: kd> ?? pProvider
struct _WM_PROVIDER * 0x894761c0
+0x000 Entry : _LIST_ENTRY [ 0x8e3ea070 - 0x8e3ea070 ]
+0x008 RegIndex : _INDEX_LIST
+0x014 ReadQueue : 0x76157f00 WDFQUEUE__
+0x018 MadHead : (null)
+0x01c MadTail : 0xbc67ccc8 _ib_mad_element
+0x020 Lock : _KGUARDED_MUTEX
+0x040 Ref : 0
+0x044 Event : _KEVENT
+0x054 Pending : 0
+0x058 Active : 0
+0x05c SharedEvent : _KEVENT
+0x06c Exclusive : 0
+0x070 ExclusiveEvent : _KEVENT
// ReadQueue
3: kd> !wdfqueue 0x76157f00
Dumping WDFQUEUE 0x76157f00
=========================
Manualcouldn't read 00000044
, Deleted, Disposing, Not power-managed, PowerOn, Cannot accept, Can dispatch, Dispatching, ExecutionLevelDispatch, SynchronizationScopeNone
Number of driver owned requests: 0
Number of waiting requests: 0
// opensm
3: kd> !PROCESS a14ba6c0
PROCESS a14ba6c0 SessionId: 1 Cid: 13fc Peb: 7ffd9000 ParentCid: 10fc
DirBase: bd5feee0 ObjectTable: bf7fa7b8 HandleCount: 41.
Image: opensm.exe
VadRoot 89c8ca38 Vads 51 Clone 0 Private 271. Modified 3. Locked 0.
DeviceMap 81f41008
Token a6e81030
ElapsedTime 00:00:00.890
UserTime 00:00:00.015
KernelTime 00:00:00.062
QuotaPoolUsage[PagedPool] 0
QuotaPoolUsage[NonPagedPool] 0
Working Set Sizes (now,min,max) (913, 50, 345) (3652KB, 200KB, 1380KB)
PeakWorkingSetSize 930
VirtualSize 76 Mb
PeakVirtualSize 78 Mb
PageFaultCount 1047
MemoryPriority BACKGROUND
BasePriority 8
CommitCharge 8768
Job 89e04368
THREAD 89e05d48 Cid 13fc.1758 Teb: 7ffdf000 Win32Thread: fe333b18 RUNNING on processor 2
Not impersonating
DeviceMap 81f41008
Owning Process 0 Image: <Unknown>
Attached Process a14ba6c0 Image: opensm.exe
Wait Start TickCount 5900630 Ticks: 1 (0:00:00:00.015)
Context Switch Count 454
UserTime 00:00:00.015
KernelTime 00:00:00.578
Win32 Start Address opensm!mainCRTStartup (0x0042aaf5)
Stack Init adf03fd0 Current adf038d0 Base adf04000 Limit adf01000 Call 0
Priority 11 BasePriority 8 PriorityDecrement 0 IoPriority 2 PagePriority 5
ChildEBP RetAddr
adf03cec 826be3a4 nt!KdPollBreakIn+0xea
adf03cf0 826be381 nt!KdCheckForDebugBreak+0x17 (FPO: [0,0,0])
adf03d24 8261f430 nt!KeUpdateRunTime+0x164
adf03d24 003fd34a hal!HalpClockInterruptPn+0x158 (FPO: [0,2] TrapFrame @ adf03d34)
000b2918 003d6174 opensm!osm_subn_destroy+0x1da (FPO: [Non-Fpo]) (CONV: stdcall) [a:\builds\9565\branches\mlnx_winof-3_0_0\ulp\opensm\user\opensm\osm_subnet.c @ 501]
000b2924 003ba03e opensm!osm_opensm_destroy+0x104 (FPO: [Non-Fpo]) (CONV: stdcall) [a:\builds\9565\branches\mlnx_winof-3_0_0\ulp\opensm\user\opensm\osm_opensm.c @ 313]
000cfda8 003bab21 opensm!opensm_main+0x1c3e (FPO: [Non-Fpo]) (CONV: stdcall) [a:\builds\9565\branches\mlnx_winof-3_0_0\ulp\opensm\user\opensm\main.c @ 1264]
000cfdf4 0042a9c4 opensm!main+0x191 (FPO: [Non-Fpo]) (CONV: cdecl) [a:\builds\9565\branches\mlnx_winof-3_0_0\ulp\opensm\user\opensm\main.c @ 1305]
000cfe38 771c1194 opensm!__mainCRTStartup+0x102 (FPO: [Non-Fpo]) (CONV: cdecl) [d:\5359\minkernel\crts\crtw32\dllstuff\crtexe.c @ 695]
000cfe44 7753b3f5 kernel32!BaseThreadInitThunk+0xe (FPO: [Non-Fpo])
000cfe84 7753b3c8 ntdll!__RtlUserThreadStart+0x70 (FPO: [Non-Fpo])
000cfe9c 00000000 ntdll!_RtlUserThreadStart+0x1b (FPO: [Non-Fpo])
THREAD c8974030 Cid 13fc.16f4 Teb: 7ffd6000 Win32Thread: 00000000 RUNNING on processor 3
IRP List:
89e505b0: (0006,0094) Flags: 00060970 Mdl: 00000000
Not impersonating
DeviceMap 81f41008
Owning Process 0 Image: <Unknown>
Attached Process a14ba6c0 Image: opensm.exe
Wait Start TickCount 5900629 Ticks: 2 (0:00:00:00.031)
Context Switch Count 24
UserTime 00:00:00.000
KernelTime 00:00:00.015
Win32 Start Address complib!cl_thread_callback (0x72f409f0)
Stack Init adf57fd0 Current adf57b44 Base adf58000 Limit adf55000 Call 0
Priority 10 BasePriority 8 PriorityDecrement 2 IoPriority 2 PagePriority 5
ChildEBP RetAddr
adf576d4 8272fe71 nt!RtlpBreakWithStatusInstruction (FPO: [1,0,0])
adf57724 8273096d nt!KiBugCheckDebugBreak+0x1c
adf57ae8 8272fd10 nt!KeBugCheck2+0x68b
adf57b08 8d956f9e nt!KeBugCheckEx+0x1e
adf57b24 8d955829 Wdf01000!FxVerifierBugCheck+0x24 (FPO: [Non-Fpo])
adf57b4c 8e3e6626 Wdf01000!imp_WdfObjectAcquireLock+0x26 (FPO: [Non-Fpo])
adf57b5c 8e3e6a3c winmad!WdfObjectAcquireLock+0x16 (FPO: [Non-Fpo]) (CONV: stdcall) [c:\winddk\7600.16385.1\inc\wdf\kmdf\1.9\wdfsync.h @ 61]
adf57b7c 8e3e5cdd winmad!WmProviderRead+0x3c (FPO: [Non-Fpo]) (CONV: stdcall) [a:\builds\9565\branches\mlnx_winof-3_0_0\core\winmad\kernel\wm_provider.c @ 278]
adf57b94 8d94502a winmad!WmIoRead+0x2d (FPO: [Non-Fpo]) (CONV: stdcall) [a:\builds\9565\branches\mlnx_winof-3_0_0\core\winmad\kernel\wm_driver.c @ 128]
adf57bb0 8d946256 Wdf01000!FxIoQueueIoRead::Invoke+0x2a (FPO: [Non-Fpo])
adf57bd8 8d9489ac Wdf01000!FxIoQueue::DispatchRequestToDriver+0x1a3 (FPO: [Non-Fpo])
adf57bf4 8d949a36 Wdf01000!FxIoQueue::DispatchEvents+0x3be (FPO: [Non-Fpo])
adf57c14 8d94b824 Wdf01000!FxIoQueue::QueueRequest+0x1ec (FPO: [Non-Fpo])
adf57c38 8d93aa3f Wdf01000!FxPkgIo::Dispatch+0x27d (FPO: [Non-Fpo])
adf57c44 8268f4bc Wdf01000!FxDevice::Dispatch+0x7f (FPO: [Non-Fpo])
adf57c5c 82890f6e nt!IofCallDriver+0x63
adf57c7c 828a2f32 nt!IopSynchronousServiceTail+0x1f8
adf57d08 8269644a nt!NtReadFile+0x644
adf57d08 775264f4 nt!KiFastCallEntry+0x12a (FPO: [0,3] TrapFrame @ adf57d34)
007cfb84 7752570c ntdll!KiFastSystemCallRet (FPO: [0,0,0])
007cfb88 757af249 ntdll!ZwReadFile+0xc (FPO: [9,0,0])
007cfbec 771bdafd KERNELBASE!ReadFile+0xaa (FPO: [Non-Fpo])
007cfc34 72fd1a25 kernel32!ReadFileImplementation+0xf0 (FPO: [Non-Fpo])
007cfc58 74272910 winmad_72fd0000!CWMProvider::Receive+0xa5 (FPO: [Non-Fpo]) (CONV: stdcall) [a:\builds\9565\branches\mlnx_winof-3_0_0\core\winmad\user\wm_provider.cpp @ 227]
007cfc7c 0042d76a libibumad!umad_recv+0x50 (FPO: [Non-Fpo]) (CONV: stdcall) [a:\builds\9565\branches\mlnx_winof-3_0_0\ulp\libibumad\src\umad.cpp @ 706]
007cfd0c 72f40a04 opensm!umad_receiver+0xba (FPO: [Non-Fpo]) (CONV: stdcall) [a:\builds\9565\branches\mlnx_winof-3_0_0\ulp\opensm\user\libvendor\osm_vendor_ibumad.c @ 266]
007cfd18 771c1194 complib!cl_thread_callback+0x14 (FPO: [Non-Fpo]) (CONV: stdcall) [a:\builds\9565\branches\mlnx_winof-3_0_0\core\complib\user\cl_thread.c @ 49]
007cfd24 7753b3f5 kernel32!BaseThreadInitThunk+0xe (FPO: [Non-Fpo])
007cfd64 7753b3c8 ntdll!__RtlUserThreadStart+0x70 (FPO: [Non-Fpo])
007cfd7c 00000000 ntdll!_RtlUserThreadStart+0x1b (FPO: [Non-Fpo])
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.openfabrics.org/pipermail/ofw/attachments/20120117/948cb823/attachment.html>
More information about the ofw
mailing list