[ofw] bugcheck in mlx4_bus

Hefty, Sean sean.hefty at intel.com
Thu Aug 20 10:58:50 PDT 2009


I hit a bugcheck yesterday while running the Intel MPI PingPong tests on a single node, scaling the number of ranks from 2 to 64.  The system is running Windows Server 2003.  A bugcheck analysis suggested adding the following registry value:

HKLM\System\CurrentControlSet\Control\Session Manager\Memory Management\TrackLockedPages

DWORD with a value of 1

This produced the bugcheck below when I re-ran the MPI PingPong tests.  I'm running checked builds of the drivers with free builds of the libraries.  This may point to a cleanup issue higher in the stack.  I'm trying to find more details.

*******************************************************************************
*                                                                             *
*                        Bugcheck Analysis                                    *
*                                                                             *
*******************************************************************************

DRIVER_LEFT_LOCKED_PAGES_IN_PROCESS (cb)
Caused by a driver not cleaning up completely after an I/O.
When possible, the guilty driver's name (Unicode string) is printed on
the bugcheck screen and saved in KiBugCheckDriver.
Arguments:
Arg1: fffffadf8e0ae4f0, The calling address in the driver that locked the pages or if the
	IO manager locked the pages this points to the dispatch routine of
	the top driver on the stack to which the IRP was sent.
Arg2: 0000000000000000, The caller of the calling address in the driver that locked the
	pages. If the IO manager locked the pages this points to the device
	object of the top driver on the stack to which the IRP was sent.
Arg3: fffffadf980c6580, A pointer to the MDL containing the locked pages.
Arg4: 0000000000000021, The number of locked pages.

Debugging Details:
------------------

PEB is paged out (Peb.Ldr = 000007ff`fffda018).  Type ".hh dbgerr001" for details
PEB is paged out (Peb.Ldr = 000007ff`fffda018).  Type ".hh dbgerr001" for details

FAULTING_IP: 
mlx4_bus!register_segment+100 [c:\mshefty\scm\winof\branches\winverbs\hw\mlx4\kernel\bus\core\iobuf.c @ 197]
fffffadf`8e0ae4f0 eb7d            jmp     mlx4_bus!register_segment+0x17f (fffffadf`8e0ae56f)

DEFAULT_BUCKET_ID:  DRIVER_FAULT

BUGCHECK_STR:  0xCB

PROCESS_NAME:  IMB-MPI1.exe

CURRENT_IRQL:  f

LAST_CONTROL_TRANSFER:  from fffff8000107984c to fffff80001026cf0

STACK_TEXT:  
fffffadf`8e16ee28 fffff800`0107984c : 0000fadf`8ee3aa62 00000000`00004cb6 00000000`00000000 00000000`00000000 : nt!RtlpBreakWithStatusInstruction
fffffadf`8e16ee30 fffff800`010c514e : 00000000`04d18000 00000000`dffe0000 00000000`04d18000 fffffadf`9aad51b0 : nt!KdCheckForDebugBreak+0xb5
fffffadf`8e16ee70 fffff800`010d89bb : fffffadf`8e0ae400 00000000`00000000 00000000`00000000 00000000`000000cb : nt!IoWriteCrashDump+0x851
fffffadf`8e16f030 fffff800`0102e994 : fffff6fb`c0000000 fffff6fb`c0000000 fffffadf`988ba440 fffffadf`9b6b9340 : nt!KeBugCheck2+0xb83
fffffadf`8e16f670 fffff800`01096f23 : 00000000`000000cb fffffadf`8e0ae4f0 00000000`00000000 fffffadf`980c6580 : nt!KeBugCheckEx+0x104
fffffadf`8e16f6b0 fffff800`0127381a : fffffa80`01e7b960 fffffadf`8e16fc70 00000000`00000000 fffffadf`988ba440 : nt!MmCleanProcessAddressSpace+0x904
fffffadf`8e16f720 fffff800`0127bb72 : fffffadf`0000007b 00000000`0000007b fffffadf`988ba488 00000000`00000000 : nt!PspExitThread+0xb4d
fffffadf`8e16f9b0 fffff800`01038c30 : 00000000`00000000 fffffadf`8e16fcf0 00000520`657cb7f8 00000000`00000002 : nt!PsExitSpecialApc+0x1d
fffffadf`8e16f9e0 fffff800`01027c3b : 00000000`00000000 fffffadf`8e16fa80 fffff800`0127bdc0 00000000`00000000 : nt!KiDeliverApc+0x504
fffffadf`8e16fa80 fffff800`0102e3f2 : fffffadf`8e16fc18 00000000`00000000 00000000`00000001 fffffadf`9b8c6540 : nt!KiInitiateUserApc+0x7b
fffffadf`8e16fc00 00000000`77ef0a6a : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : nt!KiSystemServiceExit+0xad
00000000`0012f3a8 00000000`00000000 : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : 0x77ef0a6a


STACK_COMMAND:  .bugcheck ; kb

FOLLOWUP_IP: 
mlx4_bus!register_segment+100 [c:\mshefty\scm\winof\branches\winverbs\hw\mlx4\kernel\bus\core\iobuf.c @ 197]
fffffadf`8e0ae4f0 eb7d            jmp     mlx4_bus!register_segment+0x17f (fffffadf`8e0ae56f)

FAULTING_SOURCE_CODE:  
   193: 	}
   194: 
   195: 	__try { /* try */
   196: 		MmProbeAndLockPages( mdl_p, mode, Operation );   /* lock memory */
>  197: 	} /* try */
   198: 		
   199: 	__except (EXCEPTION_EXECUTE_HANDLER)	{
   200: 		MLX4_PRINT(TRACE_LEVEL_ERROR, MLX4_DBG_MEMORY, 
   201: 			("MOSAL_iobuf_register: Exception 0x%x on MmProbeAndLockPages(), va %I64d, sz %I64d\n", 
   202: 			GetExceptionCode(), va, size));


SYMBOL_NAME:  mlx4_bus!register_segment+100

FOLLOWUP_NAME:  MachineOwner

MODULE_NAME: mlx4_bus

IMAGE_NAME:  mlx4_bus.sys

DEBUG_FLR_IMAGE_TIMESTAMP:  4a8d77d7

FAILURE_BUCKET_ID:  X64_0xCB_mlx4_bus!register_segment+100

BUCKET_ID:  X64_0xCB_mlx4_bus!register_segment+100

Followup: MachineOwner
---------



