[ofw] crash while disabling HCA on multihome machine
Leonid Keller
leonid at mellanox.co.il
Thu Mar 5 07:16:47 PST 2009
Crash:
BugCheck 18, {bad0b0b0, fffffa800a3f7a90, 2, ffffffffffffffff}
The reference count of an object is illegal for the current state of the
object.
Setup:
Two HCAs, full IB stack plus the patch that removes the HCA's
registration with IBAL.
The problem doesn't happen without WinVerbs and WinMad.
Reproduce:
1. Disable/enable HCA0.
2. (Optional) Disable/enable HCA0 again.
3. Disable/enable HCA1.
Quick Analysis:
0: kd> !analyze -v
REFERENCE_BY_POINTER (18)
Arguments:
Arg1: 00000000bad0b0b0, Object type of the object whose reference count
is being lowered
Arg2: fffffa800a3f7a90, Object whose reference count is being lowered
Arg3: 0000000000000002, Reserved
Arg4: ffffffffffffffff, Reserved
The wrongly dereferenced object is a device object that belongs to IBBUS.SYS:
0: kd> !devobj fffffa800a3f7a90
Device object (fffffa800a3f7a90) is for:
\Driver\ibbus DriverObject fffffa800a3f65d0
The bad reference count is PointerCount:
0: kd> !object fffffa800a3f7a90
Object: fffffa800a3f7a90 Type: (bad0b0b0)
ObjectHeader: fffffa800a3f7a60 (old version)
HandleCount: 0 PointerCount: 4294967295 /* it's -1 */
Directory Object: fffffa800a4ab740 Name:
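The value 4294967295 is what -1 looks like when a 32-bit counter is printed as unsigned. A minimal Python sketch of that reinterpretation, assuming the debugger prints the count as an unsigned 32-bit value (the helper name as_signed32 is hypothetical and not part of any debugger API):

```python
def as_signed32(raw: int) -> int:
    """Reinterpret an unsigned 32-bit value as two's-complement signed."""
    raw &= 0xFFFFFFFF  # keep only the low 32 bits
    return raw - (1 << 32) if raw & 0x80000000 else raw


# Value printed by !object for the IBBUS device object.
pointer_count = as_signed32(4294967295)
print(pointer_count)
```

Small positive counts such as 2 or 4 pass through unchanged, so the same helper can be applied to any PointerCount copied from the debugger.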
More analysis:
I have a feeling that either WinVerbs or WinMad references the wrong
IBBUS device. My guess is that it is WinMad.
To check this, do the following.
Reboot the machine (with two cards), break into the debugger and look at
the device stacks:
HCA0:
3: kd> !devstack 0xfffffa800a2cc060
!DevObj !DrvObj !DevExt ObjectName
fffffa800a3ede20 \Driver\WinMad fffffa800a3ec390
fffffa800a3ebc70 \Driver\WinVerbs fffffa800a3eadb0
fffffa800a3ea040 \Driver\ibbus fffffa800a3ea190
fffffa800a3e9460 \Driver\mlx4_hca fffffa800a3e95b0
> fffffa800a2cc060 \Driver\mlx4_bus fffffa800a2caec0 00000055
Look at the PointerCount of IBBUS0 (the IBBUS for HCA0): it is 2.
3: kd> !object fffffa800a3ea040
Object: fffffa800a3ea040 Type: (fffffa8006a22840) Device
ObjectHeader: fffffa800a3ea010 (old version)
HandleCount: 0 PointerCount: 2
The PointerCount of IBBUS1 (the IBBUS for HCA1), however, is 4.
HCA1:
3: kd> !devstack 0xfffffa8008b2b950
!DevObj !DrvObj !DevExt ObjectName
fffffa800a3e9e20 \Driver\WinMad fffffa800a3e5390
fffffa800a3df800 \Driver\WinVerbs fffffa800a3e7570
fffffa800a3e3600 \Driver\ibbus fffffa800a3e3750 ibal
fffffa800a3e4040 \Driver\mlx4_hca fffffa800a3e4190
> fffffa8008b2b950 \Driver\mlx4_bus fffffa8008b2b390 00000054
3: kd> !object fffffa800a3e3600
Object: fffffa800a3e3600 Type: (fffffa8006a22840) Device
ObjectHeader: fffffa800a3e35d0 (old version)
HandleCount: 0 PointerCount: 4
What happens while reproducing the crash?
When you disable HCA0, it decrements IBBUS1's PointerCount to 3.
When you then disable HCA1, IBBUS1's PointerCount becomes -1 and you get
bugcheck 0x18.
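The arithmetic of this sequence can be replayed with a toy counter. This is only a sketch, not the kernel's object manager: the class RefCount and the function replay are hypothetical, and the idea that disabling HCA1 releases four references (the count IBBUS1 started with) is an assumption made for illustration. The starting value 4 and the intermediate value 3 come from the debugger output and the observation above.

```python
class RefCount:
    """Toy reference counter that fails on underflow (hypothetical model)."""

    def __init__(self, name, count):
        self.name = name
        self.count = count

    def release(self):
        self.count -= 1
        if self.count < 0:
            # Stands in for bugcheck 0x18 (REFERENCE_BY_POINTER).
            raise RuntimeError(f"{self.name}: PointerCount dropped to {self.count}")


def replay():
    ibbus1 = RefCount("IBBUS1", 4)  # PointerCount observed after boot
    ibbus1.release()                # disabling HCA0 decrements IBBUS1
    after_hca0 = ibbus1.count
    error = ""
    try:
        for _ in range(4):          # assumption: HCA1 teardown releases 4 refs
            ibbus1.release()
    except RuntimeError as exc:
        error = str(exc)
    return after_hca0, ibbus1.count, error


print(replay())
```

In this model the extra release caused by disabling HCA0 is exactly the one reference that is missing when HCA1 is torn down, which is consistent with a driver in HCA0's stack holding its reference on IBBUS1 instead of IBBUS0.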
I didn't have time to continue the investigation.
Maybe you can look into it?