[Openib-windows] crash on ibbus

Yossi Leybovich sleybo at mellanox.co.il
Sun May 28 00:56:33 PDT 2006


Hi Fab
 
Below is a crash dump report we got from our regression system.
I noticed that p_mad_wr in spl_qp_svc_send is 0x000001 (I guess it's not
a valid MAD pointer).
This causes mthca to crash while checking the AV.
 
Do you have any idea what could cause the to_send_queue list of the
special QP to return p_mad_wr = 1?
I thought about a missing lock while inserting/destroying, but could not
find one.
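
To catch the bad entry where it comes off the queue rather than deep inside mthca, a debug-only sanity check on the pointer might help. The sketch below is only an illustration: the function name and the SKETCH_KERNEL_BASE constant are made up for this example, and it assumes a 32-bit system with the default split where kernel addresses start at 0x80000000 (consistent with nt being loaded at 80800000 in the dump).

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumption: 32-bit Windows with the default user/kernel split, so any
 * valid kernel pointer is at or above 0x80000000. */
#define SKETCH_KERNEL_BASE 0x80000000u

/* Hypothetical helper: returns true if addr could be a pointer to a
 * kernel structure, i.e. it is in the kernel range and 4-byte aligned. */
bool plausible_kernel_ptr(uint32_t addr)
{
    if (addr < SKETCH_KERNEL_BASE)
        return false;   /* user-mode range or a small integer such as 1 */
    if (addr & 0x3u)
        return false;   /* structures are at least 4-byte aligned */
    return true;
}
```

With this check, a value of 1 is rejected while the addresses seen elsewhere in the stack (for example 8718af40) pass. An assert on it just after the item is removed from to_send_queue would point at the list corruption closer to where it happens.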
 
10x
Yossi 
 
 
 
0: kd> !analyze -v 
*******************************************************************************
*                                                                             *
*                        Bugcheck Analysis                                    *
*                                                                             *
*******************************************************************************
 
DRIVER_IRQL_NOT_LESS_OR_EQUAL (d1)
An attempt was made to access a pageable (or completely invalid) address at an
interrupt request level (IRQL) that is too high.  This is usually
caused by drivers using improper addresses.
If kernel debugger is available get stack backtrace.
Arguments:
Arg1: 00000014, memory referenced
Arg2: 00000002, IRQL
Arg3: 00000000, value 0 = read operation, 1 = write operation
Arg4: b9c9fb98, address which referenced memory
 
Debugging Details:
------------------
 
***** Kernel symbols are WRONG. Please fix symbols to do analysis.
 
*************************************************************************
***                                                                   ***
***                                                                   ***
***    Your debugger is not using the correct symbols                 ***
***                                                                   ***
***    In order for this command to work properly, your symbol path   ***
***    must point to .pdb files that have full type information.      ***
***                                                                   ***
***    Certain .pdb files (such as the public OS symbols) do not      ***
***    contain the required information.  Contact the group that      ***
***    provided you with these symbols if you need this command to    ***
***    work.                                                          ***
***                                                                   ***
***    Type referenced: nt!_KPRCB                                     ***
***                                                                   ***
*************************************************************************
 
MODULE_NAME:  mthca
 
FAULTING_MODULE: 80800000 nt
 
DEBUG_FLR_IMAGE_TIMESTAMP:  44770e4b
 
READ_ADDRESS:  00000014 
 
CURRENT_IRQL:  2
 
FAULTING_IP: 
mthca!mthca_ah_grh_present+8 [s:\builds\1362\trunk\hw\mthca\kernel\mthca_av.c @ 177]
b9c9fb98 8b4814           mov     ecx,[eax+0x14]
 
DEFAULT_BUCKET_ID:  DRIVER_FAULT
 
BUGCHECK_STR:  0xD1
 
LAST_CONTROL_TRANSFER:  from b9c9fb98 to 8088bdd3
 
STACK_TEXT:  
WARNING: Stack unwind information not available. Following frames may be wrong.
ba028b4c b9c9fb98 badb0d00 8718af58 b9c9b9fa nt!Kei386EoiHelper+0x28d3
ba028bc0 b9cbb170 00000000 89e78180 00000000 mthca!mthca_ah_grh_present+0x8 [s:\builds\1362\trunk\hw\mthca\kernel\mthca_av.c @ 177]
ba028bf8 b9cbbdb7 89f45750 89e78008 0000004a mthca!build_mlx_header+0x30 [s:\builds\1362\trunk\hw\mthca\kernel\mthca_qp.c @ 1416]
ba028c88 b9c90e2a 89e78008 8718af58 00000000 mthca!mthca_arbel_post_send+0x477 [s:\builds\1362\trunk\hw\mthca\kernel\mthca_qp.c @ 2004]
ba028cb8 b9b8d653 89e78008 ffffffff 8718af58 mthca!mlnx_post_send+0x6a [s:\builds\1362\trunk\hw\mthca\kernel\hca_direct.c @ 71]
ba028cd4 b9b8d5f2 89fd3df8 ffffffff 8718af58 ibbus!ud_post_send+0x4f [s:\builds\1362\trunk\core\al\al_qp.c @ 1616]
ba028cec b9b87f8c 89fd3df8 ffffffff 8718af58 ibbus!ib_post_send+0x4c [s:\builds\1362\trunk\core\al\al_qp.c @ 1588]
ba028d10 b9b88dc5 8a01ac00 8718af40 8718af40 ibbus!remote_mad_send+0x84 [s:\builds\1362\trunk\core\al\kernel\al_smi.c @ 1309]
ba028d2c b9b8d6f4 89fd3df8 ffffffff 00000001 ibbus!spl_qp_svc_send+0x8d [s:\builds\1362\trunk\core\al\kernel\al_smi.c @ 1249]
ba028d54 b9b884a0 89fd3df8 89fd3f48 8a019a28 ibbus!special_qp_resume_sends+0x4a [s:\builds\1362\trunk\core\al\al_qp.c @ 1672]
ba028d70 b9b83d91 8a01ad24 8a0199fc 8a019990 ibbus!send_local_mad_cb+0x4c [s:\builds\1362\trunk\core\al\kernel\al_smi.c @ 1879]
ba028d88 b9b84ae9 8a019990 00000000 89fc5618 ibbus!__cl_async_proc_worker+0x23 [s:\builds\1362\trunk\core\complib\cl_async_proc.c @ 153]
ba028d9c b9b84f2e 8a019990 8a014418 ba028ddc ibbus!__cl_thread_pool_routine+0x33 [s:\builds\1362\trunk\core\complib\cl_threadpool.c @ 66]
ba028dac 80948bb2 89fc5618 00000000 00000000 ibbus!__thread_callback+0x20 [s:\builds\1362\trunk\core\complib\kernel\cl_thread.c @ 49]
ba028ddc 8088d4d2 b9b84f0e 89fc5618 00000000 nt!PsRemoveCreateThreadNotifyRoutine+0x21e
00000000 00000000 00000000 00000000 00000000 nt!KiDispatchInterrupt+0x572
 

STACK_COMMAND:  .bugcheck ; kb
 
FOLLOWUP_IP: 
mthca!mthca_ah_grh_present+8 [s:\builds\1362\trunk\hw\mthca\kernel\mthca_av.c @ 177]
b9c9fb98 8b4814           mov     ecx,[eax+0x14]
 
FAULTING_SOURCE_CODE:  
   173: }
   174: 
   175: int mthca_ah_grh_present(struct mthca_ah *ah)
   176: {
>  177:  return !!(ah->av->g_slid & 0x80);
   178: }
   179: 
   180: int mthca_read_ah(struct mthca_dev *dev, struct mthca_ah *ah,
   181:     struct ib_ud_header *header)
   182: {
 

SYMBOL_STACK_INDEX:  1
 
FOLLOWUP_NAME:  MachineOwner
 
SYMBOL_NAME:  mthca!mthca_ah_grh_present+8
 
IMAGE_NAME:  mthca.sys
 
BUCKET_ID:  WRONG_SYMBOLS
 
Followup: MachineOwner
---------
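
A note on reading the fault: the faulting instruction is mov ecx,[eax+0x14] and the referenced address is 00000014, so eax was 0 when it executed. One plausible reading (an assumption, since the real structure layout is not shown here) is that the ah pointer handed to mthca_ah_grh_present was NULL and the av member sits at offset 0x14. A defensive variant of the function could look roughly like the sketch below. The struct definitions are stand-ins with only the fields used here, not the real mthca types, and the function name and the -1 return value are invented for this example.

```c
#include <stddef.h>
#include <stdint.h>

/* Stand-in for the address vector; only the field used here. */
struct mthca_av_sketch {
    uint8_t g_slid;   /* bit 7 is tested with the 0x80 mask, as in mthca_av.c line 177 */
};

/* Stand-in for the address handle; NOT the real mthca_ah layout. */
struct mthca_ah_sketch {
    struct mthca_av_sketch *av;
};

/* Returns 1 if the GRH bit is set, 0 if it is clear, and -1 if the handle
 * or its address vector is NULL (instead of dereferencing it). */
int ah_grh_present_checked(const struct mthca_ah_sketch *ah)
{
    if (ah == NULL || ah->av == NULL)
        return -1;
    return !!(ah->av->g_slid & 0x80);
}
```

A check like this would only turn the bugcheck into an error return. It would not explain why a bad work request reached the post-send path in the first place.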
 
 
 