<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<HTML><HEAD>
<META http-equiv=Content-Type content="text/html; charset=us-ascii">
<META content="MSHTML 6.00.2900.2873" name=GENERATOR></HEAD>
<BODY>
<DIV><FONT face=Arial size=2><SPAN class=489092207-28052006>Hi
Fab</SPAN></FONT></DIV>
<DIV><FONT face=Arial size=2><SPAN class=489092207-28052006></SPAN></FONT><FONT
face=Arial size=2><SPAN class=489092207-28052006></SPAN></FONT> </DIV>
<DIV><FONT face=Arial size=2><SPAN class=489092207-28052006>Below I
have attached the crash dump report we got from our regression
system.</SPAN></FONT></DIV>
<DIV><FONT face=Arial size=2><SPAN class=489092207-28052006>I noticed that
p_mad_wr in spl_qp_svc_send is 0x000001 (I guess it's not a valid MAD
pointer).</SPAN></FONT></DIV>
<DIV><FONT face=Arial size=2><SPAN class=489092207-28052006>This causes mthca
to crash while checking the av.</SPAN></FONT></DIV>
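<DIV><FONT face=Arial size=2>A minimal sketch of a debug-time sanity check that would flag a value like 0x1 before it is handed down to the HCA driver. The struct and function names here are hypothetical stand-ins, not the real ibbus types, and the check only tests for NULL and alignment, so it cannot prove that a pointer is actually valid:</FONT></DIV>

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the real MAD work request type; only its
 * alignment requirement matters for this check. */
struct mad_wr_sketch {
    void *p_next;
    uint64_t wr_id;
};

/* Returns 1 if p could plausibly point at a mad_wr_sketch: it is non-NULL
 * and aligned for the type. A value such as 0x1 fails the alignment test.
 * This is a debugging aid only; an aligned pointer can still be stale. */
int mad_wr_ptr_plausible(const void *p)
{
    uintptr_t addr = (uintptr_t)p;

    if (addr == 0)
        return 0;
    return (addr % _Alignof(struct mad_wr_sketch)) == 0;
}
```

<DIV><FONT face=Arial size=2>Asserting on such a check at the point where the item is taken off the queue would move the failure closer to wherever the bad value is introduced.</FONT></DIV>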
<DIV><FONT face=Arial size=2><SPAN
class=489092207-28052006></SPAN></FONT> </DIV>
<DIV><FONT face=Arial size=2><SPAN class=489092207-28052006>Do you have any idea
what could cause the to_send_queue list of the special QP to return p_mad_wr =
1?</SPAN></FONT></DIV>
<DIV><FONT face=Arial size=2><SPAN class=489092207-28052006>I thought about
a missing lock while inserting/destroying, but could not find one.
</SPAN></FONT></DIV>
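<DIV><FONT face=Arial size=2>For comparison, a minimal sketch of the locking pattern being checked for: every path that links or unlinks a queue item does all of its pointer updates while holding the same lock. The types and function names are hypothetical, and a C11 atomic_flag spinlock is used here as a stand-in for whatever lock the real code takes:</FONT></DIV>

```c
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical queue item and queue; these are not the complib types. */
struct item {
    struct item *next;
    struct item *prev;
};

struct send_queue {
    atomic_flag lock;   /* simple spinlock standing in for the real lock */
    struct item head;   /* sentinel: queue is empty when head.next == &head */
};

void queue_init(struct send_queue *q)
{
    atomic_flag_clear(&q->lock);
    q->head.next = &q->head;
    q->head.prev = &q->head;
}

static void queue_lock(struct send_queue *q)
{
    while (atomic_flag_test_and_set_explicit(&q->lock, memory_order_acquire))
        ; /* spin until the holder releases the lock */
}

static void queue_unlock(struct send_queue *q)
{
    atomic_flag_clear_explicit(&q->lock, memory_order_release);
}

/* Append at the tail; all four pointer updates happen under the lock. */
void queue_insert_tail(struct send_queue *q, struct item *it)
{
    queue_lock(q);
    it->prev = q->head.prev;
    it->next = &q->head;
    q->head.prev->next = it;
    q->head.prev = it;
    queue_unlock(q);
}

/* Remove from the head; returns NULL if the queue is empty. */
struct item *queue_remove_head(struct send_queue *q)
{
    struct item *it = NULL;

    queue_lock(q);
    if (q->head.next != &q->head) {
        it = q->head.next;
        q->head.next = it->next;
        it->next->prev = &q->head;
        it->next = NULL;
        it->prev = NULL;
    }
    queue_unlock(q);
    return it;
}
```

<DIV><FONT face=Arial size=2>Any path that touches next/prev outside the lock, or that frees an item still linked into the list, could leave a garbage value for the next reader to pick up.</FONT></DIV>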
<DIV><FONT face=Arial size=2><SPAN
class=489092207-28052006></SPAN></FONT> </DIV>
<DIV><FONT face=Arial
size=2><SPAN class=489092207-28052006>Thanks,</SPAN></FONT></DIV>
<DIV><FONT face=Arial size=2><SPAN class=489092207-28052006>Yossi
</SPAN></FONT></DIV>
<DIV><FONT face=Arial size=2></FONT> </DIV>
<DIV><FONT face=Arial size=2>0: kd&gt; !analyze -v
<BR>*******************************************************************************<BR>*
Bugcheck Analysis
*<BR>*******************************************************************************</FONT></DIV>
<DIV> </DIV>
<DIV><FONT face=Arial size=2>DRIVER_IRQL_NOT_LESS_OR_EQUAL (d1)<BR>An attempt
was made to access a pageable (or completely invalid) address at an<BR>interrupt
request level (IRQL) that is too high. This is usually<BR>caused by
drivers using improper addresses.<BR>If kernel debugger is available get stack
backtrace.<BR>Arguments:<BR>Arg1: 00000014, memory referenced<BR>Arg2: 00000002,
IRQL<BR>Arg3: 00000000, value 0 = read operation, 1 = write operation<BR>Arg4:
b9c9fb98, address which referenced memory</FONT></DIV>
<DIV> </DIV>
<DIV><FONT face=Arial size=2>Debugging
Details:<BR>------------------</FONT></DIV>
<DIV> </DIV>
<DIV><FONT face=Arial size=2>***** Kernel symbols are WRONG. Please fix symbols
to do analysis.</FONT></DIV>
<DIV> </DIV>
<DIV><FONT face=Arial
size=2>*************************************************************************<BR>***
Your debugger is not using the correct symbols.<BR>*** In order for this
command to work properly, your symbol path must point to .pdb files that have
full type information.<BR>*** Certain .pdb files (such as the public OS
symbols) do not contain the required information. Contact the group that
provided you with these symbols if you need this command to work.<BR>*** Type
referenced: nt!_KPRCB<BR>*************************************************************************</FONT></DIV>
<DIV> </DIV>
<DIV><FONT face=Arial size=2>MODULE_NAME: mthca</FONT></DIV>
<DIV> </DIV>
<DIV><FONT face=Arial size=2>FAULTING_MODULE: 80800000 nt</FONT></DIV>
<DIV> </DIV>
<DIV><FONT face=Arial size=2>DEBUG_FLR_IMAGE_TIMESTAMP:
44770e4b</FONT></DIV>
<DIV><FONT face=Arial size=2></FONT> </DIV>
<DIV><FONT face=Arial size=2>READ_ADDRESS: 00000014 </FONT></DIV>
<DIV> </DIV>
<DIV><FONT face=Arial size=2>CURRENT_IRQL: 2</FONT></DIV>
<DIV> </DIV>
<DIV><FONT face=Arial size=2>FAULTING_IP: <BR>mthca!mthca_ah_grh_present+8
[s:\builds\1362\trunk\hw\mthca\kernel\mthca_av.c @ 177]<BR>b9c9fb98
8b4814
mov ecx,[eax+0x14]</FONT></DIV>
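<DIV><FONT face=Arial size=2>A small worked sketch of how the referenced address and the faulting instruction fit together. For mov ecx,[eax+0x14] the address touched is eax + 0x14, and the dump reports 00000014 as the memory referenced, so eax was 0 at the time of the fault. The function name is hypothetical, and the dump excerpt does not show which of the two pointer loads on source line 177 this instruction corresponds to:</FONT></DIV>

```c
#include <stdint.h>

/* For an instruction of the form mov reg,[base+disp], the address that was
 * touched is base + disp, so the base register value is address - disp
 * (modulo 2^32 on a 32-bit target). */
uint32_t fault_base_register(uint32_t referenced_address, uint32_t displacement)
{
    return referenced_address - displacement;
}
```

<DIV><FONT face=Arial size=2>With the values from this dump, fault_base_register(0x00000014, 0x14) gives 0, i.e. a NULL base pointer.</FONT></DIV>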
<DIV> </DIV>
<DIV><FONT face=Arial size=2>DEFAULT_BUCKET_ID: DRIVER_FAULT</FONT></DIV>
<DIV> </DIV>
<DIV><FONT face=Arial size=2>BUGCHECK_STR: 0xD1</FONT></DIV>
<DIV> </DIV>
<DIV><FONT face=Arial size=2>LAST_CONTROL_TRANSFER: from b9c9fb98 to
8088bdd3</FONT></DIV>
<DIV> </DIV>
<DIV><FONT face=Arial size=2>STACK_TEXT: <BR>WARNING: Stack unwind
information not available. Following frames may be wrong.<BR>ba028b4c b9c9fb98
badb0d00 8718af58 b9c9b9fa nt!Kei386EoiHelper+0x28d3<BR>ba028bc0 b9cbb170
00000000 89e78180 00000000 mthca!mthca_ah_grh_present+0x8
[s:\builds\1362\trunk\hw\mthca\kernel\mthca_av.c @ 177]<BR>ba028bf8 b9cbbdb7
89f45750 89e78008 0000004a mthca!build_mlx_header+0x30
[s:\builds\1362\trunk\hw\mthca\kernel\mthca_qp.c @ 1416]<BR>ba028c88 b9c90e2a
89e78008 8718af58 00000000 mthca!mthca_arbel_post_send+0x477
[s:\builds\1362\trunk\hw\mthca\kernel\mthca_qp.c @ 2004]<BR>ba028cb8 b9b8d653
89e78008 ffffffff 8718af58 mthca!mlnx_post_send+0x6a
[s:\builds\1362\trunk\hw\mthca\kernel\hca_direct.c @ 71]<BR>ba028cd4 b9b8d5f2
89fd3df8 ffffffff 8718af58 ibbus!ud_post_send+0x4f
[s:\builds\1362\trunk\core\al\al_qp.c @ 1616]<BR>ba028cec b9b87f8c 89fd3df8
ffffffff 8718af58 ibbus!ib_post_send+0x4c [s:\builds\1362\trunk\core\al\al_qp.c
@ 1588]<BR>ba028d10 b9b88dc5 8a01ac00 8718af40 8718af40
ibbus!remote_mad_send+0x84 [s:\builds\1362\trunk\core\al\kernel\al_smi.c @
1309]<BR>ba028d2c b9b8d6f4 89fd3df8 ffffffff 00000001 ibbus!spl_qp_svc_send+0x8d
[s:\builds\1362\trunk\core\al\kernel\al_smi.c @ 1249]<BR>ba028d54 b9b884a0
89fd3df8 89fd3f48 8a019a28 ibbus!special_qp_resume_sends+0x4a
[s:\builds\1362\trunk\core\al\al_qp.c @ 1672]<BR>ba028d70 b9b83d91 8a01ad24
8a0199fc 8a019990 ibbus!send_local_mad_cb+0x4c
[s:\builds\1362\trunk\core\al\kernel\al_smi.c @ 1879]<BR>ba028d88 b9b84ae9
8a019990 00000000 89fc5618 ibbus!__cl_async_proc_worker+0x23
[s:\builds\1362\trunk\core\complib\cl_async_proc.c @ 153]<BR>ba028d9c b9b84f2e
8a019990 8a014418 ba028ddc ibbus!__cl_thread_pool_routine+0x33
[s:\builds\1362\trunk\core\complib\cl_threadpool.c @ 66]<BR>ba028dac 80948bb2
89fc5618 00000000 00000000 ibbus!__thread_callback+0x20
[s:\builds\1362\trunk\core\complib\kernel\cl_thread.c @ 49]<BR>ba028ddc 8088d4d2
b9b84f0e 89fc5618 00000000
nt!PsRemoveCreateThreadNotifyRoutine+0x21e<BR>00000000 00000000 00000000
00000000 00000000 nt!KiDispatchInterrupt+0x572</FONT></DIV>
<DIV><FONT face=Arial size=2></FONT> </DIV><FONT face=Arial size=2>
<DIV><BR>STACK_COMMAND: .bugcheck ; kb</DIV>
<DIV> </DIV>
<DIV>FOLLOWUP_IP: <BR>mthca!mthca_ah_grh_present+8
[s:\builds\1362\trunk\hw\mthca\kernel\mthca_av.c @ 177]<BR>b9c9fb98
8b4814
mov ecx,[eax+0x14]</DIV>
<DIV> </DIV>
<DIV>FAULTING_SOURCE_CODE: <BR> 173: }<BR> 174:
<BR> 175: int mthca_ah_grh_present(struct mthca_ah
*ah)<BR> 176: {<BR>> 177: return
!!(ah->av->g_slid & 0x80);<BR> 178: }<BR> 179:
<BR> 180: int mthca_read_ah(struct mthca_dev *dev, struct mthca_ah
*ah,<BR> 181: struct ib_ud_header
*header)<BR> 182: {</DIV>
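<DIV>A minimal sketch of a guarded variant of the faulting function, assuming reduced, hypothetical struct layouts that contain only the fields used on line 177. It would turn a NULL ah or NULL av into an error return instead of a bugcheck, but it would not catch a non-NULL garbage pointer such as 0x1 reaching it:</DIV>

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, reduced layouts: only the fields used on line 177.
 * The real mthca structures contain more members. */
struct mthca_av_sketch {
    uint8_t g_slid;
};

struct mthca_ah_sketch {
    struct mthca_av_sketch *av;
};

/* Returns 1 if the GRH bit (0x80 in g_slid) is set, 0 if it is clear,
 * and -1 if ah or ah->av is NULL. */
int ah_grh_present_checked(const struct mthca_ah_sketch *ah)
{
    if (ah == NULL || ah->av == NULL)
        return -1;
    return !!(ah->av->g_slid & 0x80);
}
```

<DIV>A guard like this only hides the symptom; the open question remains where the bad work request pointer comes from.</DIV>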
<DIV> </DIV>
<DIV><BR>SYMBOL_STACK_INDEX: 1</DIV>
<DIV> </DIV>
<DIV>FOLLOWUP_NAME: MachineOwner</DIV>
<DIV> </DIV>
<DIV>SYMBOL_NAME: mthca!mthca_ah_grh_present+8</DIV>
<DIV> </DIV>
<DIV>IMAGE_NAME: mthca.sys</DIV>
<DIV> </DIV>
<DIV>BUCKET_ID: WRONG_SYMBOLS</DIV>
<DIV> </DIV>
<DIV>Followup: MachineOwner<BR>---------</DIV>
<DIV> </DIV>
<DIV> </DIV>
<DIV></FONT> </DIV></BODY></HTML>