<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<HTML><HEAD>
<META http-equiv=Content-Type content="text/html; charset=us-ascii">
<META content="MSHTML 6.00.2900.2873" name=GENERATOR></HEAD>
<BODY>
<DIV><FONT face=Arial size=2><SPAN class=489092207-28052006>Hi 
Fab</SPAN></FONT></DIV>
<DIV><FONT face=Arial size=2><SPAN class=489092207-28052006></SPAN></FONT><FONT 
face=Arial size=2><SPAN class=489092207-28052006></SPAN></FONT> </DIV>
<DIV><FONT face=Arial size=2><SPAN class=489092207-28052006>Below is a 
crash dump report we got from our regression 
system.</SPAN></FONT></DIV>
<DIV><FONT face=Arial size=2><SPAN class=489092207-28052006>I noticed that 
p_mad_wr in spl_qp_svc_send is 0x000001 (I guess that is not a valid MAD 
pointer).</SPAN></FONT></DIV>
<DIV><FONT face=Arial size=2><SPAN class=489092207-28052006>This causes mthca 
to crash while checking the address vector (av).</SPAN></FONT></DIV>
<DIV><FONT face=Arial size=2><SPAN 
class=489092207-28052006></SPAN></FONT> </DIV>
<DIV><FONT face=Arial size=2><SPAN class=489092207-28052006>Do you have any idea 
what could cause the to_send_queue list of the special QP to return p_mad_wr = 
1?</SPAN></FONT></DIV>
<DIV><FONT face=Arial size=2><SPAN class=489092207-28052006>I suspected a 
missing lock around inserting into or destroying the list, but I could not find 
one.</SPAN></FONT></DIV>
<DIV><FONT face=Arial size=2><SPAN 
class=489092207-28052006></SPAN></FONT> </DIV>
<DIV><FONT face=Arial 
size=2><SPAN class=489092207-28052006>Thanks,</SPAN></FONT></DIV>
<DIV><FONT face=Arial size=2><SPAN class=489092207-28052006>Yossi 
</SPAN></FONT></DIV>
<DIV><FONT face=Arial size=2></FONT> </DIV>
<DIV><FONT face=Arial size=2></FONT> </DIV>
<DIV><FONT face=Arial size=2></FONT> </DIV>
<DIV><FONT face=Arial size=2>0: kd> !analyze -v 
<BR>*******************************************************************************<BR>*                                                                             
*<BR>*                        
Bugcheck 
Analysis                                    
*<BR>*                                                                             
*<BR>*******************************************************************************</FONT></DIV>
<DIV> </DIV>
<DIV><FONT face=Arial size=2>DRIVER_IRQL_NOT_LESS_OR_EQUAL (d1)<BR>An attempt 
was made to access a pageable (or completely invalid) address at an<BR>interrupt 
request level (IRQL) that is too high.  This is usually<BR>caused by 
drivers using improper addresses.<BR>If kernel debugger is available get stack 
backtrace.<BR>Arguments:<BR>Arg1: 00000014, memory referenced<BR>Arg2: 00000002, 
IRQL<BR>Arg3: 00000000, value 0 = read operation, 1 = write operation<BR>Arg4: 
b9c9fb98, address which referenced memory</FONT></DIV>
<DIV> </DIV>
<DIV><FONT face=Arial size=2>Debugging 
Details:<BR>------------------</FONT></DIV>
<DIV> </DIV>
<DIV><FONT face=Arial size=2>***** Kernel symbols are WRONG. Please fix symbols 
to do analysis.</FONT></DIV>
<DIV> </DIV>
<DIV><FONT face=Arial 
size=2>*************************************************************************<BR>***                                                                   
***<BR>***                                                                   
***<BR>***    Your debugger is not using the correct 
symbols                 
***<BR>***                                                                   
***<BR>***    In order for this command to work properly, your 
symbol path   ***<BR>***    must point to .pdb files 
that have full type information.      
***<BR>***                                                                   
***<BR>***    Certain .pdb files (such as the public OS symbols) 
do not      ***<BR>***    contain the 
required information.  Contact the group that      
***<BR>***    provided you with these symbols if you need this 
command to    ***<BR>***    
work.                                                          
***<BR>***                                                                   
***<BR>***    Type referenced: 
nt!_KPRCB                                     
***<BR>***                                                                   
***<BR>*************************************************************************</FONT></DIV>
<DIV> </DIV>
<DIV><FONT face=Arial size=2>MODULE_NAME:  mthca</FONT></DIV>
<DIV> </DIV>
<DIV><FONT face=Arial size=2>FAULTING_MODULE: 80800000 nt</FONT></DIV>
<DIV> </DIV>
<DIV><FONT face=Arial size=2>DEBUG_FLR_IMAGE_TIMESTAMP:  
44770e4b</FONT></DIV>
<DIV><FONT face=Arial size=2></FONT> </DIV>
<DIV><FONT face=Arial size=2>READ_ADDRESS:  00000014 </FONT></DIV>
<DIV> </DIV>
<DIV><FONT face=Arial size=2>CURRENT_IRQL:  2</FONT></DIV>
<DIV> </DIV>
<DIV><FONT face=Arial size=2>FAULTING_IP: <BR>mthca!mthca_ah_grh_present+8 
[s:\builds\1362\trunk\hw\mthca\kernel\mthca_av.c @ 177]<BR>b9c9fb98 
8b4814           
mov     ecx,[eax+0x14]</FONT></DIV>
<DIV> </DIV>
<DIV><FONT face=Arial size=2>DEFAULT_BUCKET_ID:  DRIVER_FAULT</FONT></DIV>
<DIV> </DIV>
<DIV><FONT face=Arial size=2>BUGCHECK_STR:  0xD1</FONT></DIV>
<DIV> </DIV>
<DIV><FONT face=Arial size=2>LAST_CONTROL_TRANSFER:  from b9c9fb98 to 
8088bdd3</FONT></DIV>
<DIV> </DIV>
<DIV><FONT face=Arial size=2>STACK_TEXT:  <BR>WARNING: Stack unwind 
information not available. Following frames may be wrong.<BR>ba028b4c b9c9fb98 
badb0d00 8718af58 b9c9b9fa nt!Kei386EoiHelper+0x28d3<BR>ba028bc0 b9cbb170 
00000000 89e78180 00000000 mthca!mthca_ah_grh_present+0x8 
[s:\builds\1362\trunk\hw\mthca\kernel\mthca_av.c @ 177]<BR>ba028bf8 b9cbbdb7 
89f45750 89e78008 0000004a mthca!build_mlx_header+0x30 
[s:\builds\1362\trunk\hw\mthca\kernel\mthca_qp.c @ 1416]<BR>ba028c88 b9c90e2a 
89e78008 8718af58 00000000 mthca!mthca_arbel_post_send+0x477 
[s:\builds\1362\trunk\hw\mthca\kernel\mthca_qp.c @ 2004]<BR>ba028cb8 b9b8d653 
89e78008 ffffffff 8718af58 mthca!mlnx_post_send+0x6a 
[s:\builds\1362\trunk\hw\mthca\kernel\hca_direct.c @ 71]<BR>ba028cd4 b9b8d5f2 
89fd3df8 ffffffff 8718af58 ibbus!ud_post_send+0x4f 
[s:\builds\1362\trunk\core\al\al_qp.c @ 1616]<BR>ba028cec b9b87f8c 89fd3df8 
ffffffff 8718af58 ibbus!ib_post_send+0x4c [s:\builds\1362\trunk\core\al\al_qp.c 
@ 1588]<BR>ba028d10 b9b88dc5 8a01ac00 8718af40 8718af40 
ibbus!remote_mad_send+0x84 [s:\builds\1362\trunk\core\al\kernel\al_smi.c @ 
1309]<BR>ba028d2c b9b8d6f4 89fd3df8 ffffffff 00000001 ibbus!spl_qp_svc_send+0x8d 
[s:\builds\1362\trunk\core\al\kernel\al_smi.c @ 1249]<BR>ba028d54 b9b884a0 
89fd3df8 89fd3f48 8a019a28 ibbus!special_qp_resume_sends+0x4a 
[s:\builds\1362\trunk\core\al\al_qp.c @ 1672]<BR>ba028d70 b9b83d91 8a01ad24 
8a0199fc 8a019990 ibbus!send_local_mad_cb+0x4c 
[s:\builds\1362\trunk\core\al\kernel\al_smi.c @ 1879]<BR>ba028d88 b9b84ae9 
8a019990 00000000 89fc5618 ibbus!__cl_async_proc_worker+0x23 
[s:\builds\1362\trunk\core\complib\cl_async_proc.c @ 153]<BR>ba028d9c b9b84f2e 
8a019990 8a014418 ba028ddc ibbus!__cl_thread_pool_routine+0x33 
[s:\builds\1362\trunk\core\complib\cl_threadpool.c @ 66]<BR>ba028dac 80948bb2 
89fc5618 00000000 00000000 ibbus!__thread_callback+0x20 
[s:\builds\1362\trunk\core\complib\kernel\cl_thread.c @ 49]<BR>ba028ddc 8088d4d2 
b9b84f0e 89fc5618 00000000 
nt!PsRemoveCreateThreadNotifyRoutine+0x21e<BR>00000000 00000000 00000000 
00000000 00000000 nt!KiDispatchInterrupt+0x572</FONT></DIV>
<DIV><FONT face=Arial size=2></FONT> </DIV><FONT face=Arial size=2>
<DIV><BR>STACK_COMMAND:  .bugcheck ; kb</DIV>
<DIV> </DIV>
<DIV>FOLLOWUP_IP: <BR>mthca!mthca_ah_grh_present+8 
[s:\builds\1362\trunk\hw\mthca\kernel\mthca_av.c @ 177]<BR>b9c9fb98 
8b4814           
mov     ecx,[eax+0x14]</DIV>
<DIV> </DIV>
<DIV>FAULTING_SOURCE_CODE:  <BR>   173: }<BR>   174: 
<BR>   175: int mthca_ah_grh_present(struct mthca_ah 
*ah)<BR>   176: {<BR>>  177:  return 
!!(ah->av->g_slid & 0x80);<BR>   178: }<BR>   179: 
<BR>   180: int mthca_read_ah(struct mthca_dev *dev, struct mthca_ah 
*ah,<BR>   181:     struct ib_ud_header 
*header)<BR>   182: {</DIV>
<DIV> </DIV>
<DIV><BR>SYMBOL_STACK_INDEX:  1</DIV>
<DIV> </DIV>
<DIV>FOLLOWUP_NAME:  MachineOwner</DIV>
<DIV> </DIV>
<DIV>SYMBOL_NAME:  mthca!mthca_ah_grh_present+8</DIV>
<DIV> </DIV>
<DIV>IMAGE_NAME:  mthca.sys</DIV>
<DIV> </DIV>
<DIV>BUCKET_ID:  WRONG_SYMBOLS</DIV>
<DIV> </DIV>
<DIV>Followup: MachineOwner<BR>---------</DIV>
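<DIV> </DIV>
<DIV>A note on the faulting line: the instruction is mov ecx,[eax+0x14] and 
READ_ADDRESS is 00000014, so eax was 0 when it ran. That suggests the ah 
pointer passed to mthca_ah_grh_present was itself NULL (the first stack 
argument of that frame also shows 00000000, though the unwinder warns the frames 
may be wrong). As a debugging aid only, a guarded version of the check could 
look like the sketch below. The sketch_ names and the simplified structs are 
hypothetical stand-ins, not the real mthca definitions, and the -1 return value 
is an assumption that the caller would have to handle.</DIV>
<PRE>
```c
/* Hypothetical, simplified stand-ins for the driver types. The real
 * definitions live in the mthca sources and are not reproduced here. */
struct sketch_av {
    unsigned char g_slid;   /* bit 0x80 is the one tested at mthca_av.c line 177 */
};

struct sketch_ah {
    struct sketch_av *av;   /* in the real struct this member is not first */
};

/*
 * Guarded variant of the check at mthca_av.c line 177.
 * Returns 1 or 0 exactly as the original expression does, or -1 when
 * either ah or ah->av is NULL, so the caller can fail the send
 * instead of faulting at DISPATCH_LEVEL.
 */
int sketch_ah_grh_present(const struct sketch_ah *ah)
{
    if (!ah || !ah->av)
        return -1;
    return !!(ah->av->g_slid & 0x80);
}
```
</PRE>
<DIV>A caller such as build_mlx_header would need to treat a negative result as 
an error rather than as "GRH present". This only turns the crash into a 
reportable failure; it does not explain how the bad work request got onto the 
queue.</DIV>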
<DIV> </DIV>
<DIV> </DIV>
<DIV></FONT> </DIV></BODY></HTML>