[Openib-windows] "blue screen" on IP over IB code

Tzachi Dar tzachid at mellanox.co.il
Sun Sep 4 08:53:36 PDT 2005


Hi Fab, 
 
While working with our stack, I got a blue screen. It happened when the
remote machine that was running OpenSM got a blue screen as well (that will
probably be the subject of another mail, once I have more time to debug it).
 
The problem seems to be in the function ipoib!__ipoib_ats_reg_cb
[f:\projinf2\wininf\trunk\ulp\ipoib\kernel\ipoib_driver.c @ 2220]
 
On the line that says:
IPOIB_TRACE( IPOIB_DBG_ERROR,
        ("Port %d OID_GEN_NETWORK_LAYER_ADDRESSES - Failed to register IP Address "
        "of %d.%d.%d.%d with error %s\n",
        port_num,
        p_reg_svc_rec->svc_rec.service_data8[ATS_IPV4_OFFSET],
        p_reg_svc_rec->svc_rec.service_data8[ATS_IPV4_OFFSET+1],
        p_reg_svc_rec->svc_rec.service_data8[ATS_IPV4_OFFSET+2],
        p_reg_svc_rec->svc_rec.service_data8[ATS_IPV4_OFFSET+3],
        p_adapter->p_ifc->get_err_str( p_reg_svc_rec->resp_status )) );
 
The immediate problem is that the p_adapter variable is not of the correct
type. It is obtained by casting p_reg_svc_rec->svc_context to the type
ipoib_adapter_t.
That cast is probably a mistake here, since p_reg_svc_rec appears to be a
DAPL record.
(service name = unsigned char [64] "DAPL Address Translation Service")
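
To illustrate the kind of check I have in mind, here is a minimal sketch of a
callback that verifies what its context pointer refers to before casting it.
This is not the IBAL or IPoIB code: ctx_header_t, ctx_kind_t, demo_adapter_t,
demo_other_ctx_t, adapter_from_context and demo_reg_cb are all hypothetical
names, and the sketch assumes every component that registers a context puts a
common tag header first in its structure.

```c
#include <stddef.h>

/* Hypothetical tag identifying which component owns a context. */
typedef enum { CTX_IPOIB = 1, CTX_OTHER = 2 } ctx_kind_t;

/* Hypothetical common header; assumed to be the first member of every context. */
typedef struct { ctx_kind_t kind; } ctx_header_t;

/* Stand-in for an IPoIB adapter structure (the real one is different). */
typedef struct {
    ctx_header_t hdr;   /* must stay first so the header can be read safely */
    int          port_num;
} demo_adapter_t;

/* Stand-in for a context owned by some other component. */
typedef struct {
    ctx_header_t hdr;
    int          unrelated;
} demo_other_ctx_t;

/* Return the adapter only if the context is tagged as ours, else NULL. */
static const demo_adapter_t *adapter_from_context(const void *svc_context)
{
    const ctx_header_t *hdr = svc_context;

    if (hdr == NULL || hdr->kind != CTX_IPOIB)
        return NULL;
    return (const demo_adapter_t *)svc_context;
}

/* Callback sketch: returns the port number, or -1 if the record is not ours. */
static int demo_reg_cb(const void *svc_context)
{
    const demo_adapter_t *p_adapter = adapter_from_context(svc_context);

    if (p_adapter == NULL)
        return -1;      /* ignore the record instead of dereferencing it */
    return p_adapter->port_num;
}
```

With a check like this, a record carrying a foreign context would be ignored
rather than having a function pointer called through it. Whether the real
structures can carry such a tag is an open question.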
 
The p_reg_svc_rec->req_status is IB_TIMEOUT (6).
 
I'm not sure whether the error is that a DAPL record callback is reaching
IPoIB, or that IPoIB is not handling it correctly.
 
Please let me know if you need any more information.
 
Thanks
Tzachi
 
More information: 
0: kd> !analyze -v
*******************************************************************************
*                                                                             *
*                        Bugcheck Analysis                                    *
*                                                                             *
*******************************************************************************
 
DRIVER_IRQL_NOT_LESS_OR_EQUAL (d1)
An attempt was made to access a pageable (or completely invalid) address at an
interrupt request level (IRQL) that is too high.  This is usually
caused by drivers using improper addresses.
If kernel debugger is available get stack backtrace.
Arguments:
Arg1: 1a140157, memory referenced
Arg2: 00000002, IRQL
Arg3: 00000000, value 0 = read operation, 1 = write operation
Arg4: f5581ac5, address which referenced memory
 
Debugging Details:
------------------
 
 
READ_ADDRESS:  1a140157 
 
CURRENT_IRQL:  2
 
FAULTING_IP: 
ipoib!__ipoib_ats_reg_cb+445
[f:\projinf2\wininf\trunk\ulp\ipoib\kernel\ipoib_driver.c @ 2220]
f5581ac5 ff9150010000     call    dword ptr [ecx+0x150]
 
DEFAULT_BUCKET_ID:  DRIVER_FAULT
 
BUGCHECK_STR:  0xD1
 
LAST_CONTROL_TRANSFER:  from f74e7172 to f5581ac5
 
TRAP_FRAME:  808a3374 -- (.trap ffffffff808a3374)
ErrCode = 00000000
eax=81c75498 ebx=f7579170 ecx=1a140007 edx=fd626810 esi=fd6269b8 edi=808a34fe
eip=f5581ac5 esp=808a33e8 ebp=808a3408 iopl=0         nv up ei ng nz na po nc
cs=0008  ss=0010  ds=0023  es=0023  fs=0030  gs=0000             efl=00010286
ipoib!__ipoib_ats_reg_cb+0x445:
f5581ac5 ff9150010000     call    dword ptr [ecx+0x150]
Resetting default scope
 
STACK_TEXT:  
808a3408 f74e7172 808a3438 ffdffa40 81cb73c8 ipoib!__ipoib_ats_reg_cb+0x445
[f:\projinf2\wininf\trunk\ulp\ipoib\kernel\ipoib_driver.c @ 2220]
808a3508 f7563c35 fd6268c8 00000000 80000000 ibal!reg_svc_req_cb+0x2d2
[f:\projinf2\wininf\trunk\core\al\al_reg_svc.c @ 160]
808a3528 f74c5791 81cb72b0 ffffffff 8134f280 ibal!sa_req_send_comp_cb+0x235
[f:\projinf2\wininf\trunk\core\al\kernel\al_sa_req.c @ 730]
808a358c f74c46c0 81cb72b0 ffffffff 808a35a8 ibal!__check_send_queue+0xb51
[f:\projinf2\wininf\trunk\core\al\al_mad.c @ 3080]
808a359c f7579193 81cb72b0 808a3600 8083eb0f ibal!__send_timer_cb+0x90
[f:\projinf2\wininf\trunk\core\al\al_mad.c @ 2948]
808a35a8 8083eb0f 81cb73c8 81cb73a0 96ce6c8c ibal!__timer_callback+0x23
[f:\projinf2\wininf\trunk\core\complib\kernel\cl_timer.c @ 51]
808a3600 8083ac1f 00000000 0000000e 00000000 nt!KiRetireDpcList+0xca
 
 
FOLLOWUP_IP: 
ipoib!__ipoib_ats_reg_cb+445
[f:\projinf2\wininf\trunk\ulp\ipoib\kernel\ipoib_driver.c @ 2220]
f5581ac5 ff9150010000     call    dword ptr [ecx+0x150]
 
SYMBOL_STACK_INDEX:  0
 
FOLLOWUP_NAME:  MachineOwner
 
SYMBOL_NAME:  ipoib!__ipoib_ats_reg_cb+445
 
MODULE_NAME:  ipoib
 
IMAGE_NAME:  ipoib.sys
 
DEBUG_FLR_IMAGE_TIMESTAMP:  430dd045
 
STACK_COMMAND:  .trap ffffffff808a3374 ; kb
 
FAILURE_BUCKET_ID:  0xD1_ipoib!__ipoib_ats_reg_cb+445
 
BUCKET_ID:  0xD1_ipoib!__ipoib_ats_reg_cb+445
 
Followup: MachineOwner
---------
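
The faulting address can be cross-checked against the trap frame. The
instruction is an indirect call through [ecx+0x150], ecx is 1a140007, and the
sum is the READ_ADDRESS reported in Arg1. The small sketch below only redoes
that arithmetic; effective_address is a hypothetical helper name. That ecx
holds the value read as p_adapter->p_ifc is my inference from the source line
and not something the dump states.

```c
#include <stdint.h>

/* Effective address of a 32-bit x86 operand of the form [base + disp]. */
static uint32_t effective_address(uint32_t base, uint32_t disp)
{
    return base + disp;   /* wraps modulo 2^32, as 32-bit addressing does */
}
```

Calling effective_address(0x1a140007, 0x150) yields 0x1a140157, which matches
Arg1, so the read that faulted was the fetch of the function pointer itself.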
 
 