[Openib-windows] IPoIB crash
Anatoly Lisenko
anatolyl at voltaire.com
Sun Sep 3 03:50:06 PDT 2006
Hi Fabian,
From time to time we see blue screens in IPoIB (Bug Check 0xD1:
DRIVER_IRQL_NOT_LESS_OR_EQUAL).
We are running OpenIB head revision 460.
The call stack is:
nt!KeBugCheckEx
nt!ZwUnloadKey+0x22a4
nt!ZwUnloadKey+0x12b7
ipoib!cl_fmap_remove_item+0x3c [d:\projects\win-ibhost\trunk\core\complib\cl_map.c @ 1005]
ipoib!__endpt_mgr_reset_all+0x1e2 [d:\projects\win-ibhost\trunk\ulp\ipoib\kernel\ipoib_port.c @ 4094]
ipoib!ipoib_port_down+0x1fe [d:\projects\win-ibhost\trunk\ulp\ipoib\kernel\ipoib_port.c @ 5098]
ipoib!__ipoib_pnp_cb+0x3f2 [d:\projects\win-ibhost\trunk\ulp\ipoib\kernel\ipoib_adapter.c @ 615]
ibbus!__pnp_notify_user+0x1a7 [d:\projects\win-ibhost\trunk\core\al\kernel\al_pnp.c @ 525]
ibbus!__pnp_process_remove_port+0x1c2 [d:\projects\win-ibhost\trunk\core\al\kernel\al_pnp.c @ 1004]
ibbus!__pnp_process_remove_ca+0x54 [d:\projects\win-ibhost\trunk\core\al\kernel\al_pnp.c @ 1049]
ibbus!__cl_async_proc_worker+0x73 [d:\projects\win-ibhost\trunk\core\complib\cl_async_proc.c @ 153]
ibbus!__cl_thread_pool_routine+0x4b [d:\projects\win-ibhost\trunk\core\complib\cl_threadpool.c @ 66]
ibbus!__thread_callback+0x28 [d:\projects\win-ibhost\trunk\core\complib\kernel\cl_thread.c @ 49]
nt!ObAssignSecurity+0x43e
nt!KeInsertQueue+0x2e6
From looking at the code (ipoib_port.c @ 4094):
    if( p_endpt->dlid )
    {
        cl_qmap_remove_item( &p_port->endpt_mgr.lid_endpts,
            &p_endpt->lid_item );
        p_endpt->dlid = 0;
    }
From analyzing the crash dump, I've found that
p_endpt->lid_item.pool_item.list_item.p_next is NULL.
The crash itself happens on the line
"p_list_item->p_next->p_prev = p_list_item->p_prev" in the inline
function __cl_primitive_remove(), called from cl_fmap_remove_item().
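To illustrate why a NULL p_next faults at exactly that line, here is a minimal sketch of the unlink step on a circular doubly-linked list. It is not the complib code: list_item_t, list_init(), list_insert_head(), primitive_remove(), item_is_linked() and safe_remove() are hypothetical names, the real cl_qmap removal also rebalances a tree (left out here), and clearing the links after removal is an assumption of this sketch rather than verified complib behaviour.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for complib's cl_list_item_t. */
typedef struct list_item {
    struct list_item *p_next;
    struct list_item *p_prev;
} list_item_t;

/* An empty circular list: the head points at itself. */
static void list_init(list_item_t *p_head)
{
    p_head->p_next = p_head;
    p_head->p_prev = p_head;
}

/* Link p_item directly after p_head. */
static void list_insert_head(list_item_t *p_head, list_item_t *p_item)
{
    p_item->p_next = p_head->p_next;
    p_item->p_prev = p_head;
    p_head->p_next->p_prev = p_item;
    p_head->p_next = p_item;
}

/* Unlink step modeled on __cl_primitive_remove(). The first assignment is
 * the line from the crash: if p_next is NULL it dereferences NULL. In a
 * debug build the assert fires first. Clearing the links afterwards is an
 * assumption of this sketch (it makes a second removal detectable). */
static void primitive_remove(list_item_t *p_list_item)
{
    assert(p_list_item->p_next != NULL && p_list_item->p_prev != NULL);
    p_list_item->p_next->p_prev = p_list_item->p_prev;
    p_list_item->p_prev->p_next = p_list_item->p_next;
    p_list_item->p_next = NULL;
    p_list_item->p_prev = NULL;
}

/* An item counts as linked only if both neighbours are set. */
static bool item_is_linked(const list_item_t *p_item)
{
    return p_item->p_next != NULL && p_item->p_prev != NULL;
}

/* Defensive removal: refuses to touch an unlinked item and reports
 * whether anything was removed. */
static bool safe_remove(list_item_t *p_item)
{
    if (!item_is_linked(p_item))
        return false;
    primitive_remove(p_item);
    return true;
}
```

A guard like safe_remove() would only hide the symptom; if the item is being removed before it was inserted (or twice), the underlying ordering problem between the two code paths still needs fixing.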
I searched for unprotected changes to lid_item and found the following
(in __path_query_cb):
    if( !p_endpt->dlid )
    {
        cl_map_item_t *p_qitem;

        /* This is a subnet local endpoint that does not have its LID set. */
        p_endpt->dlid = p_path->dlid;
        /*
         * Insert the item in the LID map so that locally routed unicast
         * traffic will resolve it properly.
         */
        cl_obj_lock( &p_port->obj );
        p_qitem = cl_qmap_insert( &p_port->endpt_mgr.lid_endpts,
            p_endpt->dlid, &p_endpt->lid_item );
        CL_ASSERT( p_qitem == &p_endpt->lid_item );
        cl_obj_unlock( &p_port->obj );
    }
What do you think?
Do we need to protect accesses to p_endpt->dlid with
cl_obj_lock/cl_obj_unlock( &p_endpt->obj )?
I'm asking because I see that lock being used elsewhere in the function.
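One possible reading of the two snippets together, offered only as a hypothesis consistent with the NULL p_next in the dump: __path_query_cb sets p_endpt->dlid before taking the port lock, so if __endpt_mgr_reset_all() runs in that window it can see a non-zero dlid and remove a lid_item that is not yet in the map. The sketch below shows the general pattern of doing the dlid test, the assignment, and the map insert/remove under one lock. It uses a pthread mutex in place of cl_obj_lock(), a boolean in place of real cl_qmap membership, and hypothetical names (endpt_t, port_t, endpt_set_dlid(), endpt_reset(), set_loop(), reset_loop()). Whether the real code should use the port lock or &p_endpt->obj depends on what __endpt_mgr_reset_all() holds, which I have not verified; the point is that both paths use the same lock.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for an IPoIB endpoint. */
typedef struct {
    uint16_t dlid;       /* 0 means "no LID assigned yet" */
    bool     in_lid_map; /* true while lid_item is linked into lid_endpts */
} endpt_t;

/* Hypothetical, simplified stand-in for an IPoIB port. */
typedef struct {
    pthread_mutex_t lock; /* plays the role of cl_obj_lock( &p_port->obj ) */
    int lid_map_count;    /* number of endpoints in the pretend LID map */
} port_t;

/* Path-query side: test dlid, assign it and insert, all under the lock,
 * so nobody can observe dlid != 0 while the item is not in the map. */
static void endpt_set_dlid(port_t *p_port, endpt_t *p_endpt, uint16_t dlid)
{
    pthread_mutex_lock(&p_port->lock);
    if (!p_endpt->dlid) {
        p_endpt->dlid = dlid;
        assert(!p_endpt->in_lid_map); /* must not already be in the map */
        p_endpt->in_lid_map = true;
        p_port->lid_map_count++;
    }
    pthread_mutex_unlock(&p_port->lock);
}

/* Reset side: the same lock covers the dlid check and the removal. */
static void endpt_reset(port_t *p_port, endpt_t *p_endpt)
{
    pthread_mutex_lock(&p_port->lock);
    if (p_endpt->dlid) {
        assert(p_endpt->in_lid_map); /* an unlinked item here is the crash */
        p_endpt->in_lid_map = false;
        p_port->lid_map_count--;
        p_endpt->dlid = 0;
    }
    pthread_mutex_unlock(&p_port->lock);
}

/* Arguments for the two racing threads used to exercise the pattern. */
typedef struct {
    port_t  *p_port;
    endpt_t *p_endpt;
    int      iterations;
} stress_arg_t;

static void *set_loop(void *arg)
{
    stress_arg_t *a = arg;
    for (int i = 0; i < a->iterations; i++)
        endpt_set_dlid(a->p_port, a->p_endpt, 7);
    return NULL;
}

static void *reset_loop(void *arg)
{
    stress_arg_t *a = arg;
    for (int i = 0; i < a->iterations; i++)
        endpt_reset(a->p_port, a->p_endpt);
    return NULL;
}
```

Running set_loop() and reset_loop() on two threads against the same endpoint should never trip either assert, because "dlid != 0" and "item is in the map" always change together while the lock is held.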
Thanks,
Anatoly