[Openib-windows] IPoIB crash

Anatoly Lisenko anatolyl at voltaire.com
Sun Sep 3 03:50:06 PDT 2006


Hi Fabian,

 

From time to time, we see blue screens in IPoIB (Bug Check 0xD1:
DRIVER_IRQL_NOT_LESS_OR_EQUAL).

OpenIB head revision 460.

 

The call stack is:

 

nt!KeBugCheckEx
nt!ZwUnloadKey+0x22a4
nt!ZwUnloadKey+0x12b7
ipoib!cl_fmap_remove_item+0x3c
    [d:\projects\win-ibhost\trunk\core\complib\cl_map.c @ 1005]
ipoib!__endpt_mgr_reset_all+0x1e2
    [d:\projects\win-ibhost\trunk\ulp\ipoib\kernel\ipoib_port.c @ 4094]
ipoib!ipoib_port_down+0x1fe
    [d:\projects\win-ibhost\trunk\ulp\ipoib\kernel\ipoib_port.c @ 5098]
ipoib!__ipoib_pnp_cb+0x3f2
    [d:\projects\win-ibhost\trunk\ulp\ipoib\kernel\ipoib_adapter.c @ 615]
ibbus!__pnp_notify_user+0x1a7
    [d:\projects\win-ibhost\trunk\core\al\kernel\al_pnp.c @ 525]
ibbus!__pnp_process_remove_port+0x1c2
    [d:\projects\win-ibhost\trunk\core\al\kernel\al_pnp.c @ 1004]
ibbus!__pnp_process_remove_ca+0x54
    [d:\projects\win-ibhost\trunk\core\al\kernel\al_pnp.c @ 1049]
ibbus!__cl_async_proc_worker+0x73
    [d:\projects\win-ibhost\trunk\core\complib\cl_async_proc.c @ 153]
ibbus!__cl_thread_pool_routine+0x4b
    [d:\projects\win-ibhost\trunk\core\complib\cl_threadpool.c @ 66]
ibbus!__thread_callback+0x28
    [d:\projects\win-ibhost\trunk\core\complib\kernel\cl_thread.c @ 49]
nt!ObAssignSecurity+0x43e
nt!KeInsertQueue+0x2e6

 

The code at that point (ipoib_port.c @ 4094) is:

if( p_endpt->dlid )
{
    cl_qmap_remove_item( &p_port->endpt_mgr.lid_endpts,
        &p_endpt->lid_item );
    p_endpt->dlid = 0;
}

 

From analyzing the crash dump, I've found that
p_endpt->lid_item->pool_item.list_item->p_next is NULL.

The crash itself happens on the line
"p_list_item->p_next->p_prev = p_list_item->p_prev" in the inline function
__cl_primitive_remove(), called from cl_fmap_remove_item().
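
To illustrate the failure mode, here is a stand-alone sketch of a doubly linked list unlink. It is not the complib code: item_t, list_init, list_insert_head and list_remove are hypothetical names, and the NULL guard is only there to make the problem visible. It assumes that a NULL p_next means the item is not currently linked, which is one possible explanation for the dump but not a confirmed one.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a complib list item. */
typedef struct item {
    struct item *p_next;
    struct item *p_prev;
} item_t;

/* Circular list with a sentinel head: linked items never have NULL links. */
static void list_init(item_t *p_head)
{
    p_head->p_next = p_head;
    p_head->p_prev = p_head;
}

static void list_insert_head(item_t *p_head, item_t *p_item)
{
    p_item->p_next = p_head->p_next;
    p_item->p_prev = p_head;
    p_head->p_next->p_prev = p_item;
    p_head->p_next = p_item;
}

/*
 * Returns 0 on success, -1 if the item is not linked.
 * Without the guard, the first unlink statement below would
 * dereference a NULL p_next, which is the shape of the reported crash.
 */
static int list_remove(item_t *p_item)
{
    if (p_item->p_next == NULL || p_item->p_prev == NULL)
        return -1;

    p_item->p_next->p_prev = p_item->p_prev;
    p_item->p_prev->p_next = p_item->p_next;

    /* Mark the item as unlinked so a second remove is detected. */
    p_item->p_next = NULL;
    p_item->p_prev = NULL;
    return 0;
}
```

In this sketch, removing an item that was never inserted, or removing the same item twice, returns -1 instead of faulting.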

 

I've searched for unprotected changes to lid_item, and found the following
(in __path_query_cb):

 

if( !p_endpt->dlid )
{
    cl_map_item_t   *p_qitem;

    /* This is a subnet local endpoint that does not have its LID set. */
    p_endpt->dlid = p_path->dlid;
    /*
     * Insert the item in the LID map so that locally routed unicast
     * traffic will resolve it properly.
     */
    cl_obj_lock( &p_port->obj );
    p_qitem = cl_qmap_insert( &p_port->endpt_mgr.lid_endpts,
        p_endpt->dlid, &p_endpt->lid_item );
    CL_ASSERT( p_qitem == &p_endpt->lid_item );
    cl_obj_unlock( &p_port->obj );
}

 

What do you think?

Do we need to protect the access to p_endpt->dlid with
cl_obj_lock/unlock( &p_endpt->obj )?

I'm asking because I see that done elsewhere in the function.
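
As a rough sketch of the idea (not a patch against ipoib_port.c), the test of dlid, the assignment, and the map update could all happen under one lock hold, so the port-down path never sees a nonzero dlid for an endpoint that is not yet in the LID map. Everything here is a hypothetical stand-in: port_t, endpt_t, endpt_set_dlid and endpt_clear_dlid are invented names, a pthread mutex plays the role of cl_obj_lock, and a counter replaces the real cl_qmap. It also assumes the removal path takes the same lock, which would need to be checked in the real code.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>

/* Hypothetical stand-ins; these are not the IPoIB structures. */
typedef struct port {
    pthread_mutex_t lock;   /* plays the role of cl_obj_lock( &p_port->obj ) */
    int lid_map_count;      /* stands in for the lid_endpts map */
} port_t;

typedef struct endpt {
    uint16_t dlid;          /* nonzero only while the endpoint is in the map */
} endpt_t;

/* Path-query side: test dlid, set it, and insert, all under the lock. */
static int endpt_set_dlid(port_t *p_port, endpt_t *p_endpt, uint16_t dlid)
{
    int inserted = 0;

    pthread_mutex_lock(&p_port->lock);
    if (p_endpt->dlid == 0) {
        p_endpt->dlid = dlid;
        p_port->lid_map_count++;    /* stands in for cl_qmap_insert */
        inserted = 1;
    }
    pthread_mutex_unlock(&p_port->lock);
    return inserted;
}

/* Port-down side: test dlid, remove, and clear it under the same lock. */
static int endpt_clear_dlid(port_t *p_port, endpt_t *p_endpt)
{
    int removed = 0;

    pthread_mutex_lock(&p_port->lock);
    if (p_endpt->dlid != 0) {
        p_port->lid_map_count--;    /* stands in for cl_qmap_remove_item */
        p_endpt->dlid = 0;
        removed = 1;
    }
    pthread_mutex_unlock(&p_port->lock);
    return removed;
}
```

With this shape, dlid and map membership always change together, so a remove can only run for an endpoint that was actually inserted.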

 

Thanks,

Anatoly
