[Openib-windows] IPoIB crash

Anatoly Lisenko anatoly4work at gmail.com
Mon Sep 4 07:15:49 PDT 2006


Hi Fabian,

From time to time, we see blue screens in IPoIB (Bug Check 0xD1:
DRIVER_IRQL_NOT_LESS_OR_EQUAL).

OpenIB head revision 460.


The call stack is:

nt!KeBugCheckEx
nt!KiBugCheckDispatch+0x74
nt!KiPageFault+0x207
ipoib!cl_fmap_remove_item+0x3c
[d:\projects\win-ibhost\trunk\core\complib\cl_map.c @ 1005]
ipoib!__endpt_mgr_reset_all+0x1e2
[d:\projects\win-ibhost\trunk\ulp\ipoib\kernel\ipoib_port.c @ 4094]
ipoib!ipoib_port_down+0x1fe
[d:\projects\win-ibhost\trunk\ulp\ipoib\kernel\ipoib_port.c @ 5098]
ipoib!__ipoib_pnp_cb+0x3f2
[d:\projects\win-ibhost\trunk\ulp\ipoib\kernel\ipoib_adapter.c @ 615]
ibbus!__pnp_notify_user+0x1a7
[d:\projects\win-ibhost\trunk\core\al\kernel\al_pnp.c @ 525]
ibbus!__pnp_process_remove_port+0x1c2
[d:\projects\win-ibhost\trunk\core\al\kernel\al_pnp.c @ 1004]
ibbus!__pnp_process_remove_ca+0x54
[d:\projects\win-ibhost\trunk\core\al\kernel\al_pnp.c @ 1049]
ibbus!__cl_async_proc_worker+0x73
[d:\projects\win-ibhost\trunk\core\complib\cl_async_proc.c @ 153]
ibbus!__cl_thread_pool_routine+0x4b
[d:\projects\win-ibhost\trunk\core\complib\cl_threadpool.c @ 66]
ibbus!__thread_callback+0x28
[d:\projects\win-ibhost\trunk\core\complib\kernel\cl_thread.c @ 49]
nt!ObAssignSecurity+0x43e
nt!KeInsertQueue+0x2e6


From looking at the code (ipoib_port.c @ 4094):

if( p_endpt->dlid )
{
    cl_qmap_remove_item( &p_port->endpt_mgr.lid_endpts,
        &p_endpt->lid_item );
    p_endpt->dlid = 0;
}

From analyzing the crash dump, I found that
p_endpt->lid_item.pool_item.list_item.p_next is NULL.

The crash itself happens at the line
"p_list_item->p_next->p_prev = p_list_item->p_prev" in the inline function
__cl_primitive_remove(), called from cl_fmap_remove_item().
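
To illustrate why a NULL p_next faults at that line, here is a minimal,
self-contained sketch. It is not the complib code: list_item_t, list_init,
list_insert_tail, primitive_remove and checked_remove are hypothetical names,
and the circular list with a sentinel head is an assumption made for the
example. Only the first store in primitive_remove is modeled on the line
quoted above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a list item; not the real cl_list_item_t. */
typedef struct list_item {
    struct list_item *p_next;
    struct list_item *p_prev;
} list_item_t;

/* Assumed layout: circular list with a sentinel head, so an item that is
 * really linked never has a NULL p_next or p_prev. */
static void list_init(list_item_t *head)
{
    head->p_next = head;
    head->p_prev = head;
}

static void list_insert_tail(list_item_t *head, list_item_t *item)
{
    item->p_prev = head->p_prev;
    item->p_next = head;
    head->p_prev->p_next = item;
    head->p_prev = item;
}

/* Unlink without any checks. The first store dereferences p_next, so an
 * item whose p_next is NULL faults here. */
static void primitive_remove(list_item_t *item)
{
    item->p_next->p_prev = item->p_prev;
    item->p_prev->p_next = item->p_next;
}

/* Defensive variant for illustration: refuses to unlink an item that is
 * not linked, and clears the links after a successful removal.
 * Returns 1 if the item was removed, 0 otherwise. */
static int checked_remove(list_item_t *item)
{
    if (item->p_next == NULL || item->p_prev == NULL)
        return 0;
    primitive_remove(item);
    item->p_next = NULL;
    item->p_prev = NULL;
    return 1;
}
```

In this sketch, an item that was never inserted, or that was already removed,
is rejected by checked_remove, while primitive_remove on the same item would
dereference NULL.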

I searched for unprotected changes to lid_item and found the following (in
__path_query_cb):

if( !p_endpt->dlid )
{
    cl_map_item_t   *p_qitem;

    /* This is a subnet local endpoint that does not have its LID set. */
    p_endpt->dlid = p_path->dlid;

    /*
     * Insert the item in the LID map so that locally routed unicast
     * traffic will resolve it properly.
     */
    cl_obj_lock( &p_port->obj );

    p_qitem = cl_qmap_insert( &p_port->endpt_mgr.lid_endpts,
        p_endpt->dlid, &p_endpt->lid_item );
    CL_ASSERT( p_qitem == &p_endpt->lid_item );
    cl_obj_unlock( &p_port->obj );
}

What do you think?
Do we need to protect the access to p_endpt->dlid with cl_obj_lock/unlock(
&p_endpt->obj )?
I'm asking because I see that done elsewhere in the function.
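
To make the question concrete, here is a rough, self-contained sketch of the
pattern I have in mind: the test of dlid and the map update happen under the
same lock, on both the set and the clear path. Everything in it is a
hypothetical stand-in (port_t, endpt_t, port_lock/port_unlock, the
in_lid_map flag and the counter replace the real objects and the LID map),
and a C11 atomic_flag spinlock replaces cl_obj_lock. It does not decide
whether the port lock or the endpoint lock is the right one, and it assumes
the removal path would take the same lock, which I have not verified.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical stand-ins; not the real ipoib structures. */
typedef struct port {
    atomic_flag lock;      /* stands in for the object lock */
    int lid_map_count;     /* stands in for the LID map */
} port_t;

typedef struct endpt {
    uint16_t dlid;
    int in_lid_map;        /* 1 while the endpoint is in the "map" */
} endpt_t;

static void port_init(port_t *p_port)
{
    atomic_flag_clear(&p_port->lock);
    p_port->lid_map_count = 0;
}

static void port_lock(port_t *p_port)
{
    while (atomic_flag_test_and_set(&p_port->lock))
        ; /* spin until the flag is released */
}

static void port_unlock(port_t *p_port)
{
    atomic_flag_clear(&p_port->lock);
}

/* Test and insert under one lock, so two callers cannot both see
 * dlid == 0 and both insert. Returns 1 if this call inserted. */
static int endpt_set_dlid(port_t *p_port, endpt_t *p_endpt, uint16_t dlid)
{
    int inserted = 0;
    port_lock(p_port);
    if (!p_endpt->dlid) {
        p_endpt->dlid = dlid;
        p_endpt->in_lid_map = 1;
        p_port->lid_map_count++;
        inserted = 1;
    }
    port_unlock(p_port);
    return inserted;
}

/* Test and remove under the same lock, so removal never runs between
 * the dlid store and the insert. Returns 1 if this call removed. */
static int endpt_clear_dlid(port_t *p_port, endpt_t *p_endpt)
{
    int removed = 0;
    port_lock(p_port);
    if (p_endpt->dlid) {
        p_endpt->in_lid_map = 0;
        p_port->lid_map_count--;
        p_endpt->dlid = 0;
        removed = 1;
    }
    port_unlock(p_port);
    return removed;
}
```

With both paths written this way, dlid is nonzero exactly when the endpoint
is in the map, which is the invariant the removal code at ipoib_port.c @ 4094
appears to rely on.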


Thanks,
Anatoly