<div>Hi Fabian,<br><br>From time to time, we see blue screens in IPoIB (Bug Check 0xD1: DRIVER_IRQL_NOT_LESS_OR_EQUAL).<br><br>OpenIB head revision 460.<br><br><br>The call stack is:<br><br>nt!KeBugCheckEx<br>nt!KiBugCheckDispatch+0x74
<br>nt!KiPageFault+0x207</div>
<div>ipoib!cl_fmap_remove_item+0x3c [d:\projects\win-ibhost\trunk\core\complib\cl_map.c @ 1005]<br>ipoib!__endpt_mgr_reset_all+0x1e2 [d:\projects\win-ibhost\trunk\ulp\ipoib\kernel\ipoib_port.c @ 4094]<br>ipoib!ipoib_port_down+0x1fe [d:\projects\win-ibhost\trunk\ulp\ipoib\kernel\ipoib_port.c @ 5098]
<br>ipoib!__ipoib_pnp_cb+0x3f2 [d:\projects\win-ibhost\trunk\ulp\ipoib\kernel\ipoib_adapter.c @ 615]<br>ibbus!__pnp_notify_user+0x1a7 [d:\projects\win-ibhost\trunk\core\al\kernel\al_pnp.c @ 525]<br>ibbus!__pnp_process_remove_port+0x1c2 [d:\projects\win-ibhost\trunk\core\al\kernel\al_pnp.c @ 1004]
<br>ibbus!__pnp_process_remove_ca+0x54 [d:\projects\win-ibhost\trunk\core\al\kernel\al_pnp.c @ 1049]<br>ibbus!__cl_async_proc_worker+0x73 [d:\projects\win-ibhost\trunk\core\complib\cl_async_proc.c @ 153]<br>ibbus!__cl_thread_pool_routine+0x4b [d:\projects\win-ibhost\trunk\core\complib\cl_threadpool.c @ 66]
<br>ibbus!__thread_callback+0x28 [d:\projects\win-ibhost\trunk\core\complib\kernel\cl_thread.c @ 49]<br>nt!ObAssignSecurity+0x43e<br>nt!KeInsertQueue+0x2e6<br> <br><br>Looking at the code (ipoib_port.c @ 4094):<br>if( p_endpt->dlid )
<br>{<br>       cl_qmap_remove_item( &p_port->endpt_mgr.lid_endpts, &p_endpt->lid_item );<br>       p_endpt->dlid = 0;<br>}<br><br>From analyzing the crash dump, I've found that p_endpt->lid_item->pool_item.list_item->p_next is NULL.
<br><br>The crash itself happens on the line "p_list_item->p_next->p_prev= p_list_item->p_prev" in the inline function __cl_primitive_remove(), called from cl_fmap_remove_item().<br><br>I've searched for unprotected changes to lid_item and found the following (in __path_query_cb):
<br><br> if( !p_endpt->dlid )<br>{<br>            cl_map_item_t   *p_qitem;<br><br>            /* This is a subnet local endpoint that does not have its LID set. */<br>            p_endpt->dlid = p_path->dlid;<br>
<br>            /*<br>             * Insert the item in the LID map so that locally routed unicast<br>             * traffic will resolve it properly.<br>             */<br>            cl_obj_lock( &p_port->obj );<br>
<br>            p_qitem = cl_qmap_insert( &p_port->endpt_mgr.lid_endpts,<br>                                                    p_endpt->dlid, &p_endpt->lid_item );<br>            CL_ASSERT( p_qitem == &p_endpt->lid_item );
<br>            cl_obj_unlock( &p_port->obj );<br>}<br><br>What do you think?<br>Do we need to protect the access to p_endpt->dlid with cl_obj_lock/unlock( &p_endpt->obj )?<br>I'm asking because I see that done elsewhere in the function.
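<br><br>To make the question concrete, here is a rough, user-mode sketch of the pattern I have in mind: the test of dlid, the update of dlid, and the list insert/remove all happen under one lock, so the flag and the list linkage cannot disagree. Everything in it is a hypothetical stand-in and not the real complib or IPoIB code: the types, port_lock/port_unlock (a simple spinlock in place of cl_obj_lock/unlock), and endpt_set_lid/endpt_clear_lid. It also assumes the removal path can take the same lock; whether that should be the port lock or the endpoint lock is exactly what I'm unsure about.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins; NOT the real complib/IPoIB definitions. */
typedef struct list_item {
    struct list_item *p_next;
    struct list_item *p_prev;
} list_item_t;

typedef struct port {
    atomic_flag lock;      /* stands in for cl_obj_lock( &p_port->obj ) */
    list_item_t lid_list;  /* sentinel head; stands in for endpt_mgr.lid_endpts */
} port_t;

typedef struct endpt {
    uint16_t    dlid;      /* 0 means "not linked into the LID list" */
    list_item_t lid_item;
} endpt_t;

static void port_lock(port_t *p_port)
{
    while (atomic_flag_test_and_set_explicit(&p_port->lock, memory_order_acquire))
        ; /* spin until the flag is released */
}

static void port_unlock(port_t *p_port)
{
    atomic_flag_clear_explicit(&p_port->lock, memory_order_release);
}

static void port_init(port_t *p_port)
{
    atomic_flag_clear(&p_port->lock);
    p_port->lid_list.p_next = &p_port->lid_list;
    p_port->lid_list.p_prev = &p_port->lid_list;
}

/* Insert path (compare __path_query_cb): test and set dlid under the lock. */
static int endpt_set_lid(port_t *p_port, endpt_t *p_endpt, uint16_t dlid)
{
    int inserted = 0;

    port_lock(p_port);
    if (!p_endpt->dlid) {
        list_item_t *p_tail = p_port->lid_list.p_prev;

        p_endpt->dlid = dlid;
        p_endpt->lid_item.p_prev = p_tail;
        p_endpt->lid_item.p_next = &p_port->lid_list;
        p_tail->p_next = &p_endpt->lid_item;
        p_port->lid_list.p_prev = &p_endpt->lid_item;
        inserted = 1;
    }
    port_unlock(p_port);
    return inserted;
}

/* Remove path (compare __endpt_mgr_reset_all): same lock, same flag. */
static int endpt_clear_lid(port_t *p_port, endpt_t *p_endpt)
{
    int removed = 0;

    port_lock(p_port);
    if (p_endpt->dlid) {
        /* The invariant that was violated in the crash dump. */
        assert(p_endpt->lid_item.p_next != NULL);

        p_endpt->lid_item.p_next->p_prev = p_endpt->lid_item.p_prev;
        p_endpt->lid_item.p_prev->p_next = p_endpt->lid_item.p_next;
        p_endpt->lid_item.p_next = NULL;
        p_endpt->lid_item.p_prev = NULL;
        p_endpt->dlid = 0;
        removed = 1;
    }
    port_unlock(p_port);
    return removed;
}
```

With this shape, a second endpt_set_lid call on an endpoint that already has a LID is a no-op, and endpt_clear_lid never sees a non-zero dlid with an unlinked lid_item.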
<br><br> <br>Thanks,<br>Anatoly</div>