[ofw] [PATCH] checked ipoib_ndis6_cm shutdown race
Smith, Stan
stan.smith at intel.com
Mon Sep 20 09:40:56 PDT 2010
Hello,
During system shutdown I have seen the IO work thread of a checked ipoib_ndis6_cm build fail as follows:
1) Scheduling priorities keep the IO work thread from running until after port_destroy() tries to delete the port object [cl_obj_destroy( &p_port->obj 0)]. The port object delete fails (ASSERT obj_ref > 0 fires) because the recv buffers that are still posted hold references on the port. __recv_cb_internal() has pulled the first 128 work requests from the CQ and then queued an IO work item to process the remaining 384 recv work requests, but that work item does not run before port_destroy() is called.
2) The IO thread then attempts to run but blows up (BSOD, invalid memory reference) because the port structures it requires have already been freed.
The fix is to recognize when the port is not in the IB_QPS_RTS state. In that case, do not schedule an IO work item; instead, keep pulling recv work requests from the CQ until it is empty.
Code snippets:
} else {
    if ( h_cq && p_port->state == IB_QPS_RTS ) {
        // increment reference to ensure no one release the object while work iteam is queued
        ipoib_port_ref( p_port, ref_recv_cb );
        IoQueueWorkItem( p_port->pPoWorkItem, __iopoib_WorkItem, DelayedWorkQueue, p_port);
        WorkToDo = FALSE;
    } else {
        WorkToDo = TRUE;
    }
}

__recv_cb(
    IN const ib_cq_handle_t h_cq,
    IN void *cq_context )
{
    uint32_t recv_cnt;
    boolean_t WorkToDo;

    do
    {
        WorkToDo = __recv_cb_internal(h_cq, cq_context, &recv_cnt);
    } while( WorkToDo );
}
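
For anyone who wants to reason about the change without a checked build, here is a minimal user-mode sketch of the drain logic. It is not the driver code: sim_port_t, sim_recv_cb_internal(), sim_shutdown() and the SIM_* constants are hypothetical names made up for illustration, the "a short batch means the CQ is empty" test is an assumption about the surrounding code, and the extra port reference taken for the queued work item is deliberately not counted. The 128 and 512 figures come from the description above (128 polled, 384 remaining).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for this sketch; not the real IBAL/NDIS types. */
enum { SIM_QPS_RTS = 1, SIM_QPS_ERROR = 2 };
enum { SIM_MAX_POLL = 128 };  /* completions reaped per polling pass */

typedef struct {
    int      state;         /* models p_port->state */
    uint32_t cq_pending;    /* recv completions still sitting in the CQ */
    uint32_t refs;          /* port references held by posted recv buffers */
    uint32_t queued_items;  /* work items handed to an IO thread that never runs */
} sim_port_t;

/* One polling pass. Returns true when the caller should poll again. */
static bool sim_recv_cb_internal(sim_port_t *p, bool check_state)
{
    uint32_t n = p->cq_pending < SIM_MAX_POLL ? p->cq_pending : SIM_MAX_POLL;

    assert(p->refs >= n);
    p->cq_pending -= n;
    p->refs -= n;               /* each reaped buffer drops its port reference */

    if (n < SIM_MAX_POLL)
        return false;           /* assumed: a short batch means the CQ is empty */

    if (!check_state || p->state == SIM_QPS_RTS) {
        /* Models ipoib_port_ref() + IoQueueWorkItem(); the work item's own
         * port reference is not counted in refs, and it never runs here. */
        p->queued_items++;
        return false;
    }
    return true;                /* port has left RTS: keep draining inline */
}

/* Simulate shutdown: 512 completed recvs, port no longer in RTS. */
static sim_port_t sim_shutdown(bool check_state)
{
    sim_port_t p = { SIM_QPS_ERROR, 512, 512, 0 };

    while (sim_recv_cb_internal(&p, check_state))
        ;                       /* same shape as the do/while in __recv_cb() */
    return p;
}
```

In this model, sim_shutdown(false) follows the pre-patch path and stops after one pass with 384 buffer references outstanding and one work item queued, which is the situation port_destroy() trips over. sim_shutdown(true) follows the patched path and drains all 512 completions inline without queuing anything.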
--- A/ulp/ipoib_ndis6_cm/kernel/ipoib_port.cpp Mon Sep 13 15:58:08 2010
+++ B/ulp/ipoib_NDIS6_CM/kernel/ipoib_port.cpp Mon Sep 20 08:47:08 2010
@@ -2222,7 +2222,7 @@
CL_ASSERT( status == IB_SUCCESS );
} else {
- if (h_cq) {
+ if ( h_cq && p_port->state == IB_QPS_RTS ) {
// increment reference to ensure no one release the object while work iteam is queued
ipoib_port_ref( p_port, ref_recv_cb );
IoQueueWorkItem( p_port->pPoWorkItem, __iopoib_WorkItem, DelayedWorkQueue, p_port);
@@ -2244,9 +2244,13 @@
IN const ib_cq_handle_t h_cq,
IN void *cq_context )
{
- uint32_t recv_cnt;
+ uint32_t recv_cnt;
+ boolean_t WorkToDo;
- __recv_cb_internal(h_cq, cq_context, &recv_cnt);
+ do
+ {
+ WorkToDo = __recv_cb_internal(h_cq, cq_context, &recv_cnt);
+ } while( WorkToDo );
}