[ofw] [PATCH] checked ipoib_ndis6_cm shutdown race

Smith, Stan stan.smith at intel.com
Mon Sep 20 09:40:56 PDT 2010


Hello,
  During system shutdown I have seen the IO work thread fail in a checked build of ipoib_ndis6_cm:

1) Scheduling priorities keep the IO work thread from running until after port_destroy() tries to delete the port object [cl_obj_destroy( &p_port->obj 0)]. The port object delete fails (ASSERT obj_ref > 0 fires) because the recv buffers that are still posted hold outstanding port references. __recv_cb_internal() has pulled the first 128 work requests from the CQ and then posted an IO work request to process the remaining 384 recv work requests, but that IO work request does not run before port_destroy() is called.

2) The IO work thread then runs and blows up (BSOD, invalid memory reference) because the port structures it requires have already been freed.

The fix is to recognize when the port is not in the IB_QPS_RTS state; in that case, do not schedule an IO work thread request and instead keep pulling recv work requests from the CQ until it is empty.

Code snippets:
        } else {
                if ( h_cq && p_port->state == IB_QPS_RTS ) {
                        // increment reference to ensure no one release the object while work iteam is queued
                        ipoib_port_ref( p_port, ref_recv_cb );
                        IoQueueWorkItem( p_port->pPoWorkItem, __iopoib_WorkItem, DelayedWorkQueue, p_port);
                        WorkToDo = FALSE;
                } else {
                        WorkToDo = TRUE;
                }
        }

__recv_cb(
        IN              const   ib_cq_handle_t                          h_cq,
        IN                              void                                            *cq_context )
{
        uint32_t        recv_cnt;
        boolean_t       WorkToDo;

        do
        {
                WorkToDo = __recv_cb_internal(h_cq, cq_context, &recv_cnt);
        } while( WorkToDo );
}
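
For illustration only, here is a minimal user-space sketch in C of how the drain loop above behaves. It is not ipoib code: every mock_* name is a hypothetical stand-in, the 512/128 figures come from the description above, and it assumes that each posted recv buffer holds one port reference and that a queued work item holds one more.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the driver's state; not the real ipoib types. */
enum mock_qp_state { MOCK_QPS_RTS, MOCK_QPS_ERROR };

#define MOCK_POLL_BATCH 128u    /* completions pulled per call (assumed) */

struct mock_port {
    enum mock_qp_state state;
    uint32_t cq_pending;        /* completions still sitting in the CQ */
    uint32_t obj_ref;           /* references held on the port object */
    uint32_t work_items_queued; /* deferred IO work requests posted */
};

/*
 * Models __recv_cb_internal(): pull one batch from the CQ and return
 * true when the caller should poll again right away.
 */
static bool mock_recv_cb_internal(struct mock_port *port, uint32_t *recv_cnt)
{
    uint32_t n = port->cq_pending < MOCK_POLL_BATCH
                     ? port->cq_pending : MOCK_POLL_BATCH;

    port->cq_pending -= n;
    port->obj_ref -= n;         /* assumed: one reference per completed recv */
    *recv_cnt = n;

    if (port->cq_pending == 0)
        return false;           /* CQ is empty, nothing left to do */

    if (port->state == MOCK_QPS_RTS) {
        /* Normal operation: defer the rest to a work item, which
         * holds its own reference on the port while it is queued. */
        port->obj_ref++;
        port->work_items_queued++;
        return false;
    }

    /* Port is going down: do not defer, keep draining in the caller. */
    return true;
}

/* Models the patched __recv_cb(): loop until no work remains. */
static uint32_t mock_recv_cb(struct mock_port *port)
{
    uint32_t recv_cnt = 0;
    uint32_t total = 0;
    bool work_to_do;

    assert(port != NULL);
    do {
        work_to_do = mock_recv_cb_internal(port, &recv_cnt);
        total += recv_cnt;
    } while (work_to_do);

    return total;
}
```

In this model, a port that is not in RTS with 512 pending completions is fully drained by one mock_recv_cb() call and ends with obj_ref == 0 and no queued work item, so the destroy path has nothing left to wait for. A port in RTS drains only the first 128 and defers the remaining 384, as before the patch.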


--- A/ulp/ipoib_ndis6_cm/kernel/ipoib_port.cpp  Mon Sep 13 15:58:08 2010
+++ B/ulp/ipoib_NDIS6_CM/kernel/ipoib_port.cpp  Mon Sep 20 08:47:08 2010
@@ -2222,7 +2222,7 @@
                CL_ASSERT( status == IB_SUCCESS );

        } else {
-               if (h_cq) {
+               if ( h_cq && p_port->state == IB_QPS_RTS ) {
                        // increment reference to ensure no one release the object while work iteam is queued
                        ipoib_port_ref( p_port, ref_recv_cb );
                        IoQueueWorkItem( p_port->pPoWorkItem, __iopoib_WorkItem, DelayedWorkQueue, p_port);
@@ -2244,9 +2244,13 @@
        IN              const   ib_cq_handle_t                          h_cq,
        IN                              void                                            *cq_context )
 {
-       uint32_t recv_cnt;
+       uint32_t        recv_cnt;
+       boolean_t       WorkToDo;

-       __recv_cb_internal(h_cq, cq_context, &recv_cnt);
+       do
+       {
+               WorkToDo = __recv_cb_internal(h_cq, cq_context, &recv_cnt);
+       } while( WorkToDo );
 }




