[ofw] [PATCH] checked ipoib_ndis6_cm shutdown race

Smith, Stan stan.smith at intel.com
Wed Sep 22 09:21:59 PDT 2010


Hello,
  Any thoughts on this patch?

Smith, Stan wrote:
> Hello,
>   During system shutdown I have seen the IO work thread of a checked
> ipoib_ndis6_cm build fail as follows:
>
> 1) Scheduling priorities keep the IO work thread from running until
> after port_destroy() tries to delete the port object
> [cl_obj_destroy( &p_port->obj 0)]. The delete fails (ASSERT
> obj_ref > 0 fires) because the recv buffers that are still posted
> hold port references. __recv_cb_internal() has pulled the first 128
> work requests from the CQ and then posted an IO work request to
> process the remaining 384 recv work requests, but that IO work
> request does not run before port_destroy() is called.
>
> 2) When the IO thread finally runs, it crashes (BSOD, invalid memory
> reference) because the port structures it needs have already been
> freed.
>
> The fix is to check whether the port is still in the IB_QPS_RTS
> state. If it is not, do not schedule an IO work thread request;
> instead keep pulling recv work requests from the CQ until it is empty.
>
> Code snippets:
>       } else {
>               if ( h_cq && p_port->state == IB_QPS_RTS ) {
>                       // increment reference to ensure no one release the object while work iteam is queued
>                       ipoib_port_ref( p_port, ref_recv_cb );
>                       IoQueueWorkItem( p_port->pPoWorkItem, __iopoib_WorkItem, DelayedWorkQueue, p_port);
>                       WorkToDo = FALSE;
>               } else {
>                       WorkToDo = TRUE;
>               }
>       }
>
> __recv_cb(
>       IN              const   ib_cq_handle_t                          h_cq,
>       IN                              void                                            *cq_context )
> {
>       uint32_t        recv_cnt;
>       boolean_t       WorkToDo;
>
>       do
>       {
>               WorkToDo = __recv_cb_internal(h_cq, cq_context, &recv_cnt);
>       } while( WorkToDo );
> }
>
>
> --- A/ulp/ipoib_ndis6_cm/kernel/ipoib_port.cpp        Mon Sep 13 15:58:08 2010
> +++ B/ulp/ipoib_NDIS6_CM/kernel/ipoib_port.cpp   Mon Sep 20 08:47:08 2010
> @@ -2222,7 +2222,7 @@
>               CL_ASSERT( status == IB_SUCCESS );
>
>       } else {
> -             if (h_cq) {
> +             if ( h_cq && p_port->state == IB_QPS_RTS ) {
>                       // increment reference to ensure no one release the object while work iteam is queued
>                       ipoib_port_ref( p_port, ref_recv_cb );
>                       IoQueueWorkItem( p_port->pPoWorkItem, __iopoib_WorkItem, DelayedWorkQueue, p_port);
> @@ -2244,9 +2244,13 @@
>       IN              const   ib_cq_handle_t                          h_cq,
>       IN                              void                                            *cq_context )
>  {
> -     uint32_t recv_cnt;
> +     uint32_t        recv_cnt;
> +     boolean_t       WorkToDo;
>
> -     __recv_cb_internal(h_cq, cq_context, &recv_cnt);
> +     do
> +     {
> +             WorkToDo = __recv_cb_internal(h_cq, cq_context, &recv_cnt);
> +     } while( WorkToDo );
>  }
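
For anyone who wants to reason about the control flow without a
debugger attached, here is a minimal user-mode sketch of the drain
loop described above. It is not the driver code: mock_port_t,
port_state_t, recv_pass() and recv_cb() are hypothetical stand-ins, the
CQ is reduced to a counter, and "CQ empty" is simplified to "counter
reached zero". Only the batch size of 128 and the 512 posted recvs come
from the report.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define POLL_BATCH 128u  /* completions pulled per pass, as in the report */

/* Hypothetical stand-in for "state == IB_QPS_RTS" vs. any other state. */
typedef enum { PORT_RTS, PORT_CLOSING } port_state_t;

/* Hypothetical mock of the port: the CQ is just a counter here. */
typedef struct {
    port_state_t state;
    uint32_t cq_pending;    /* completions still sitting in the mock CQ */
    uint32_t refs;          /* port references held by posted recv buffers */
    uint32_t queued_items;  /* deferred work items handed to the mock queue */
} mock_port_t;

/* One polling pass. Returns true when the caller should poll again. */
static bool recv_pass(mock_port_t *port)
{
    uint32_t n = port->cq_pending < POLL_BATCH ? port->cq_pending : POLL_BATCH;

    port->cq_pending -= n;
    assert(port->refs >= n);
    port->refs -= n;            /* each completed recv drops its reference */

    if (port->cq_pending == 0)
        return false;           /* CQ drained, nothing left to do */

    if (port->state == PORT_RTS) {
        port->refs++;           /* reference held for the queued work item */
        port->queued_items++;   /* normal operation: defer the remainder */
        return false;
    }

    return true;                /* shutting down: keep draining inline */
}

/* Mirrors the do/while in the patched callback. */
static void recv_cb(mock_port_t *port)
{
    while (recv_pass(port))
        ;
}
```

With 512 completions pending, a port in PORT_CLOSING ends with no
pending completions, no references and no queued work item, which is
the condition port_destroy() needs. A port in PORT_RTS stops after the
first 128 and leaves one work item queued, as before the patch.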



