[ofw] [PATCH] checked ipoib_ndis6_cm shutdown race
Smith, Stan
stan.smith at intel.com
Wed Sep 22 09:21:59 PDT 2010
Hello,
Any thoughts on this patch?
Smith, Stan wrote:
> Hello,
> During system shutdown I have witnessed a checked ipoib_ndis6_cm IO
> work thread fail:
>
> 1) IO work thread is blocked from running due to scheduling
> priorities beyond the point in time at which port_destroy() wants to
> delete the port object [cl_obj_destroy( &p_port->obj 0)]. The port
> object delete fails (ASSERT obj_ref > 0 fires) due to the outstanding
> port references incurred by remaining posted recv buffers. The 1st
> 128 WorkRequests have been pulled from the CQ by
> __recv_cb_internal(), which then posts an IO work request to process
> the remaining 384 recv work requests. The IO work request does not
> run prior to port_detroy() being called.
>
> 2) The IO thread attempts to run but blows up (BSOD invalid memory
> reference) as port structures required by the IO work thread have
> been free()'ed.
>
> The fix is to recognize the port is not in the IB_QPS_RTS state, do
> not schedule an IO work thread request and continue to pull recv work
> requests from the CQ until empty.
>
> Code snippets:
> } else {
> if ( h_cq && p_port->state == IB_QPS_RTS ) {
> // increment reference to ensure no one release the object while
> work iteam is queued ipoib_port_ref( p_port, ref_recv_cb );
> IoQueueWorkItem( p_port->pPoWorkItem, __iopoib_WorkItem,
> DelayedWorkQueue, p_port); WorkToDo = FALSE;
> } else {
> WorkToDo = TRUE;
> }
> }
>
> __recv_cb(
> IN const ib_cq_handle_t h_cq,
> IN void *cq_context )
> {
> uint32_t recv_cnt;
> boolean_t WorkToDo;
>
> do
> {
> WorkToDo = __recv_cb_internal(h_cq, cq_context, &recv_cnt);
> } while( WorkToDo );
> }
>
>
> --- A/ulp/ipoib_ndis6_cm/kernel/ipoib_port.cpp Mon Sep 13 15:58:08
> 2010 +++ B/ulp/ipoib_NDIS6_CM/kernel/ipoib_port.cpp Mon Sep 20
> 08:47:08 2010 @@ -2222,7 +2222,7 @@
> CL_ASSERT( status == IB_SUCCESS );
>
> } else {
> - if (h_cq) {
> + if ( h_cq && p_port->state == IB_QPS_RTS ) {
> // increment reference to ensure no one release the object while
> work iteam is queued ipoib_port_ref( p_port, ref_recv_cb );
> IoQueueWorkItem( p_port->pPoWorkItem, __iopoib_WorkItem,
> DelayedWorkQueue, p_port); @@ -2244,9 +2244,13 @@
> IN const ib_cq_handle_t h_cq,
> IN void *cq_context )
> {
> - uint32_t recv_cnt;
> + uint32_t recv_cnt;
> + boolean_t WorkToDo;
>
> - __recv_cb_internal(h_cq, cq_context, &recv_cnt);
> + do
> + {
> + WorkToDo = __recv_cb_internal(h_cq, cq_context, &recv_cnt);
> + } while( WorkToDo );
> }
More information about the ofw
mailing list