[openib-general] [PATCH] RDMA/iwcm: Bugs in cm_conn_req_handler()

Steve Wise swise at opengridcomputing.com
Sat Feb 10 12:36:03 PST 2007


ugh. 

There is at least one bug in this patch.  I cannot call iw_cm_reject()
inside destroy_cm_id() because both functions grab the iw_cm lock...


On Sat, 2007-02-10 at 13:23 -0600, Steve Wise wrote:
> Here is a patch that Tom and I think fixes the race condition Roland
> discovered, plus cleans up the issues Krishna attempted to fix in his
> first patch.  I'm testing it now with a series of rping tests looping
> with random sizes and counts and it seems to work, but the patch needs
> more testing and review.  
> 
> Krishna, can you review this carefully and also test it and let us know
> if you think its good?  The patch is against for-2.6.21 from Roland's
> tree.
> 
> Roland, can you review this too and verify that it fixes the race
> condition?
> 
> 
> Krishna: here are comments to your original patch description:
> 
> 
> cm_conn_req_handler() :
> >         1. Calling destroy_cm_id leaks 3 work 'free' list entries.
> 
> I don't think your original patch fixed all places this memory was
> leaked.
> 
> This has been address in the patch below by creating a function
> free_cm_id() that frees the list entries -and- frees the cm_id memory.
> It is then called from the 3 places where the cm_id can be freed.
> 
> >         2. cm_id is freed up wrongly and not cm_id_priv (though the
> >            effect is the same since cm_id is the first element of
> >            cm_id_priv, but still a bug if the top level cm_id
> > changes).
> > 
> 
> This is addressed in the patch below since we have a prototyped function
> for freeing the cm_id_priv.
> 
> >         3. Reject message has to be sent on failure. Tested this
> >            without the fix and found the client hangs, waited for
> > about
> >            20 mins and then did Ctrl-C but the process is unkillable.
> > 
> 
> The call to iw_cm_reject() is now in destroy_cm_id() and is called based
> on the cm_id state.  So whenever a connection is destroyed, if it needs
> a rejection sent, it will be sent.
> 
> >         4. Setting IWCM_F_CALLBACK_DESTROY on cm_id (child handle)
> >            doesn't achieve anything, since checking for
> >            IWCM_F_CALLBACK_DESTROY in the parent's flag (in
> >            cm_work_handler) means that this will never be true.
> > 
> 
> It does achieve something.  I think you are failing to consider the fact
> that the iWARP provider can have a reference on the cm_id even in the
> case where the callback function returns an error thus giving
> destruction ownership to the IWCM.  Perhaps the ammasso provider never
> does this, but cxgb3 can.  And we haven't put any restrictions on
> exactly when the provider _must_ release its reference.  If the provider
> _does_ have a reference at this point in the code, then the cm_id will
> not be freed, and must be freed when the refcnt finally reaches zero
> when the provider removes its reference.  
> 
> I wish to clarify this for everyone (and we need this text in
> Documentation/infiniband/iwcm.txt IMO):
> 
> This design is based on the RDMA_CM and IB_CM behavior.  If the app
> issues the destroy via rdma_destroy_cm_id, then we block that thread
> until all references are gone.  If the app returns non-zero in a
> callback for a given cm_id, then the CM owns destroying the cm_id and
> the application is done with it. That's the short of it.  Here's the
> long of it:
> 
> There are 2 paths for freeing iw_cm memory.
> 
> 1) the application issues a rdma_destroy_cm_id() which calls
> iw_destroy_cm_id().  In this case (and this case only), the thread is
> blocked until the refcnt reaches 0, then the thread continues and frees
> the memory. 
> 
> 2) the application returns non zero from a callback function.  In this
> case, the IWCM is responsible to destroy the cm_id.  However, the IWCM
> _cannot_ block in its event handler thread because this can cause a
> deadlock.  A deadlock can occur if the provider has a reference to the
> cm_id and needs to post some event before removing the reference.  If
> the IWCM were to block awaiting the refcnt to go to zero, it would
> deadlock with the provider trying to post the last event before derefing
> the cm_id.  So the IWCM_F_CALLBACK_DESTROY bit is used to indicate that
> the IWCM owns destroying this.  If, in cm_work_handler(), the refcnt
> goes to zero -and- the DESTROY bit is set, then the cm_id can be freed.
> If the refcnt doesn't go to zero in that function, then either the
> provider still has a reference, or subsequent queued work items have
> additional references.  In either case, the cm_id is not freed and
> cm_work_handler() keeps chunking through the events and processing them.
> Since the cm_id is marked DESTROYING, the events get dropped and the
> references released on the cm_id.  Eventually the cm_id will get freed
> either in cm_work_handler() -or- in rem_ref() called by the provider it
> the provider has the last reference.
> 
> So based on the above design, there are 3 places in the code where the
> cm_id can be freed:
> 
> A) in case 1 above the memory will always be freed in iw_destroy_cm_id()
> after the thread is awakened with a refcnt of zero.
> 
> B) in case 2 above if the last reference is due to a queued work event
> for the iwm.  In this case the memory if freed in cm_work_handler().
> 
> C) in case 2 above if the provider has the last reference, then the
> cm_id is freed in rem_ref().
> 
> 
> I hope this clarifies things.
> 
> Here's the proposed patch:
> 
> iw_cm_id destruction race condition fixes.
> 
> From: Steve Wise <swise at opengridcomputing.com>
> 
> Several changes:
> 
> - iwcm_deref_id() always wakes up if there's another reference.
> 
> - move iw_cm_reject() into destroy_cm_id() to reduce code replication.
> 
> - clean up race condition in cm_work_handler().
> 
> - create static void free_cm_id() which deallocs the work entries and then
>   kfrees the cm_id memory.  This reduces code replication.
> 
> Signed-off-by: Steve Wise <swise at opengridcomputing.com>
> Signed-off-by: Tom Tucker <tom at opengridcomputing.com>
> ---
> 
>  drivers/infiniband/core/iwcm.c |   48 +++++++++++++++++++++-------------------
>  1 files changed, 25 insertions(+), 23 deletions(-)
> 
> diff --git a/drivers/infiniband/core/iwcm.c b/drivers/infiniband/core/iwcm.c
> index 1039ad5..403daed 100644
> --- a/drivers/infiniband/core/iwcm.c
> +++ b/drivers/infiniband/core/iwcm.c
> @@ -146,6 +146,12 @@ static int copy_private_data(struct iw_c
>  	return 0;
>  }
>  
> +static void free_cm_id(struct iwcm_id_private *cm_id_priv)
> +{
> +	dealloc_work_entries(cm_id_priv);
> +	kfree(cm_id_priv);
> +}
> +
>  /*
>   * Release a reference on cm_id. If the last reference is being
>   * released, enable the waiting thread (in iw_destroy_cm_id) to
> @@ -153,21 +159,14 @@ static int copy_private_data(struct iw_c
>   */
>  static int iwcm_deref_id(struct iwcm_id_private *cm_id_priv)
>  {
> -	int ret = 0;
> -
>  	BUG_ON(atomic_read(&cm_id_priv->refcount)==0);
>  	if (atomic_dec_and_test(&cm_id_priv->refcount)) {
>  		BUG_ON(!list_empty(&cm_id_priv->work_list));
> -		if (waitqueue_active(&cm_id_priv->destroy_comp.wait)) {
> -			BUG_ON(cm_id_priv->state != IW_CM_STATE_DESTROYING);
> -			BUG_ON(test_bit(IWCM_F_CALLBACK_DESTROY,
> -					&cm_id_priv->flags));
> -			ret = 1;
> -		}
>  		complete(&cm_id_priv->destroy_comp);
> +		return 1;
>  	}
>  
> -	return ret;
> +	return 0;
>  }
>  
>  static void add_ref(struct iw_cm_id *cm_id)
> @@ -181,7 +180,11 @@ static void rem_ref(struct iw_cm_id *cm_
>  {
>  	struct iwcm_id_private *cm_id_priv;
>  	cm_id_priv = container_of(cm_id, struct iwcm_id_private, id);
> -	iwcm_deref_id(cm_id_priv);
> +	if (iwcm_deref_id(cm_id_priv) &&
> +	    test_bit(IWCM_F_CALLBACK_DESTROY, &cm_id_priv->flags)) {
> +		BUG_ON(!list_empty(&cm_id_priv->work_list));
> +		free_cm_id(cm_id_priv);
> +	}
>  }
>  
>  static int cm_event_handler(struct iw_cm_id *cm_id, struct iw_cm_event *event);
> @@ -355,8 +358,11 @@ static void destroy_cm_id(struct iw_cm_i
>  	case IW_CM_STATE_CONN_RECV:
>  		/*
>  		 * App called destroy before/without calling accept after
> -		 * receiving connection request event notification.
> +		 * receiving connection request event notification or
> +		 * returned non zero from the event callback function.
> +		 * In either case, must tell the provider to reject.
>  		 */
> +		iw_cm_reject(cm_id, NULL, 0);
>  		cm_id_priv->state = IW_CM_STATE_DESTROYING;
>  		break;
>  	case IW_CM_STATE_CONN_SENT:
> @@ -391,9 +397,7 @@ void iw_destroy_cm_id(struct iw_cm_id *c
>  
>  	wait_for_completion(&cm_id_priv->destroy_comp);
>  
> -	dealloc_work_entries(cm_id_priv);
> -
> -	kfree(cm_id_priv);
> +	free_cm_id(cm_id_priv);
>  }
>  EXPORT_SYMBOL(iw_destroy_cm_id);
>  
> @@ -639,7 +643,6 @@ static void cm_conn_req_handler(struct i
>  
>  	ret = alloc_work_entries(cm_id_priv, 3);
>  	if (ret) {
> -		iw_cm_reject(cm_id, NULL, 0);
>  		iw_destroy_cm_id(cm_id);
>  		goto out;
>  	}
> @@ -650,7 +653,7 @@ static void cm_conn_req_handler(struct i
>  		set_bit(IWCM_F_CALLBACK_DESTROY, &cm_id_priv->flags);
>  		destroy_cm_id(cm_id);
>  		if (atomic_read(&cm_id_priv->refcount)==0)
> -			kfree(cm_id);
> +			free_cm_id(cm_id_priv);
>  	}
>  
>  out:
> @@ -854,13 +857,12 @@ static void cm_work_handler(struct work_
>  			destroy_cm_id(&cm_id_priv->id);
>  		}
>  		BUG_ON(atomic_read(&cm_id_priv->refcount)==0);
> -		if (iwcm_deref_id(cm_id_priv))
> -			return;
> -
> -		if (atomic_read(&cm_id_priv->refcount)==0 &&
> -		    test_bit(IWCM_F_CALLBACK_DESTROY, &cm_id_priv->flags)) {
> -			dealloc_work_entries(cm_id_priv);
> -			kfree(cm_id_priv);
> +		if (iwcm_deref_id(cm_id_priv)) {
> +			if (test_bit(IWCM_F_CALLBACK_DESTROY,
> +				     &cm_id_priv->flags)) {
> +				BUG_ON(!list_empty(&cm_id_priv->work_list));
> +				free_cm_id(cm_id_priv);
> +			}
>  			return;
>  		}
>  		spin_lock_irqsave(&cm_id_priv->lock, flags);
> 
> 
> 
> _______________________________________________
> openib-general mailing list
> openib-general at openib.org
> http://openib.org/mailman/listinfo/openib-general
> 
> To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general
> 





More information about the general mailing list