[openib-general] [PATCH] RDMA/iwcm: Bugs in cm_conn_req_handler()
Steve Wise
swise at opengridcomputing.com
Sat Feb 10 12:36:03 PST 2007
ugh.
There is at least one bug in this patch. I cannot call iw_cm_reject()
inside destroy_cm_id() because both functions grab the iw_cm lock...
On Sat, 2007-02-10 at 13:23 -0600, Steve Wise wrote:
> Here is a patch that Tom and I think fixes the race condition Roland
> discovered, plus cleans up the issues Krishna attempted to fix in his
> first patch. I'm testing it now with a series of rping tests looping
> with random sizes and counts and it seems to work, but the patch needs
> more testing and review.
>
> Krishna, can you review this carefully and also test it and let us know
> if you think its good? The patch is against for-2.6.21 from Roland's
> tree.
>
> Roland, can you review this too and verify that it fixes the race
> condition?
>
>
> Krishna: here are comments to your original patch description:
>
>
> cm_conn_req_handler() :
> > 1. Calling destroy_cm_id leaks 3 work 'free' list entries.
>
> I don't think your original patch fixed all places this memory was
> leaked.
>
> This has been address in the patch below by creating a function
> free_cm_id() that frees the list entries -and- frees the cm_id memory.
> It is then called from the 3 places where the cm_id can be freed.
>
> > 2. cm_id is freed up wrongly and not cm_id_priv (though the
> > effect is the same since cm_id is the first element of
> > cm_id_priv, but still a bug if the top level cm_id
> > changes).
> >
>
> This is addressed in the patch below since we have a prototyped function
> for freeing the cm_id_priv.
>
> > 3. Reject message has to be sent on failure. Tested this
> > without the fix and found the client hangs, waited for
> > about
> > 20 mins and then did Ctrl-C but the process is unkillable.
> >
>
> The call to iw_cm_reject() is now in destroy_cm_id() and is called based
> on the cm_id state. So whenever a connection is destroyed, if it needs
> a rejection sent, it will be sent.
>
> > 4. Setting IWCM_F_CALLBACK_DESTROY on cm_id (child handle)
> > doesn't achieve anything, since checking for
> > IWCM_F_CALLBACK_DESTROY in the parent's flag (in
> > cm_work_handler) means that this will never be true.
> >
>
> It does achieve something. I think you are failing to consider the fact
> that the iWARP provider can have a reference on the cm_id even in the
> case where the callback function returns an error thus giving
> destruction ownership to the IWCM. Perhaps the ammasso provider never
> does this, but cxgb3 can. And we haven't put any restrictions on
> exactly when the provider _must_ release its reference. If the provider
> _does_ have a reference at this point in the code, then the cm_id will
> not be freed, and must be freed when the refcnt finally reaches zero
> when the provider removes its reference.
>
> I wish to clarify this for everyone (and we need this text in
> Documentation/infiniband/iwcm.txt IMO):
>
> This design is based on the RDMA_CM and IB_CM behavior. If the app
> issues the destroy via rdma_destroy_cm_id, then we block that thread
> until all references are gone. If the app returns non-zero in a
> callback for a given cm_id, then the CM owns destroying the cm_id and
> the application is done with it. That's the short of it. Here's the
> long of it:
>
> There are 2 paths for freeing iw_cm memory.
>
> 1) the application issues a rdma_destroy_cm_id() which calls
> iw_destroy_cm_id(). In this case (and this case only), the thread is
> blocked until the refcnt reaches 0, then the thread continues and frees
> the memory.
>
> 2) the application returns non zero from a callback function. In this
> case, the IWCM is responsible to destroy the cm_id. However, the IWCM
> _cannot_ block in its event handler thread because this can cause a
> deadlock. A deadlock can occur if the provider has a reference to the
> cm_id and needs to post some event before removing the reference. If
> the IWCM were to block awaiting the refcnt to go to zero, it would
> deadlock with the provider trying to post the last event before derefing
> the cm_id. So the IWCM_F_CALLBACK_DESTROY bit is used to indicate that
> the IWCM owns destroying this. If, in cm_work_handler(), the refcnt
> goes to zero -and- the DESTROY bit is set, then the cm_id can be freed.
> If the refcnt doesn't go to zero in that function, then either the
> provider still has a reference, or subsequent queued work items have
> additional references. In either case, the cm_id is not freed and
> cm_work_handler() keeps chunking through the events and processing them.
> Since the cm_id is marked DESTROYING, the events get dropped and the
> references released on the cm_id. Eventually the cm_id will get freed
> either in cm_work_handler() -or- in rem_ref() called by the provider it
> the provider has the last reference.
>
> So based on the above design, there are 3 places in the code where the
> cm_id can be freed:
>
> A) in case 1 above the memory will always be freed in iw_destroy_cm_id()
> after the thread is awakened with a refcnt of zero.
>
> B) in case 2 above if the last reference is due to a queued work event
> for the iwm. In this case the memory if freed in cm_work_handler().
>
> C) in case 2 above if the provider has the last reference, then the
> cm_id is freed in rem_ref().
>
>
> I hope this clarifies things.
>
> Here's the proposed patch:
>
> iw_cm_id destruction race condition fixes.
>
> From: Steve Wise <swise at opengridcomputing.com>
>
> Several changes:
>
> - iwcm_deref_id() always wakes up if there's another reference.
>
> - move iw_cm_reject() into destroy_cm_id() to reduce code replication.
>
> - clean up race condition in cm_work_handler().
>
> - create static void free_cm_id() which deallocs the work entries and then
> kfrees the cm_id memory. This reduces code replication.
>
> Signed-off-by: Steve Wise <swise at opengridcomputing.com>
> Signed-off-by: Tom Tucker <tom at opengridcomputing.com>
> ---
>
> drivers/infiniband/core/iwcm.c | 48 +++++++++++++++++++++-------------------
> 1 files changed, 25 insertions(+), 23 deletions(-)
>
> diff --git a/drivers/infiniband/core/iwcm.c b/drivers/infiniband/core/iwcm.c
> index 1039ad5..403daed 100644
> --- a/drivers/infiniband/core/iwcm.c
> +++ b/drivers/infiniband/core/iwcm.c
> @@ -146,6 +146,12 @@ static int copy_private_data(struct iw_c
> return 0;
> }
>
> +static void free_cm_id(struct iwcm_id_private *cm_id_priv)
> +{
> + dealloc_work_entries(cm_id_priv);
> + kfree(cm_id_priv);
> +}
> +
> /*
> * Release a reference on cm_id. If the last reference is being
> * released, enable the waiting thread (in iw_destroy_cm_id) to
> @@ -153,21 +159,14 @@ static int copy_private_data(struct iw_c
> */
> static int iwcm_deref_id(struct iwcm_id_private *cm_id_priv)
> {
> - int ret = 0;
> -
> BUG_ON(atomic_read(&cm_id_priv->refcount)==0);
> if (atomic_dec_and_test(&cm_id_priv->refcount)) {
> BUG_ON(!list_empty(&cm_id_priv->work_list));
> - if (waitqueue_active(&cm_id_priv->destroy_comp.wait)) {
> - BUG_ON(cm_id_priv->state != IW_CM_STATE_DESTROYING);
> - BUG_ON(test_bit(IWCM_F_CALLBACK_DESTROY,
> - &cm_id_priv->flags));
> - ret = 1;
> - }
> complete(&cm_id_priv->destroy_comp);
> + return 1;
> }
>
> - return ret;
> + return 0;
> }
>
> static void add_ref(struct iw_cm_id *cm_id)
> @@ -181,7 +180,11 @@ static void rem_ref(struct iw_cm_id *cm_
> {
> struct iwcm_id_private *cm_id_priv;
> cm_id_priv = container_of(cm_id, struct iwcm_id_private, id);
> - iwcm_deref_id(cm_id_priv);
> + if (iwcm_deref_id(cm_id_priv) &&
> + test_bit(IWCM_F_CALLBACK_DESTROY, &cm_id_priv->flags)) {
> + BUG_ON(!list_empty(&cm_id_priv->work_list));
> + free_cm_id(cm_id_priv);
> + }
> }
>
> static int cm_event_handler(struct iw_cm_id *cm_id, struct iw_cm_event *event);
> @@ -355,8 +358,11 @@ static void destroy_cm_id(struct iw_cm_i
> case IW_CM_STATE_CONN_RECV:
> /*
> * App called destroy before/without calling accept after
> - * receiving connection request event notification.
> + * receiving connection request event notification or
> + * returned non zero from the event callback function.
> + * In either case, must tell the provider to reject.
> */
> + iw_cm_reject(cm_id, NULL, 0);
> cm_id_priv->state = IW_CM_STATE_DESTROYING;
> break;
> case IW_CM_STATE_CONN_SENT:
> @@ -391,9 +397,7 @@ void iw_destroy_cm_id(struct iw_cm_id *c
>
> wait_for_completion(&cm_id_priv->destroy_comp);
>
> - dealloc_work_entries(cm_id_priv);
> -
> - kfree(cm_id_priv);
> + free_cm_id(cm_id_priv);
> }
> EXPORT_SYMBOL(iw_destroy_cm_id);
>
> @@ -639,7 +643,6 @@ static void cm_conn_req_handler(struct i
>
> ret = alloc_work_entries(cm_id_priv, 3);
> if (ret) {
> - iw_cm_reject(cm_id, NULL, 0);
> iw_destroy_cm_id(cm_id);
> goto out;
> }
> @@ -650,7 +653,7 @@ static void cm_conn_req_handler(struct i
> set_bit(IWCM_F_CALLBACK_DESTROY, &cm_id_priv->flags);
> destroy_cm_id(cm_id);
> if (atomic_read(&cm_id_priv->refcount)==0)
> - kfree(cm_id);
> + free_cm_id(cm_id_priv);
> }
>
> out:
> @@ -854,13 +857,12 @@ static void cm_work_handler(struct work_
> destroy_cm_id(&cm_id_priv->id);
> }
> BUG_ON(atomic_read(&cm_id_priv->refcount)==0);
> - if (iwcm_deref_id(cm_id_priv))
> - return;
> -
> - if (atomic_read(&cm_id_priv->refcount)==0 &&
> - test_bit(IWCM_F_CALLBACK_DESTROY, &cm_id_priv->flags)) {
> - dealloc_work_entries(cm_id_priv);
> - kfree(cm_id_priv);
> + if (iwcm_deref_id(cm_id_priv)) {
> + if (test_bit(IWCM_F_CALLBACK_DESTROY,
> + &cm_id_priv->flags)) {
> + BUG_ON(!list_empty(&cm_id_priv->work_list));
> + free_cm_id(cm_id_priv);
> + }
> return;
> }
> spin_lock_irqsave(&cm_id_priv->lock, flags);
>
>
>
> _______________________________________________
> openib-general mailing list
> openib-general at openib.org
> http://openib.org/mailman/listinfo/openib-general
>
> To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general
>
More information about the general
mailing list