[openib-general] [PATCH] RDMA/iwcm: Bugs in cm_conn_req_handler()

Steve Wise swise at opengridcomputing.com
Sat Feb 10 11:23:43 PST 2007


Here is a patch that Tom and I think fixes the race condition Roland
discovered, plus cleans up the issues Krishna attempted to fix in his
first patch.  I'm testing it now with a series of rping tests looping
with random sizes and counts and it seems to work, but the patch needs
more testing and review.  

Krishna, can you review this carefully and also test it and let us know
if you think it's good?  The patch is against the for-2.6.21 branch of
Roland's tree.

Roland, can you review this too and verify that it fixes the race
condition?


Krishna: here are comments on your original patch description:


cm_conn_req_handler() :
>         1. Calling destroy_cm_id leaks 3 work 'free' list entries.

I don't think your original patch fixed all the places where this memory
was leaked.

This has been addressed in the patch below by creating a function,
free_cm_id(), that frees the work list entries -and- frees the cm_id
memory.  It is then called from the 3 places where the cm_id can be freed.

>         2. cm_id is freed up wrongly and not cm_id_priv (though the
>            effect is the same since cm_id is the first element of
>            cm_id_priv, but still a bug if the top level cm_id
> changes).
> 

This is addressed in the patch below, since we now have a single function
(free_cm_id()) for freeing the cm_id_priv.

>         3. Reject message has to be sent on failure. Tested this
>            without the fix and found the client hangs, waited for
> about
>            20 mins and then did Ctrl-C but the process is unkillable.
> 

The call to iw_cm_reject() is now in destroy_cm_id() and is called based
on the cm_id state.  So whenever a connection is destroyed, if it needs
a rejection sent, it will be sent.

>         4. Setting IWCM_F_CALLBACK_DESTROY on cm_id (child handle)
>            doesn't achieve anything, since checking for
>            IWCM_F_CALLBACK_DESTROY in the parent's flag (in
>            cm_work_handler) means that this will never be true.
> 

It does achieve something.  I think you are failing to consider that the
iWARP provider can still hold a reference on the cm_id even when the
callback function returns an error, thus giving destruction ownership to
the IWCM.  Perhaps the Ammasso provider never does this, but cxgb3 can.
And we haven't put any restrictions on exactly when the provider _must_
release its reference.  If the provider _does_ hold a reference at this
point in the code, then the cm_id cannot be freed there; it must instead
be freed when the provider removes its reference and the refcnt finally
reaches zero.

I wish to clarify this for everyone (and we need this text in
Documentation/infiniband/iwcm.txt IMO):

This design is based on the RDMA_CM and IB_CM behavior.  If the app
issues the destroy via rdma_destroy_id(), then we block that thread
until all references are gone.  If the app returns non-zero in a
callback for a given cm_id, then the CM owns destroying the cm_id and
the application is done with it.  That's the short of it.  Here's the
long of it:

There are 2 paths for freeing iw_cm memory.

1) the application issues an rdma_destroy_id(), which calls
iw_destroy_cm_id().  In this case (and this case only), the thread is
blocked until the refcnt reaches 0, then the thread continues and frees
the memory, as sketched below.
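
As a rough sketch, path 1 looks like this once the patch below is
applied (locking and state checks elided; see the real iw_destroy_cm_id()
for the details):

	void iw_destroy_cm_id(struct iw_cm_id *cm_id)
	{
		struct iwcm_id_private *cm_id_priv =
			container_of(cm_id, struct iwcm_id_private, id);

		destroy_cm_id(cm_id);	/* state -> DESTROYING, sends reject
					 * if needed, drops our reference */
		wait_for_completion(&cm_id_priv->destroy_comp);
		free_cm_id(cm_id_priv);	/* dealloc work entries + kfree */
	}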

2) the application returns non-zero from a callback function.  In this
case, the IWCM is responsible for destroying the cm_id.  However, the IWCM
_cannot_ block in its event handler thread because this can cause a
deadlock.  A deadlock can occur if the provider has a reference to the
cm_id and needs to post some event before removing the reference.  If
the IWCM were to block waiting for the refcnt to go to zero, it would
deadlock with the provider trying to post the last event before derefing
the cm_id.  So the IWCM_F_CALLBACK_DESTROY bit is used to indicate that
the IWCM owns destroying this cm_id.  If, in cm_work_handler(), the refcnt
goes to zero -and- the DESTROY bit is set, then the cm_id can be freed.
If the refcnt doesn't go to zero in that function, then either the
provider still has a reference, or subsequent queued work items hold
additional references.  In either case, the cm_id is not freed and
cm_work_handler() keeps churning through the events and processing them.
Since the cm_id is marked DESTROYING, the events get dropped and their
references on the cm_id released.  Eventually the cm_id will get freed
either in cm_work_handler() -or- in rem_ref() called by the provider, if
the provider has the last reference.
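
In code, path 2 boils down to the check that the patch below uses in
rem_ref() and (with an extra return) in cm_work_handler():

	if (iwcm_deref_id(cm_id_priv) &&
	    test_bit(IWCM_F_CALLBACK_DESTROY, &cm_id_priv->flags)) {
		/* last reference dropped and the IWCM owns destruction */
		BUG_ON(!list_empty(&cm_id_priv->work_list));
		free_cm_id(cm_id_priv);
	}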

So based on the above design, there are 3 places in the code where the
cm_id can be freed:

A) In case 1 above, the memory will always be freed in iw_destroy_cm_id()
after the thread is awakened with a refcnt of zero.

B) In case 2 above, if the last reference is due to a queued work event
for the IWCM, the memory is freed in cm_work_handler().

C) In case 2 above, if the provider has the last reference, the cm_id is
freed in rem_ref().
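
From the application's point of view the contract is simply about the
callback's return value.  A hypothetical handler (the function name and
error check are made up for illustration) would look like:

	static int my_cm_handler(struct iw_cm_id *cm_id,
				 struct iw_cm_event *event)
	{
		if (event_is_fatal(event))	/* hypothetical check */
			return -ECONNRESET;	/* non-zero: the IWCM now owns
						 * destroying this cm_id */

		return 0;			/* zero: the app keeps ownership
						 * and must eventually destroy the
						 * cm_id itself (iw_destroy_cm_id(),
						 * or rdma_destroy_id() at the
						 * rdma_cm level) */
	}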


I hope this clarifies things.

Here's the proposed patch:

iw_cm_id destruction race condition fixes.

From: Steve Wise <swise at opengridcomputing.com>

Several changes:

- iwcm_deref_id() now reports that the last reference was dropped
  regardless of whether a thread is already waiting in iw_destroy_cm_id().

- move iw_cm_reject() into destroy_cm_id() to reduce code duplication.

- clean up race condition in cm_work_handler().

- create a static free_cm_id() which deallocates the work entries and then
  kfrees the cm_id memory.  This reduces code duplication.

Signed-off-by: Steve Wise <swise at opengridcomputing.com>
Signed-off-by: Tom Tucker <tom at opengridcomputing.com>
---

 drivers/infiniband/core/iwcm.c |   48 +++++++++++++++++++++-------------------
 1 files changed, 25 insertions(+), 23 deletions(-)

diff --git a/drivers/infiniband/core/iwcm.c b/drivers/infiniband/core/iwcm.c
index 1039ad5..403daed 100644
--- a/drivers/infiniband/core/iwcm.c
+++ b/drivers/infiniband/core/iwcm.c
@@ -146,6 +146,12 @@ static int copy_private_data(struct iw_c
 	return 0;
 }
 
+static void free_cm_id(struct iwcm_id_private *cm_id_priv)
+{
+	dealloc_work_entries(cm_id_priv);
+	kfree(cm_id_priv);
+}
+
 /*
  * Release a reference on cm_id. If the last reference is being
  * released, enable the waiting thread (in iw_destroy_cm_id) to
@@ -153,21 +159,14 @@ static int copy_private_data(struct iw_c
  */
 static int iwcm_deref_id(struct iwcm_id_private *cm_id_priv)
 {
-	int ret = 0;
-
 	BUG_ON(atomic_read(&cm_id_priv->refcount)==0);
 	if (atomic_dec_and_test(&cm_id_priv->refcount)) {
 		BUG_ON(!list_empty(&cm_id_priv->work_list));
-		if (waitqueue_active(&cm_id_priv->destroy_comp.wait)) {
-			BUG_ON(cm_id_priv->state != IW_CM_STATE_DESTROYING);
-			BUG_ON(test_bit(IWCM_F_CALLBACK_DESTROY,
-					&cm_id_priv->flags));
-			ret = 1;
-		}
 		complete(&cm_id_priv->destroy_comp);
+		return 1;
 	}
 
-	return ret;
+	return 0;
 }
 
 static void add_ref(struct iw_cm_id *cm_id)
@@ -181,7 +180,11 @@ static void rem_ref(struct iw_cm_id *cm_
 {
 	struct iwcm_id_private *cm_id_priv;
 	cm_id_priv = container_of(cm_id, struct iwcm_id_private, id);
-	iwcm_deref_id(cm_id_priv);
+	if (iwcm_deref_id(cm_id_priv) &&
+	    test_bit(IWCM_F_CALLBACK_DESTROY, &cm_id_priv->flags)) {
+		BUG_ON(!list_empty(&cm_id_priv->work_list));
+		free_cm_id(cm_id_priv);
+	}
 }
 
 static int cm_event_handler(struct iw_cm_id *cm_id, struct iw_cm_event *event);
@@ -355,8 +358,11 @@ static void destroy_cm_id(struct iw_cm_i
 	case IW_CM_STATE_CONN_RECV:
 		/*
 		 * App called destroy before/without calling accept after
-		 * receiving connection request event notification.
+		 * receiving connection request event notification or
+		 * returned non zero from the event callback function.
+		 * In either case, must tell the provider to reject.
 		 */
+		iw_cm_reject(cm_id, NULL, 0);
 		cm_id_priv->state = IW_CM_STATE_DESTROYING;
 		break;
 	case IW_CM_STATE_CONN_SENT:
@@ -391,9 +397,7 @@ void iw_destroy_cm_id(struct iw_cm_id *c
 
 	wait_for_completion(&cm_id_priv->destroy_comp);
 
-	dealloc_work_entries(cm_id_priv);
-
-	kfree(cm_id_priv);
+	free_cm_id(cm_id_priv);
 }
 EXPORT_SYMBOL(iw_destroy_cm_id);
 
@@ -639,7 +643,6 @@ static void cm_conn_req_handler(struct i
 
 	ret = alloc_work_entries(cm_id_priv, 3);
 	if (ret) {
-		iw_cm_reject(cm_id, NULL, 0);
 		iw_destroy_cm_id(cm_id);
 		goto out;
 	}
@@ -650,7 +653,7 @@ static void cm_conn_req_handler(struct i
 		set_bit(IWCM_F_CALLBACK_DESTROY, &cm_id_priv->flags);
 		destroy_cm_id(cm_id);
 		if (atomic_read(&cm_id_priv->refcount)==0)
-			kfree(cm_id);
+			free_cm_id(cm_id_priv);
 	}
 
 out:
@@ -854,13 +857,12 @@ static void cm_work_handler(struct work_
 			destroy_cm_id(&cm_id_priv->id);
 		}
 		BUG_ON(atomic_read(&cm_id_priv->refcount)==0);
-		if (iwcm_deref_id(cm_id_priv))
-			return;
-
-		if (atomic_read(&cm_id_priv->refcount)==0 &&
-		    test_bit(IWCM_F_CALLBACK_DESTROY, &cm_id_priv->flags)) {
-			dealloc_work_entries(cm_id_priv);
-			kfree(cm_id_priv);
+		if (iwcm_deref_id(cm_id_priv)) {
+			if (test_bit(IWCM_F_CALLBACK_DESTROY,
+				     &cm_id_priv->flags)) {
+				BUG_ON(!list_empty(&cm_id_priv->work_list));
+				free_cm_id(cm_id_priv);
+			}
 			return;
 		}
 		spin_lock_irqsave(&cm_id_priv->lock, flags);





