[openib-general] [PATCH] RDMA/iwcm: Bugs in cm_conn_req_handler()

Tom Tucker tom at opengridcomputing.com
Wed Feb 7 07:01:23 PST 2007


On Wed, 2007-02-07 at 08:24 -0600, Steve Wise wrote:
> This looks good for 2.6.21 IMO.
> 
> Acked-by: Steve Wise <swise at opengridcomputing.com>
> 
> 
> On Wed, 2007-02-07 at 12:26 +0530, Krishna Kumar wrote:
> > (I had submitted this once earlier but got no response)


> > 
> > cm_conn_req_handler() :
> > 	1. Calling destroy_cm_id leaks 3 work 'free' list entries.

When dealloc_work_entries was added to iw_destroy_cm_id, it also needed
to be added everywhere destroy_cm_id is called. Unless you call
dealloc_work_entries at every destroy_cm_id call site, this leak remains
in the other callers, e.g. cm_work_handler.
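
To illustrate the pairing, here is a minimal user-space sketch. It is not
iwcm.c code: struct cm_id_model, live_entries, destroy_only and
destroy_and_dealloc are hypothetical names invented for the example, and
malloc/free stand in for the kernel allocators. It only shows that a
teardown path which skips the free-list release leaves the preallocated
entries behind.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical user-space model; the names echo the thread but this is not iwcm.c. */
struct work_entry {
    struct work_entry *next;
};

struct cm_id_model {
    struct work_entry *work_free_list; /* preallocated 'free' list entries */
    int destroyed;
};

static int live_entries; /* counts outstanding entries so a leak is visible */

/* Preallocate 'count' entries onto the free list. */
static int alloc_work_entries(struct cm_id_model *id, int count)
{
    assert(id != NULL);
    while (count-- > 0) {
        struct work_entry *w = malloc(sizeof(*w));
        if (!w)
            return -1;
        w->next = id->work_free_list;
        id->work_free_list = w;
        live_entries++;
    }
    return 0;
}

/* Release every entry still on the free list. */
static void dealloc_work_entries(struct cm_id_model *id)
{
    while (id->work_free_list) {
        struct work_entry *w = id->work_free_list;
        id->work_free_list = w->next;
        free(w);
        live_entries--;
    }
}

/* Teardown that forgets the free list (models the leaky call sites). */
static void destroy_only(struct cm_id_model *id)
{
    id->destroyed = 1;
}

/* Teardown paired with the free-list release, usable from every call site. */
static void destroy_and_dealloc(struct cm_id_model *id)
{
    destroy_only(id);
    dealloc_work_entries(id);
}
```

Calling destroy_only after allocating three entries leaves live_entries at
3, while destroy_and_dealloc brings it back to 0. Putting the release in
one helper means no caller can forget it.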

> > 	2. cm_id is freed up wrongly and not cm_id_priv (though the
> > 	   effect is the same since cm_id is the first element of
> > 	   cm_id_priv, but still a bug if the top level cm_id changes).
> > 	3. Reject message has to be sent on failure. Tested this
> > 	   without the fix and found the client hangs, waited for about
> > 	   20 mins and then did Ctrl-C but the process is unkillable.

The reject should be added to the switch statement in destroy_cm_id (not
here) so that it doesn't need to be added everywhere a cm_id is destroyed
while it's in a state that requires a reject.
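
Roughly what I mean, as a self-contained sketch rather than a patch. The
enum model_state values, struct cm_id_model and model_send_reject are
hypothetical stand-ins, not the real IWCM states or provider calls. The
point is only that the destroy routine decides, based on state, whether a
reject has to go out.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical states for illustration; not the real IWCM state enum. */
enum model_state {
    MODEL_IDLE,
    MODEL_CONN_RECV,    /* connect request delivered, not yet accepted or rejected */
    MODEL_ESTABLISHED,
    MODEL_DESTROYING
};

struct cm_id_model {
    enum model_state state;
    int rejects_sent;   /* counts rejects so the behaviour can be checked */
};

/* Stand-in for sending a reject to the peer. */
static void model_send_reject(struct cm_id_model *id)
{
    id->rejects_sent++;
}

/* Destroy routine that owns the state-dependent cleanup. */
static void model_destroy_cm_id(struct cm_id_model *id)
{
    assert(id != NULL);
    switch (id->state) {
    case MODEL_CONN_RECV:
        /* The peer is still waiting for an answer, so reject here. */
        model_send_reject(id);
        break;
    case MODEL_IDLE:
    case MODEL_ESTABLISHED:
    case MODEL_DESTROYING:
        /* No pending connect request; nothing to reject in this sketch. */
        break;
    }
    id->state = MODEL_DESTROYING;
}
```

With the reject inside the switch, a caller such as the connect request
handler just calls the destroy routine on failure and the client gets its
answer, whichever path destroyed the cm_id.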

> > 	4. Setting IWCM_F_CALLBACK_DESTROY on cm_id (child handle)
> > 	   doesn't achieve anything, since checking for
> > 	   IWCM_F_CALLBACK_DESTROY in the parent's flag (in
> > 	   cm_work_handler) means that this will never be true.

destroy_cm_id exists to allow a cm_id to be destroyed without waiting. If
you're changing it to iw_destroy_cm_id, that may be fine, but all the
set_bit/test_bit stuff is a side show. You must be certain that
iw_destroy_cm_id can't wait. If it does, you'll shut down the entire
IWCM.
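
A toy model of why waiting is fatal, assuming events are handled one at a
time by a single worker. Everything here (run_queue, work_fn,
destroy_nowait, destroy_waiting, other_event) is a hypothetical name made
up for the sketch, and "waiting" is modelled by a return value instead of
actually blocking.

```c
#include <assert.h>
#include <stddef.h>

/* Result of running one work item in this toy model. */
enum item_result { ITEM_DONE, ITEM_WOULD_WAIT };

typedef enum item_result (*work_fn)(void);

/*
 * Run items in order on one "worker". If an item would wait, the worker
 * is stuck and no later item is processed. Returns the number of items
 * that completed.
 */
static size_t run_queue(const work_fn *items, size_t n)
{
    size_t processed = 0;

    assert(items != NULL);
    for (size_t i = 0; i < n; i++) {
        if (items[i]() == ITEM_WOULD_WAIT)
            break; /* the only worker never returns from this item */
        processed++;
    }
    return processed;
}

/* A destroy that returns immediately. */
static enum item_result destroy_nowait(void) { return ITEM_DONE; }

/* A destroy that waits for something only a later item could provide. */
static enum item_result destroy_waiting(void) { return ITEM_WOULD_WAIT; }

/* Any unrelated event queued behind the destroy. */
static enum item_result other_event(void) { return ITEM_DONE; }
```

With destroy_nowait at the head of a three-item queue all three items
complete; with destroy_waiting at the head none do, including the
unrelated events queued behind it.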
 
> > 
> > All 4 above cases were tested by injecting random error in
> > iw_conn_req_handler() and running rdma_bw/krping, they were
> > confirmed. I added the BUG_ON() to confirm the earlier check
> > for id_priv->refcount==0 should always be true (and could be
> > removed).
> > 
> > Patch against 2.6.20
> > 
> > Signed-off-by: Krishna Kumar <krkumar2 at in.ibm.com>
> > ---
> > diff -ruNp org/drivers/infiniband/core/iwcm.c new/drivers/infiniband/core/iwcm.c
> > --- org/drivers/infiniband/core/iwcm.c	2007-01-24 10:25:26.000000000 +0530
> > +++ new/drivers/infiniband/core/iwcm.c	2007-01-24 10:25:31.000000000 +0530
> > @@ -647,10 +647,9 @@ static void cm_conn_req_handler(struct i
> >  	/* Call the client CM handler */
> >  	ret = cm_id->cm_handler(cm_id, iw_event);
> >  	if (ret) {
> > -		set_bit(IWCM_F_CALLBACK_DESTROY, &cm_id_priv->flags);
> > -		destroy_cm_id(cm_id);
> > -		if (atomic_read(&cm_id_priv->refcount)==0)
> > -			kfree(cm_id);
> > +		BUG_ON(atomic_read(&cm_id_priv->refcount) != 1);
> > +		iw_cm_reject(cm_id, NULL, 0);
> > +		iw_destroy_cm_id(cm_id);
> >  	}
> >  
> >  out:
> > 
> > _______________________________________________
> > openib-general mailing list
> > openib-general at openib.org
> > http://openib.org/mailman/listinfo/openib-general
> > 
> > To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general
> > 
> 
> 




