[openib-general] [PATCH] RDMA/iwcm: Bugs in cm_conn_req_handler()
Tom Tucker
tom at opengridcomputing.com
Wed Feb 7 07:01:23 PST 2007
On Wed, 2007-02-07 at 08:24 -0600, Steve Wise wrote:
> This looks good for 2.6.21 IMO.
>
> Acked-by: Steve Wise <swise at opengridcomputing.com>
>
>
> On Wed, 2007-02-07 at 12:26 +0530, Krishna Kumar wrote:
> > (I had submitted this once earlier but got no response)
> >
> > cm_conn_req_handler() :
> > 1. Calling destroy_cm_id leaks 3 work 'free' list entries.
When dealloc_work_entries was added to iw_destroy_cm_id, it also needed
to be added everywhere destroy_cm_id is called. You need to call
dealloc_work_entries at every destroy_cm_id call site, or the same leak
remains in the other callers, e.g. cm_work_handler.
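
A minimal user-space sketch of that pairing, assuming a toy model rather
than the real iwcm.c code: every name below (toy_id, toy_work,
toy_dealloc_work_entries, toy_destroy_id, toy_live_entries) is
hypothetical, and malloc/free stand in for the kernel allocators. It
folds the free-list release into the one teardown helper so no caller
can forget it, which is only one possible way to keep the two calls
together.

```c
#include <assert.h>
#include <stdlib.h>

/* Toy stand-in for a preallocated work entry (hypothetical). */
struct toy_work {
    struct toy_work *next;
};

/* Toy stand-in for the private cm_id; only the free list is modelled. */
struct toy_id {
    struct toy_work *free_list;
};

/* Entries allocated and not yet freed; nonzero after teardown is a leak. */
static int toy_live_entries;

/* Release every entry still sitting on the free list. */
static void toy_dealloc_work_entries(struct toy_id *id)
{
    while (id->free_list) {
        struct toy_work *w = id->free_list;

        id->free_list = w->next;
        free(w);
        toy_live_entries--;
    }
}

/* Allocate an id with 'count' preallocated work entries. */
static struct toy_id *toy_create_id(int count)
{
    struct toy_id *id = calloc(1, sizeof(*id));

    if (!id)
        return NULL;
    while (count-- > 0) {
        struct toy_work *w = malloc(sizeof(*w));

        if (!w) {
            /* Undo the partial allocation before failing. */
            toy_dealloc_work_entries(id);
            free(id);
            return NULL;
        }
        w->next = id->free_list;
        id->free_list = w;
        toy_live_entries++;
    }
    return id;
}

/* Single teardown path: the free list is always released before the id. */
static void toy_destroy_id(struct toy_id *id)
{
    toy_dealloc_work_entries(id);
    assert(id->free_list == NULL);
    free(id);
}
```

With the release inside the teardown helper, a caller that creates an id
with 3 entries and then destroys it ends with toy_live_entries back at
zero, whichever path triggered the destroy.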
> > 2. cm_id is freed up wrongly and not cm_id_priv (though the
> > effect is the same since cm_id is the first element of
> > cm_id_priv, but still a bug if the top level cm_id changes).
> > 3. Reject message has to be sent on failure. Tested this
> > without the fix and found the client hangs, waited for about
> > 20 mins and then did Ctrl-C but the process is unkillable.
The reject should be added to the switch statement in destroy_cm_id (not
here) so that it doesn't need to be added everywhere the cm_id is
destroyed while it's in a state that requires a reject.
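
Roughly what I mean, as a user-space sketch and not the actual iwcm
state machine: the toy_state values, toy_conn, toy_send_reject and
toy_destroy_conn are all made-up names, and the counters stand in for
messages that a real provider would put on the wire. The point is only
that the state switch in the teardown path decides whether a reject is
owed, so callers don't have to.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical connection states for the toy model. */
enum toy_state {
    TOY_IDLE,
    TOY_CONN_RECV,    /* request received, peer awaits accept or reject */
    TOY_ESTABLISHED,
    TOY_DESTROYING
};

struct toy_conn {
    enum toy_state state;
    int rejects_sent;      /* counts rejects the teardown path issued */
    int disconnects_sent;  /* counts disconnects the teardown path issued */
};

/* Stand-in for sending a reject message to the peer. */
static void toy_send_reject(struct toy_conn *c)
{
    c->rejects_sent++;
}

/*
 * Teardown helper: the switch picks the action the current state
 * requires, so every caller gets the reject for free.
 */
static void toy_destroy_conn(struct toy_conn *c)
{
    assert(c != NULL);

    switch (c->state) {
    case TOY_CONN_RECV:
        /* The peer is blocked waiting for an answer; tell it no. */
        toy_send_reject(c);
        break;
    case TOY_ESTABLISHED:
        c->disconnects_sent++;
        break;
    default:
        /* Nothing is owed to the peer in the other states. */
        break;
    }
    c->state = TOY_DESTROYING;
}
```

Destroying a toy_conn in TOY_CONN_RECV bumps rejects_sent once;
destroying one in TOY_IDLE sends nothing. Either way the caller makes
the same single call.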
> > 4. Setting IWCM_F_CALLBACK_DESTROY on cm_id (child handle)
> > doesn't achieve anything, since checking for
> > IWCM_F_CALLBACK_DESTROY in the parent's flag (in
> > cm_work_handler) means that this will never be true.
destroy_cm_id exists to allow cm_id to be destroyed without waiting. If
you're changing it to iw_destroy_cm_id, that may be fine, but all the
setbit/getbit stuff is a side show. You must be certain that
iw_destroy_cm_id can't wait. If it does, you'll shut down the entire
IWCM.
> >
> > All 4 above cases were tested by injecting random error in
> > iw_conn_req_handler() and running rdma_bw/krping, they were
> > confirmed. I added the BUG_ON() to confirm the earlier check
> > for id_priv->refcount==0 should always be true (and could be
> > removed).
> >
> > Patch against 2.6.20
> >
> > Signed-off-by: Krishna Kumar <krkumar2 at in.ibm.com>
> > ---
> > diff -ruNp org/drivers/infiniband/core/iwcm.c new/drivers/infiniband/core/iwcm.c
> > --- org/drivers/infiniband/core/iwcm.c 2007-01-24 10:25:26.000000000 +0530
> > +++ new/drivers/infiniband/core/iwcm.c 2007-01-24 10:25:31.000000000 +0530
> > @@ -647,10 +647,9 @@ static void cm_conn_req_handler(struct i
> > /* Call the client CM handler */
> > ret = cm_id->cm_handler(cm_id, iw_event);
> > if (ret) {
> > - set_bit(IWCM_F_CALLBACK_DESTROY, &cm_id_priv->flags);
> > - destroy_cm_id(cm_id);
> > - if (atomic_read(&cm_id_priv->refcount)==0)
> > - kfree(cm_id);
> > + BUG_ON(atomic_read(&cm_id_priv->refcount) != 1);
> > + iw_cm_reject(cm_id, NULL, 0);
> > + iw_destroy_cm_id(cm_id);
> > }
> >
> > out:
> >
> > _______________________________________________
> > openib-general mailing list
> > openib-general at openib.org
> > http://openib.org/mailman/listinfo/openib-general
> >
> > To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general
> >