[openib-general] [PATCH] RDMA/iwcm: Bugs in cm_conn_req_handler()

Krishna Kumar2 krkumar2 at in.ibm.com
Thu Feb 8 21:01:23 PST 2007


Regarding the race - can this and the other problem (of
using internal data-structure) both be taken care of by
changing iw_deref_id to return 1 if atomic_dec_and_test
finds the last reference ? Then the waitqueue_active()
code can be removed, just do the completion (reaching
here implies that someone is in the middle of
iw_destroy_cm_id).

The question is what is the issue if we return 1 even if
no one is waiting in iw_destroy_cm_id() and which results
in cm_work_handler() returning out ?

thanks,

- KK

> BTW, while looking at iwcm.c, I noticed the following highly dubious
> code for the first time:
>
>    static int iwcm_deref_id(struct iwcm_id_private *cm_id_priv)
>    {
>       int ret = 0;
>
>       BUG_ON(atomic_read(&cm_id_priv->refcount)==0);
>       if (atomic_dec_and_test(&cm_id_priv->refcount)) {
>          BUG_ON(!list_empty(&cm_id_priv->work_list));
>          if (waitqueue_active(&cm_id_priv->destroy_comp.wait)) {
>             BUG_ON(cm_id_priv->state != IW_CM_STATE_DESTROYING);
>             BUG_ON(test_bit(IWCM_F_CALLBACK_DESTROY,
>                   &cm_id_priv->flags));
>             ret = 1;
>          }
>          complete(&cm_id_priv->destroy_comp);
>       }
>
>       return ret;
>    }
>
> The test of waitqueue_active on destroy_comp.wait looks really bad for
> two reasons: first, it is relying on an internal implementation detail
> of struct completion that really shouldn't be used by generic code.
> And second, it seems to me that this doesn't even work right, since
> there is a race something like the following:
>
> iw_destroy_cm_id():
> destroy_cm_id(cm_id); // still 1 ref left
>
>             cm_work_handler():
>                if (iwcm_deref_id()) // drop last ref
>                   return;
>                // no one waiting yet, doesn't
>                // return, but destroy_comp is
>                // signaled
>
> wait_for_completion(&cm_id_priv->destroy_comp);
> // destroy_comp is signaled, proceed
> kfree(cm_id_priv);
>
>                // continue using cm_id_priv
>                // OOPS
>
> I don't understand this code well enough for the fix to be obvious.
>
>  - R.





More information about the general mailing list