[openib-general] Re: SPAM: [PATCH] [RFC] - dapl - dat_ep_free() can return without freeing the endpoint

Arlin Davis ardavis at ichips.intel.com
Wed Apr 5 10:24:27 PDT 2006


Sean Hefty wrote:

> James Lentini wrote:
>
>>> void dapli_destroy_conn(struct dapl_cm_id *conn)
>>> {
>>>     int in_callback;
>>> +    struct rdma_cm_id *cm_id;
>>>
>>>     dapl_dbg_log(DAPL_DBG_TYPE_CM,              " destroy_conn: conn 
>>> %p id %d\n",
>>>              conn,conn->cm_id);
>>> -
>>>     dapl_os_lock(&conn->lock);
>>>     conn->destroy = 1;
>>>     in_callback = conn->in_callback;
>>> -    dapl_os_unlock(&conn->lock);
>>> -
>>> -    if (!in_callback) {
>>> -        if (conn->ep)
>>> -            conn->ep->cm_handle = IB_INVALID_HANDLE;
>>> -        if (conn->cm_id) {
>>> -            if (conn->cm_id->qp)
>>> -                rdma_destroy_qp(conn->cm_id);
>>> -            rdma_destroy_id(conn->cm_id);
>>> +    do {
>>> +        if (in_callback) {
>>> +            dapl_os_unlock(&conn->lock);
>>> +            usleep(10);
>>> +            dapl_os_lock(&conn->lock);
>>
>
> In general this doesn't work.  The calling thread may be the callback 
> thread, which would lead to deadlock.  This is why we don't just call 
> rdma_destroy_id() directly, and let it wait for the callback to complete.
>
Sorry, the callback names should be changed since it is really a async 
event processing thread and not a direct callback from CMA. The async 
thread can destroy the cm_id if we no longer hold any cm_id event 
references, we destroy the associated QP, and we are syncronized with 
any other thread that could be destroying at the same time. This is how 
the code currently works.

I did not see the original thread/patch from Steve so I don't have the 
entire context of this issue but it sounds like we need to fix the code 
so that the destroy QP (dat_ep_free) blocks until the event processing 
is complete, always destroy the QP and cm_id from this call, and remove 
cleanup from any async event processing threads. Is this what Steve was 
attempting to do with his patch?

I seemed to have missed the posting of the patch so could someone point 
me to the original patch so I can review and test any changes.

thanks

-arlin




More information about the general mailing list