[openib-general] Re: CMA deadlock

Michael S. Tsirkin mst at mellanox.co.il
Mon Apr 3 11:46:22 PDT 2006


Quoting r. Sean Hefty <sean.hefty at intel.com>:
> Subject: RE: CMA deadlock
> 
> >  A ULP requests address resolution; on success requests route resolution;
> >  route resolution succeeds; inside the callback ULP requests rdma_connect.
> >  Now, a failure (e.g. out of memory) occurs at the ULP level, so the ULP
> >  decides to destroy the ID. To this end it returns a failure code from the
> >  route callback.
> 
> I didn't consider this possibility.  The only solution I can see at the moment
> is to schedule route resolution to a separate thread, as you suggested.

OK. I gather you'll fix it then?

> >And it seems that, if the user callback returns failure, the CMA actually
> >calls rdma_destroy_id which in turn may call ib_destroy_cm_id from inside the
> >CM callback. I think this might deadlock in a similar way.  Again, bouncing
> >the CM event to the rdma WQ will solve this, I think.
> 
> This should be handled by the code.  See the comment near the bottom of the
> cma_ib_handler() routine.

I don't really understand the comment.

        if (ret) {
                /* Destroy the CM ID by returning a non-zero value. */
                conn_id->cm_id.ib = NULL;
                cma_exch(conn_id, CMA_DESTROYING);
                cma_release_remove(conn_id);
                rdma_destroy_id(&conn_id->id);
        }


We seem to be calling rdma_destroy_id, which in turn seems to call
ib_destroy_cm_id directly. No?
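
To make the concern concrete, here is a minimal user-space sketch of the
pattern under discussion. It is not the CMA code: every name in it (demo_id,
demo_destroy, demo_dispatch, demo_queue_destroy, demo_run_wq) is hypothetical,
and it assumes a destroy routine that must wait for all in-flight callbacks on
the ID to finish. Under that assumption, destroying from inside the callback
can never complete, while queuing the destroy and running it after the
callback returns can. The model is single-threaded and reports the would-be
deadlock as a false return instead of actually blocking.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a connection ID; not the real rdma_cm_id. */
struct demo_id {
	int  callbacks_running;	/* callbacks currently executing on this ID */
	bool destroyed;
	bool destroy_queued;
};

/*
 * Models a destroy that has to wait until no callback is running.
 * Returns false in the case where a blocking implementation would hang:
 * the caller is itself one of the callbacks being waited for.
 */
static bool demo_destroy(struct demo_id *id)
{
	if (id->callbacks_running > 0)
		return false;	/* would deadlock */
	id->destroyed = true;
	return true;
}

/* Tiny fixed-size stand-in for a work queue (assumption, for illustration). */
#define DEMO_WQ_MAX 8
static struct demo_id *demo_wq[DEMO_WQ_MAX];
static size_t demo_wq_len;

/* Defer the destroy instead of running it in callback context. */
static bool demo_queue_destroy(struct demo_id *id)
{
	if (id->destroy_queued)
		return true;
	if (demo_wq_len == DEMO_WQ_MAX)
		return false;
	id->destroy_queued = true;
	demo_wq[demo_wq_len++] = id;
	return true;
}

/* Run the queued destroys outside any callback; returns how many succeeded. */
static size_t demo_run_wq(void)
{
	size_t done = 0;

	for (size_t i = 0; i < demo_wq_len; i++)
		if (demo_destroy(demo_wq[i]))
			done++;
	demo_wq_len = 0;
	return done;
}

typedef int (*demo_handler)(struct demo_id *id);

/*
 * Deliver one event. A non-zero return from the handler requests that the
 * ID be destroyed, either directly (defer == false) or via the queue.
 */
static bool demo_dispatch(struct demo_id *id, demo_handler handler, bool defer)
{
	bool ok = true;

	id->callbacks_running++;
	if (handler(id) != 0)
		ok = defer ? demo_queue_destroy(id) : demo_destroy(id);
	id->callbacks_running--;
	return ok;
}

/* Handler that always fails, e.g. the ULP ran out of memory. */
static int demo_failing_handler(struct demo_id *id)
{
	(void)id;
	return -1;
}
```

Calling demo_dispatch() with demo_failing_handler and defer set to false
returns false and leaves the ID alive, which is the stand-in for the deadlock.
With defer set to true the dispatch succeeds, and a later demo_run_wq() call
destroys the ID once no callback is running.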


-- 
Michael S. Tsirkin
Staff Engineer, Mellanox Technologies


