[ofa-general] Re: [PATCH 2.6.24] rdma/cm: fix deadlock destroying listen requests
Kanoj Sarcar
kanoj at netxen.com
Thu Oct 11 12:21:27 PDT 2007
Sean Hefty wrote:
>>cma_process_remove() -> cma_remove_id_dev() generates the event for
>>device removal. This is ok to do as long as it can be guaranteed that a
>>racing rdma_destroy_id() has not returned back to caller, correct?
>>
>>I.e., the caller must be willing to accept device removal events until its
>>rdma_destroy_id() returns.
>>
>>
>
>Correct - rdma_destroy_id() blocks until all callbacks from the rdma_cm have
>completed.
>
>
>
>>If so, why is cma_remove_id_dev() trying so hard to not generate the
>>event when rdma_destroy_id() has gotten to the point of setting
>>CMA_DESTROYING? Could it not just generate the event, happy in the
>>knowledge that the refcount bump done by cma_process_remove() will
>>prevent the rdma_destroy_id() call from returning?
>>
>>
>
>There are two ways for the user to destroy an rdma_cm_id. They can either call
>rdma_destroy_id() directly or return a non-zero value from a callback. In order
>to support the latter, all callbacks to a user on the same rdma_cm_id must be
>serialized, and once the user has returned a non-zero value no further callbacks
>can occur. (Otherwise the user wouldn't know when it was safe to deallocate
>their connection context.)
>
>Since a device removal can occur at any point, the device removal callback must
>be serialized with any other callback in progress. It does this by marking that
>the device has been removed. This prevents any new callbacks from being
>invoked, but a callback may already be in progress. The device removal code
>waits for that callback to complete. After it completes, it needs to see if the
>user wants to destroy the rdma_cm_id - meaning they returned a non-zero value
>from the first callback. If so, then the device removal callback cannot be
>invoked.
>
>One other point is that all event callbacks for a given rdma_cm_id end up being
>serialized by default. Only the device removal event requires special handling,
>since that thread can run at any time. If you look at some of the callback
>handlers (named *_handler), you'll see calls to disable/enable remove, which
>provides this serialization.
>
>- Sean
>
>
>
Ok, thanks, I see how CMA_DESTROYING is used to correctly implement the
callback initiated destruct.
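To check my understanding, here is a minimal userspace sketch of the rule
as I read it. It is not the kernel code: every model_* name is made up for
illustration, the three states are a simplification of the real cma state
machine, and pthreads stand in for the kernel primitives. It only shows
that device removal marks the id, waits for a running callback, and then
skips its own callback if the user returned non-zero in the meantime.

```c
/* Userspace model only: every model_* name is hypothetical, not rdma_cm API. */
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

enum model_state { MODEL_ACTIVE, MODEL_DEVICE_REMOVAL, MODEL_DESTROYING };
enum model_event { MODEL_EVENT_OTHER, MODEL_EVENT_DEVICE_REMOVAL };

/* User callback: a non-zero return asks for the id to be destroyed. */
typedef int (*model_handler)(void *context, enum model_event event);

struct model_id {
    pthread_mutex_t lock;
    pthread_cond_t idle;        /* broadcast whenever a callback finishes */
    enum model_state state;
    bool in_callback;           /* true while the user handler is running */
    model_handler handler;
    void *context;
};

void model_init(struct model_id *id, model_handler handler, void *context)
{
    assert(handler != NULL);
    pthread_mutex_init(&id->lock, NULL);
    pthread_cond_init(&id->idle, NULL);
    id->state = MODEL_ACTIVE;
    id->in_callback = false;
    id->handler = handler;
    id->context = context;
}

/* Caller holds id->lock. The lock is dropped around the user handler. */
static void model_run_handler_locked(struct model_id *id, enum model_event event)
{
    int ret;

    id->in_callback = true;
    pthread_mutex_unlock(&id->lock);
    ret = id->handler(id->context, event);
    pthread_mutex_lock(&id->lock);
    if (ret != 0)
        id->state = MODEL_DESTROYING;   /* user asked for destruction */
    id->in_callback = false;
    pthread_cond_broadcast(&id->idle);
}

/* Deliver an ordinary event. Returns false if the callback was suppressed. */
bool model_deliver(struct model_id *id)
{
    bool delivered = false;

    pthread_mutex_lock(&id->lock);
    while (id->in_callback)             /* one callback at a time per id */
        pthread_cond_wait(&id->idle, &id->lock);
    if (id->state == MODEL_ACTIVE) {
        model_run_handler_locked(id, MODEL_EVENT_OTHER);
        delivered = true;
    }
    pthread_mutex_unlock(&id->lock);
    return delivered;
}

/* Device removal. Returns false if the removal callback was suppressed. */
bool model_remove_device(struct model_id *id)
{
    bool delivered = false;

    pthread_mutex_lock(&id->lock);
    if (id->state == MODEL_ACTIVE) {
        id->state = MODEL_DEVICE_REMOVAL;        /* stops new callbacks */
        while (id->in_callback)                  /* wait out a running one */
            pthread_cond_wait(&id->idle, &id->lock);
        if (id->state == MODEL_DEVICE_REMOVAL) { /* no destroy was requested */
            model_run_handler_locked(id, MODEL_EVENT_DEVICE_REMOVAL);
            delivered = true;
        }
    }
    pthread_mutex_unlock(&id->lock);
    return delivered;
}

/* Example user context: counts callbacks and returns a fixed value. */
struct model_ctx { int calls; int removals; int ret; };

int model_example_handler(void *context, enum model_event event)
{
    struct model_ctx *ctx = context;

    ctx->calls++;
    if (event == MODEL_EVENT_DEVICE_REMOVAL)
        ctx->removals++;
    return ctx->ret;
}
```

Built with -pthread. If model_example_handler is set up to return non-zero
from its first callback, a later model_remove_device() returns false and
the handler is not called again, which is the case the CMA_DESTROYING check
covers in my reading.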
I don't understand the reason for callback-initiated destruct in the
first place, but that's too off topic ...
With this new information, I will revisit the thread posted at
http://lists.openfabrics.org/pipermail/general/2007-September/040614.html
to see whether the problem discussed there is really nonexistent.
Kanoj