[ofa-general] [PATCH] Fix racy deadlock in cma
Kanoj Sarcar
kanoj at netxen.com
Wed Oct 3 12:14:39 PDT 2007
Sean Hefty wrote:
>> - Your comment doesn't make it clear to me that dropping and
>> reacquiring the lock is safe; can you explain why nothing else
>> could come along while the lock is dropped and mess things up?
>
>
> I need to study this part in more detail, but I don't think we can
> safely release the lock without introducing a race in at least
> cma_listen_on_all().
>
Yes, if you see a race, please point it out; that would help me
understand this code too.
I will just point out that in the call chain
rdma_destroy_id():cma_cancel_operation():cma_cancel_listens(), the cmid
is taken off the listen_any_list first, before the provider calls are made.
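
To make that ordering concrete, here is a rough userspace model of it
(pthreads, not the kernel code). Everything prefixed model_ is
hypothetical, and the stand-in provider call takes the same lock only
as an assumption, to show why the call is made after the lock is
dropped in this model.

```c
/* Userspace model only: names echo the thread, bodies are hypothetical. */
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

struct model_listener {
    struct model_listener *next;  /* link on listen_any_list */
    int provider_calls;           /* calls made into the stand-in provider */
    int on_list_during_call;      /* 1 if still listed when provider ran */
};

static pthread_mutex_t list_lock = PTHREAD_MUTEX_INITIALIZER;
static struct model_listener *listen_any_list;

/* Caller must hold list_lock. Returns 1 if id is on the list. */
static int model_is_listed(const struct model_listener *id)
{
    const struct model_listener *p;

    for (p = listen_any_list; p; p = p->next)
        if (p == id)
            return 1;
    return 0;
}

static void model_listen_add(struct model_listener *id)
{
    pthread_mutex_lock(&list_lock);
    id->next = listen_any_list;
    listen_any_list = id;
    pthread_mutex_unlock(&list_lock);
}

/* Stand-in provider call. Assumption: it needs list_lock itself,
 * so the caller must not be holding the lock when it gets here. */
static void model_provider_destroy(struct model_listener *id)
{
    pthread_mutex_lock(&list_lock);
    id->on_list_during_call = model_is_listed(id);
    id->provider_calls++;
    pthread_mutex_unlock(&list_lock);
}

static void model_cancel_listens(struct model_listener *id)
{
    struct model_listener **pp;

    /* Step 1: unlink the id while holding the lock. */
    pthread_mutex_lock(&list_lock);
    for (pp = &listen_any_list; *pp; pp = &(*pp)->next) {
        if (*pp == id) {
            *pp = id->next;
            id->next = NULL;
            break;
        }
    }
    pthread_mutex_unlock(&list_lock);

    /* Step 2: only now call the provider, with the lock dropped. */
    model_provider_destroy(id);
}
```

In this model, anything that walks listen_any_list after step 1 can no
longer find the id, so the provider call in step 2 does not race with
list walkers over that id. Whether the same holds for every path in
the real code is exactly the open question.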
Thanks.
Kanoj
>> It seems rdma_destroy_id() has the same pattern, but it's not clear
>> to me in the code:
>>
>>     mutex_lock(&lock);
>>     if (id_priv->cma_dev) {
>>         mutex_unlock(&lock);
>>         // why can't the device be hot-unplugged here??
>
>
> The state of the id has been set to destroying, which will cause the
> device removal code to ignore the id. Even if device removal occurs
> before the id state has been set, this should be safe. A hot-plug
> event reports the device removal, but waits for the user to destroy
> the id. The device is only removed from the id by this function,
> further down.
>
> The locking here is, in part, to prevent attaching a device to the id
> from a callback while it's being destroyed. See addr_handler().
>
> - Sean
>
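
Sean's explanation above can be modelled in userspace roughly as
follows. This is a sketch under assumptions, not the kernel's cma.c:
the model_ names, the state enum, and the point at which the state is
set are all hypothetical. It shows only the two properties described:
the removal path skips an id that is being destroyed, and the device is
detached from the id only by the destroy path.

```c
/* Userspace model only; names and structure are hypothetical. */
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

enum model_state { MODEL_IDLE, MODEL_DEVICE_REMOVAL, MODEL_DESTROYING };

struct model_dev {
    int refcount;                 /* ids currently attached */
};

struct model_id {
    enum model_state state;
    struct model_dev *cma_dev;    /* attached device, or NULL */
    int removal_events;           /* removal events reported to the user */
};

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

/* Callback path (compare addr_handler()): attaching is refused
 * once the id is being destroyed. Returns 1 on success. */
static int model_attach(struct model_id *id, struct model_dev *dev)
{
    int ok = 0;

    pthread_mutex_lock(&lock);
    if (id->state != MODEL_DESTROYING && !id->cma_dev) {
        id->cma_dev = dev;
        dev->refcount++;
        ok = 1;
    }
    pthread_mutex_unlock(&lock);
    return ok;
}

/* Device removal path: ignores ids being destroyed; otherwise it
 * reports the event but leaves cma_dev attached for the user. */
static void model_remove_one(struct model_id *id)
{
    pthread_mutex_lock(&lock);
    if (id->state != MODEL_DESTROYING) {
        id->state = MODEL_DEVICE_REMOVAL;
        id->removal_events++;
    }
    pthread_mutex_unlock(&lock);
}

/* Destroy path: the only place the device is detached from the id. */
static void model_destroy_id(struct model_id *id)
{
    pthread_mutex_lock(&lock);
    id->state = MODEL_DESTROYING;
    if (id->cma_dev) {
        pthread_mutex_unlock(&lock);
        /* provider teardown would run here, with the lock dropped */
        pthread_mutex_lock(&lock);
        id->cma_dev->refcount--;
        id->cma_dev = NULL;
    }
    pthread_mutex_unlock(&lock);
}
```

In the window where model_destroy_id() has dropped the lock, both
model_remove_one() and model_attach() see MODEL_DESTROYING and leave
the id alone, which is the property the quoted explanation relies on.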