[ofa-general] [PATCH] Fix racy deadlock in cma

Kanoj Sarcar kanoj at netxen.com
Wed Oct 3 12:14:39 PDT 2007


Sean Hefty wrote:

>>  - Your comment doesn't make it clear to me that dropping and
>>    reacquiring the lock is safe; can you explain why nothing else
>>    could come along while the lock is dropped and mess things up?
>
>
> I need to study this part in more detail, but I don't think we can 
> safely release the lock without introducing a race in at least 
> cma_listen_on_all().
>

Yes, if you see a race, please point it out; that would help me 
understand this code too.

I will just point out that in the call chain 
rdma_destroy_id():cma_cancel_operation():cma_cancel_listens(), the cmid 
is taken off the listen_any_list first, before the provider calls are made.
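
To make the ordering concrete, here is a minimal userspace sketch of 
"unlink under the lock first, then call the provider with the lock 
dropped". It is not the cma.c code: struct listen_id, listen_add(), 
listen_visible(), provider_destroy() and cancel_listens() are 
hypothetical stand-ins, and a pthread mutex takes the place of the 
kernel mutex.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-in for a listening cmid; not the real cma.c struct. */
struct listen_id {
    struct listen_id *next;
    int provider_calls;   /* how many provider calls were made for this id */
};

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static struct listen_id *listen_any_list;   /* protected by lock */

/* Link an id at the head of the list. */
static void listen_add(struct listen_id *id)
{
    pthread_mutex_lock(&lock);
    id->next = listen_any_list;
    listen_any_list = id;
    pthread_mutex_unlock(&lock);
}

/* Return 1 if a list walker would still find the id, 0 otherwise. */
static int listen_visible(struct listen_id *id)
{
    struct listen_id *cur;
    int found = 0;

    pthread_mutex_lock(&lock);
    for (cur = listen_any_list; cur; cur = cur->next)
        if (cur == id)
            found = 1;
    pthread_mutex_unlock(&lock);
    return found;
}

/* Stand-in for a provider call. It runs without the lock held (it takes
 * the lock itself for the check), and by then the id is already unlinked. */
static void provider_destroy(struct listen_id *id)
{
    assert(!listen_visible(id));
    id->provider_calls++;
}

/* Unlink the id under the lock, then make the provider call unlocked. */
static void cancel_listens(struct listen_id *id)
{
    struct listen_id **pp;

    pthread_mutex_lock(&lock);
    for (pp = &listen_any_list; *pp; pp = &(*pp)->next) {
        if (*pp == id) {
            *pp = id->next;
            id->next = NULL;
            break;
        }
    }
    pthread_mutex_unlock(&lock);

    provider_destroy(id);
}
```

The sketch only shows that a list walker can no longer find the id once 
the provider call starts; it says nothing about other state that the 
lock may be protecting.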

Thanks.

Kanoj

>>    It seems rdma_destroy_id() has the same pattern, but it's not clear
>>    to me in the code:
>>
>>     mutex_lock(&lock);
>>     if (id_priv->cma_dev) {
>>         mutex_unlock(&lock);
>>         // why can't the device be hot-unplugged here??
>
>
> The state of the id has been set to destroying, which will cause the 
> device removal code to ignore the id.  Even if device removal occurs 
> before the id state has been set, this should be safe.  A hot-plug 
> event reports the device removal, but waits for the user to destroy 
> the id. The device is only removed from the id by this function, 
> further down.
>
> The locking here is, in part, to prevent attaching a device to the id 
> from a callback while it's being destroyed.  See addr_handler().
>
> - Sean
>
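
The rule Sean describes above can be sketched in userspace roughly as 
follows. This is an illustration under stated assumptions, not the 
cma.c implementation: the types, the state names and all three 
functions are hypothetical, and for simplicity the state change is 
made under the same mutex, which the real code may not do.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical states and types; not the real cma.c definitions. */
enum id_state { ID_ADDR_QUERY, ID_DESTROYING };

struct device { int refcount; };

struct id_priv {
    enum id_state state;
    struct device *dev;      /* protected by lock */
};

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

/* Callback path: attach a device only if the id is not being destroyed.
 * Returns 1 if the device was attached, 0 otherwise. */
static int attach_from_callback(struct id_priv *id, struct device *dev)
{
    int attached = 0;

    pthread_mutex_lock(&lock);
    if (id->state != ID_DESTROYING && id->dev == NULL) {
        id->dev = dev;
        dev->refcount++;
        attached = 1;
    }
    pthread_mutex_unlock(&lock);
    return attached;
}

/* Device-removal path: returns 1 if the id would be notified,
 * 0 if it is ignored because it is already being destroyed. */
static int device_removal_notify(struct id_priv *id)
{
    int notify;

    pthread_mutex_lock(&lock);
    notify = (id->state != ID_DESTROYING);
    pthread_mutex_unlock(&lock);
    return notify;
}

/* Destroy path: mark the id first, drop the lock for the teardown work,
 * and only detach the device here, further down. */
static void destroy_id(struct id_priv *id)
{
    struct device *dev;

    pthread_mutex_lock(&lock);
    id->state = ID_DESTROYING;
    dev = id->dev;
    pthread_mutex_unlock(&lock);

    if (dev) {
        /* Teardown that must not run under the lock would go here. */
        pthread_mutex_lock(&lock);
        assert(id->dev == dev);   /* nothing else detached or re-attached */
        id->dev = NULL;
        dev->refcount--;
        pthread_mutex_unlock(&lock);
    }
}
```

Once the state is set to destroying, both the callback and the removal 
path back off in this sketch, so the window where the lock is dropped 
does not let either of them change the id's device.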



