[ofa-general] [PATCH] Fix racy deadlock in cma

Sean Hefty mshefty at ichips.intel.com
Wed Oct 3 11:47:33 PDT 2007


>  - Your comment doesn't make it clear to me that dropping and
>    reacquiring the lock is safe; can you explain why nothing else
>    could come along while the lock is dropped and mess things up?

I need to study this part in more detail, but I don't think we can 
safely release the lock without introducing a race in at least 
cma_listen_on_all().

>    It seems rdma_destroy_id() has the same pattern, but it's not clear
>    to me in the code:
> 
> 	mutex_lock(&lock);
> 	if (id_priv->cma_dev) {
> 		mutex_unlock(&lock);
> 		// why can't the device be hot-unplugged here??

The state of the id has already been set to destroying, which causes 
the device removal code to ignore the id.  Even if device removal occurs 
before the id state has been set, this should be safe.  A hot-plug event 
reports the device removal, but waits for the user to destroy the id. 
Only this function detaches the device from the id, further down.

The locking here is, in part, to prevent a callback from attaching a 
device to the id while the id is being destroyed.  See addr_handler().
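
To make the argument concrete, here is a rough user-space sketch of the 
pattern, not the cma.c code itself.  Every name in it (sketch_id, 
sketch_dev, sketch_attach, sketch_remove_one, sketch_destroy) is 
hypothetical, a pthread mutex stands in for the kernel mutex, and it 
assumes the state change happens under that same lock, which is a 
simplification.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins; not the real cma.c structures. */
enum sketch_state { SKETCH_IDLE, SKETCH_DESTROYING };

struct sketch_dev {
	int refcount;		/* number of ids attached to the device */
};

struct sketch_id {
	enum sketch_state state;
	struct sketch_dev *dev;	/* protected by sketch_lock */
	int removal_events;	/* removal reports delivered to the user */
};

static pthread_mutex_t sketch_lock = PTHREAD_MUTEX_INITIALIZER;

/*
 * Callback path (the role addr_handler() plays in the discussion):
 * attach a device only if the id is not being destroyed.
 * Returns 0 on success, -1 if the attach was refused.
 */
static int sketch_attach(struct sketch_id *id, struct sketch_dev *dev)
{
	int ret = -1;

	assert(id != NULL && dev != NULL);
	pthread_mutex_lock(&sketch_lock);
	if (id->state != SKETCH_DESTROYING && id->dev == NULL) {
		id->dev = dev;
		dev->refcount++;
		ret = 0;
	}
	pthread_mutex_unlock(&sketch_lock);
	return ret;
}

/*
 * Device removal path: ignore ids that are being destroyed, otherwise
 * only report the removal.  It never detaches the device itself.
 * Returns 1 if a removal was reported, 0 if the id was ignored.
 */
static int sketch_remove_one(struct sketch_id *id)
{
	int reported = 0;

	assert(id != NULL);
	pthread_mutex_lock(&sketch_lock);
	if (id->state != SKETCH_DESTROYING && id->dev != NULL) {
		id->removal_events++;
		reported = 1;
	}
	pthread_mutex_unlock(&sketch_lock);
	return reported;
}

/* Destroy path: the only place that detaches the device from the id. */
static void sketch_destroy(struct sketch_id *id)
{
	assert(id != NULL);
	pthread_mutex_lock(&sketch_lock);
	id->state = SKETCH_DESTROYING;
	if (id->dev != NULL) {
		pthread_mutex_unlock(&sketch_lock);
		/*
		 * Lock dropped: teardown that may block would run here.
		 * Attach is refused and removal ignores the id because
		 * the state is already SKETCH_DESTROYING.
		 */
		pthread_mutex_lock(&sketch_lock);
		id->dev->refcount--;
		id->dev = NULL;
	}
	pthread_mutex_unlock(&sketch_lock);
}
```

In this sketch, once sketch_destroy() has marked the id, both 
sketch_attach() and sketch_remove_one() leave it alone, so id->dev 
cannot change during the window where the lock is dropped.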

- Sean


