[ofa-general] Re: [PATCH 2.6.24] rdma/cm: fix deadlock destroying listen requests

Sean Hefty mshefty at ichips.intel.com
Tue Oct 9 12:21:09 PDT 2007


> Just so I understand, did you discover problems (maybe preexisting race 
> conditions) with my previously posted patch? If yes, please point it 
> out, so it's easier to review yours; if not, I will assume your patch 
> implements a better locking scheme and review it as such.

I tried to explain the issue somewhat in my change commit and code 
comments.  The issue is synchronizing cleanup of the listen_list with 
device removal.

When an RDMA device is added to the system, a new listen request is 
added for all wildcard listens.  Since the original locking held the 
mutex throughout the cleanup of the listen list, it prevented adding 
another listen request during that same time.

Similar protection was there for handling device removal.  When a device 
is removed from the system, all internal listen requests associated with 
that device are destroyed.  If the associated wildcard listen is also 
being destroyed, we need to ensure that we don't try to destroy the same 
listen twice.

My patch, like yours, ends up releasing the mutex while cleaning up the 
listen_list.  I chose to eliminate the cma_destroy_listen() call and 
use rdma_destroy_id() as a single destruction path instead.  This keeps 
the locking contained to a single function.  (I don't like acquiring a 
lock in one call and releasing it in another.  It places too many 
assumptions on the caller.)
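
To illustrate the shape of that locking, here is a minimal user-space 
sketch using pthreads rather than kernel primitives.  All names 
(sketch_*) are hypothetical and are not taken from the patch; the point 
is only that the mutex is dropped around each destroy call and that 
every lock/unlock pair lives inside one function.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical stand-in for a per-device listen; not the rdma_cm struct. */
struct sketch_listen {
    struct sketch_listen *next;
};

static pthread_mutex_t sketch_lock = PTHREAD_MUTEX_INITIALIZER;
static struct sketch_listen *sketch_listen_list;  /* protected by sketch_lock */
static int sketch_destroyed;                      /* protected by sketch_lock */

/* Assumed to be able to block, so it is called without sketch_lock held. */
static void sketch_destroy_one(struct sketch_listen *l)
{
    assert(l != NULL);
    free(l);
}

/* Adding a listen only needs the lock briefly, so it is not blocked for
 * the whole duration of a cleanup running in another thread. */
static void sketch_add_listen(void)
{
    struct sketch_listen *l = malloc(sizeof(*l));

    if (!l)
        return;
    pthread_mutex_lock(&sketch_lock);
    l->next = sketch_listen_list;
    sketch_listen_list = l;
    pthread_mutex_unlock(&sketch_lock);
}

/* Every lock and unlock is in this one function; callers never hold
 * the lock on entry or exit. */
static void sketch_cancel_listens(void)
{
    pthread_mutex_lock(&sketch_lock);
    while (sketch_listen_list) {
        struct sketch_listen *l = sketch_listen_list;

        sketch_listen_list = l->next;        /* unlink while locked */
        pthread_mutex_unlock(&sketch_lock);  /* drop lock to destroy */
        sketch_destroy_one(l);
        pthread_mutex_lock(&sketch_lock);
        sketch_destroyed++;
    }
    pthread_mutex_unlock(&sketch_lock);
}
```

Because each entry is unlinked before the lock is dropped, no other 
thread walking the list can find it once destruction has started.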

What was missing was ensuring that a device removal didn't try to 
destroy the same listen request.  This is handled by adding the 
list_del*() calls to cma_cancel_listens().  Whichever thread removes the 
listening id from the device list is responsible for its destruction. 
And because that thread could be the device removal thread, I added a 
reference from the per device listen to the wildcard listen.
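
The ownership rule can be sketched roughly as follows, again in 
user-space C with pthreads.  The names (demo_*) and the on_list flag are 
hypothetical; the flag stands in for list membership and clearing it 
stands in for a list_del*() call.  Two threads race to destroy the same 
per-device listen, and only the one that removes it from the list does 
the work and drops the reference on the wildcard listen.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins; not the rdma_cm structures. */
struct demo_wildcard {
    int refcount;                  /* protected by demo_lock */
};

struct demo_dev_listen {
    struct demo_wildcard *parent;  /* reference held on the wildcard */
    bool on_list;                  /* protected by demo_lock */
};

static pthread_mutex_t demo_lock = PTHREAD_MUTEX_INITIALIZER;
static int demo_destroy_count;     /* protected by demo_lock */

/* Returns true only for the one caller that removed the entry. */
static bool demo_claim(struct demo_dev_listen *l)
{
    bool won;

    pthread_mutex_lock(&demo_lock);
    won = l->on_list;
    l->on_list = false;            /* stands in for list_del*() */
    pthread_mutex_unlock(&demo_lock);
    return won;
}

/* Run by both the "listen destroy" and "device removal" threads. */
static void *demo_destroy_path(void *arg)
{
    struct demo_dev_listen *l = arg;

    if (demo_claim(l)) {
        /* We own destruction; the parent reference kept the wildcard
         * alive until this point. */
        pthread_mutex_lock(&demo_lock);
        demo_destroy_count++;
        l->parent->refcount--;
        pthread_mutex_unlock(&demo_lock);
    }
    return NULL;
}

/* Races two destroy paths; returns how many actually destroyed. */
static int demo_race(void)
{
    struct demo_wildcard wc = { .refcount = 1 };
    struct demo_dev_listen l = { .parent = &wc, .on_list = true };
    pthread_t a, b;

    wc.refcount++;                 /* reference from the per-device listen */
    demo_destroy_count = 0;
    pthread_create(&a, NULL, demo_destroy_path, &l);
    pthread_create(&b, NULL, demo_destroy_path, &l);
    pthread_join(a, NULL);
    pthread_join(b, NULL);
    assert(wc.refcount == 1);      /* dropped once, never twice */
    return demo_destroy_count;
}
```

Whichever thread loses the race sees the entry already off the list and 
returns without touching it, so the listen is destroyed exactly once.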

Hopefully this makes sense.

- Sean


