[ofa-general] Re: [PATCH 2.6.24] rdma/cm: fix deadlock destroying listen requests
Sean Hefty
mshefty at ichips.intel.com
Tue Oct 9 12:21:09 PDT 2007
> Just so I understand, did you discover problems (maybe preexisting race
> conditions) with my previously posted patch? If yes, please point it
> out, so it's easier to review yours; if not, I will assume your patch
> implements a better locking scheme and review it as such.
I tried to explain the issue somewhat in my change commit and code
comments. The issue is synchronizing cleanup of the listen_list with
device removal.
When an RDMA device is added to the system, a new listen request is
added for all wildcard listens. Since the original locking held the
mutex throughout the cleanup of the listen list, it prevented adding
another listen request during that same time.
Similar protection was there for handling device removal. When a device
is removed from the system, all internal listen requests associated with
that device are destroyed. If the associated wildcard listen is also
being destroyed, we need to ensure that we don't try to destroy the same
listen twice.
My patch, like yours, ends up releasing the mutex while cleaning up the
listen_list. I chose to eliminate the cma_destroy_listen() call, and
use rdma_destroy_id() as a single destruction path instead. This keeps
the locking contained to a single function. (I don't like acquiring a
lock in one call and releasing it in another. It assumes too much
about the caller.)
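
To illustrate the shape of that locking, here is a rough userspace sketch, not the actual patch. It uses pthreads rather than the kernel mutex API, and struct node, list_lock, destroy_node() and drain_list() are all hypothetical names. The point is that one function takes the lock, drops it around each destroy call, and releases it before returning, so the destroy path is free to take the same lock itself.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-in for an entry on a listen list. */
struct node {
    struct node *next;
    int destroyed;
};

static pthread_mutex_t list_lock = PTHREAD_MUTEX_INITIALIZER;
static int destroyed_total;

/*
 * Stand-in for the single destruction path. It takes list_lock itself,
 * so calling it with list_lock already held would deadlock.
 */
static void destroy_node(struct node *n)
{
    pthread_mutex_lock(&list_lock);
    assert(!n->destroyed);          /* each entry is destroyed once */
    n->destroyed = 1;
    destroyed_total++;
    pthread_mutex_unlock(&list_lock);
}

/*
 * Empty the list. The lock is acquired and released only inside this
 * function: entries are unlinked under the lock, and the lock is
 * dropped while each one is destroyed.
 */
int drain_list(struct node **head)
{
    int count = 0;

    pthread_mutex_lock(&list_lock);
    while (*head != NULL) {
        struct node *n = *head;

        *head = n->next;            /* unlink while holding the lock */
        n->next = NULL;
        pthread_mutex_unlock(&list_lock);

        destroy_node(n);            /* runs without list_lock held */
        count++;

        pthread_mutex_lock(&list_lock);
    }
    pthread_mutex_unlock(&list_lock);
    return count;
}
```

Because the lock is dropped inside the loop, another thread can add to or remove from the list between iterations, which is why the loop re-reads the head each time instead of walking a saved pointer.
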
What was missing was ensuring that a device removal didn't try to
destroy the same listen request. This is handled by adding the
list_del*() calls to cma_cancel_listens(). Whichever thread removes the
listening id from the device list is responsible for its destruction.
And because that thread could be the device removal thread, I added a
reference from the per device listen to the wildcard listen.
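
The ownership rule can be sketched roughly as follows. This is a simplified userspace model under stated assumptions, not the kernel code: a boolean on_list flag stands in for list membership, a plain integer stands in for the reference count, and every name here (wildcard_listen, dev_listen, teardown_dev_listen() and so on) is hypothetical. The two teardown paths are called back to back rather than from separate threads, to keep the example deterministic.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the wildcard listen. */
struct wildcard_listen {
    int refcount;
    bool freed;
};

/* Hypothetical stand-in for a per-device listen. */
struct dev_listen {
    struct wildcard_listen *parent;
    bool on_list;               /* models membership in the device list */
    int destroy_count;
};

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

static void put_wildcard(struct wildcard_listen *w)
{
    pthread_mutex_lock(&lock);
    assert(w->refcount > 0);
    if (--w->refcount == 0)
        w->freed = true;        /* last reference gone */
    pthread_mutex_unlock(&lock);
}

/* The per-device listen holds a reference on its wildcard listen. */
static void attach_dev_listen(struct dev_listen *d, struct wildcard_listen *w)
{
    pthread_mutex_lock(&lock);
    w->refcount++;
    d->parent = w;
    d->on_list = true;
    d->destroy_count = 0;
    pthread_mutex_unlock(&lock);
}

/* Returns true only for the caller that actually removed the entry. */
static bool claim_dev_listen(struct dev_listen *d)
{
    bool claimed;

    pthread_mutex_lock(&lock);
    claimed = d->on_list;
    d->on_list = false;
    pthread_mutex_unlock(&lock);
    return claimed;
}

/* Shared by both teardown paths; whoever claims the entry destroys it. */
static void teardown_dev_listen(struct dev_listen *d)
{
    if (!claim_dev_listen(d))
        return;                 /* the other path already owns it */
    d->destroy_count++;
    put_wildcard(d->parent);    /* drop the reference taken at attach */
}

int demo_destroy_count(void)
{
    struct wildcard_listen w = { .refcount = 1, .freed = false };
    struct dev_listen d;

    attach_dev_listen(&d, &w);
    teardown_dev_listen(&d);    /* device removal path claims the entry */
    assert(!w.freed);           /* the user's reference is still held */
    teardown_dev_listen(&d);    /* wildcard destroy path finds nothing to do */
    put_wildcard(&w);           /* user drops the last reference */
    assert(w.freed);
    return d.destroy_count;
}
```

In this model the second teardown call is a no-op because the entry was already claimed, and the wildcard listen is not marked freed until both the per-device reference and the user's reference have been dropped.
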
Hopefully this makes sense.
- Sean