[ofa-general] potential device removal deadlock
Steve Wise
swise at opengridcomputing.com
Mon Jan 26 09:43:14 PST 2009
Hey Roland/Sean,
I'm looking at the rdma_[u]cm modules and how they generate
DEVICE_REMOVAL events to user applications, and I see a potential
deadlock. ib_unregister_device() calls the ib_client remove() functions
in the reverse order from which the ib_clients were registered. And if
you look at ib_uverbs_remove_one(), you'll see it will block until all
references from user apps are released. So if the ib_uverbs remove()
function gets called _before_ the rdma_cm remove() function, then the
unregister process will deadlock: applications never get the
DEVICE_REMOVAL notification, so they never release the references that
ib_uverbs_remove_one() is waiting on.
Am I missing something, or is this a bug?
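To make the ordering concern concrete, here is a minimal userspace sketch of it. It is a model only, not kernel code: struct model_client, struct model_device, model_register_client() and model_unregister_device() are hypothetical names made up for illustration. It also assumes that applications drop their references as soon as they have been notified of the removal.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Userspace model only; none of these names are real kernel APIs. */
#define MAX_CLIENTS 8

struct model_client {
    const char *name;
    bool notifies_user;   /* remove() delivers a removal event to apps */
    bool waits_for_user;  /* remove() blocks until apps drop references */
};

struct model_device {
    struct model_client clients[MAX_CLIENTS];
    size_t nclients;
};

/* Append a client; registration order is the array order. */
static void model_register_client(struct model_device *dev,
                                  struct model_client c)
{
    assert(dev->nclients < MAX_CLIENTS);
    dev->clients[dev->nclients++] = c;
}

/*
 * Walk the clients in reverse registration order, as described above.
 * Returns the name of the first client whose remove() would block
 * forever (apps not yet notified), or NULL if removal can complete.
 * Assumption: once notified, apps release their references.
 */
static const char *model_unregister_device(const struct model_device *dev)
{
    bool user_notified = false;

    for (size_t i = dev->nclients; i-- > 0; ) {
        const struct model_client *c = &dev->clients[i];

        if (c->notifies_user)
            user_notified = true;
        if (c->waits_for_user && !user_notified)
            return c->name;  /* nobody told the apps to let go */
    }
    return NULL;
}
```

In this model, registering a notifying client first and a waiting client second means the waiting client's remove() runs first, and model_unregister_device() reports it as stuck. Swapping the registration order lets the removal complete.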
I would think ib_uverbs should actually blow away the kernel parts of
the user's handles allowing the device to be removed. Then the user app
will discover things went south on the next down call into the uverbs
code -or- by the DEVICE_REMOVAL rdma-cm event.
I'm thinking about all this in the context of EEH handling for cxgb3.
Thoughts?
Thanks,
Steve.