[ofa-general] Re: potential device removal deadlock
Roland Dreier
rdreier at cisco.com
Mon Jan 26 10:13:09 PST 2009
> I'm looking at the rdma_[u]cm modules and how they generate
> DEVICE_REMOVAL events to user applications, and I see a potential
> deadlock. ib_unregister_device() calls the ib_client remove()
> functions in the reverse order from which the ib_clients were
> registered. And if you look at ib_uverbs_remove_one(), you'll see it
> will block until all references from user apps are released. So if
> ib_uverbs remove() gets called _before_ the rdma_cm remove() function,
> then the unregister process will deadlock since applications don't get
> notification of the device removal.
>
> Am I missing something, or is this a bug?
Yes, it looks like a bug. Making sure that rdma_cm is loaded after
ib_uverbs works around it.
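
To make the ordering argument concrete, here is a rough user-space model of it. This is only a sketch, not kernel code: the model_* names and structs are made up for illustration, and it assumes that applications drop their references once they have seen the DEVICE_REMOVAL event.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MAX_CLIENTS 8

/* Hypothetical stand-in for an ib_client; not the kernel structure. */
struct model_client {
    const char *name;
    bool waits_for_user_refs; /* models ib_uverbs_remove_one() blocking */
    bool notifies_user;       /* models rdma_cm sending DEVICE_REMOVAL */
};

struct model_device {
    struct model_client clients[MAX_CLIENTS];
    size_t nclients;
    bool user_notified;
};

static void model_register_client(struct model_device *dev,
                                  struct model_client c)
{
    assert(dev->nclients < MAX_CLIENTS);
    dev->clients[dev->nclients++] = c;
}

/*
 * Call each client's remove() in reverse registration order.
 * Returns the name of the client whose remove() would block forever,
 * or NULL if unregistration runs to completion.
 * Assumption: user apps release their references only after they
 * have been notified of the removal.
 */
static const char *model_unregister_device(struct model_device *dev)
{
    for (size_t i = dev->nclients; i-- > 0; ) {
        const struct model_client *c = &dev->clients[i];

        if (c->waits_for_user_refs && !dev->user_notified)
            return c->name; /* nobody told the apps to let go */
        if (c->notifies_user)
            dev->user_notified = true;
    }
    return NULL;
}

/* Build one of the two load orders and report which client gets stuck. */
static const char *model_stuck_client(bool uverbs_registered_first)
{
    struct model_device dev = { .nclients = 0, .user_notified = false };
    const struct model_client uverbs = { "ib_uverbs", true, false };
    const struct model_client rdma_cm = { "rdma_cm", false, true };

    if (uverbs_registered_first) {
        model_register_client(&dev, uverbs);
        model_register_client(&dev, rdma_cm);
    } else {
        model_register_client(&dev, rdma_cm);
        model_register_client(&dev, uverbs);
    }
    return model_unregister_device(&dev);
}
```

With ib_uverbs registered first, rdma_cm's remove() runs first and model_stuck_client(true) returns NULL. With the opposite order, model_stuck_client(false) returns "ib_uverbs", which is the hang described above.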
> I would think ib_uverbs should actually blow away the kernel parts of
> the user's handles allowing the device to be removed. Then the user
> app will discover things went south on the next down call into the
> uverbs code -or- by the DEVICE_REMOVAL rdma-cm event.
Yes, but that's not that easy (e.g., we'd need to shoot down mappings of
PCI memory into all userspace processes, etc.)... we punted on it when
adding device removal support to uverbs.
- R.