[ofa-general] Re: potential device removal deadlock

Roland Dreier rdreier at cisco.com
Mon Jan 26 10:13:09 PST 2009


 > I'm looking at the rdma_[u]cm modules and how they generate
 > DEVICE_REMOVAL events to user applications, and I see a potential
 > deadlock.  ib_unregister_device() calls the ib_client remove()
 > functions in the reverse order from which the ib_clients were
 > registered.  And if you look at ib_uverbs_remove_one(), you'll see it
 > will block until all references from user apps are released.  So if
 > ib_uverbs remove() gets called _before_ the rdma_cm remove() function,
 > then the unregister process will deadlock since applications don't get
 > notification of the device removal.
 > 
 > Am I missing something, or is this a bug? 

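Yes, that looks like a bug.  Making sure that rdma_cm is loaded after
ib_uverbs works around it: removal runs in reverse registration order,
so the rdma_cm remove() then runs first and applications get their
DEVICE_REMOVAL event before ib_uverbs starts waiting on them.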
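
To make the ordering concrete, here is a rough userspace model of the
problem.  It is only a sketch, not kernel code: every name in it
(model_remove, model_unregister, client_kind, user_refs) is made up for
illustration, and it assumes the application drops all of its references
as soon as it sees DEVICE_REMOVAL.  Instead of actually blocking, the
uverbs case just reports that it would block.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy model only; these types and names are hypothetical. */
enum client_kind { CLIENT_UVERBS, CLIENT_RDMA_CM };

struct model {
	int user_refs;	/* references still held by user applications */
};

/* Returns false if this remove() would block forever in the model. */
static bool model_remove(enum client_kind kind, struct model *m)
{
	switch (kind) {
	case CLIENT_RDMA_CM:
		/* App sees DEVICE_REMOVAL and (by assumption) lets go. */
		m->user_refs = 0;
		return true;
	case CLIENT_UVERBS:
		/* Stands in for ib_uverbs_remove_one() waiting on user refs. */
		return m->user_refs == 0;
	}
	return false;
}

/*
 * Call remove() for each client in reverse registration order.
 * clients[0] is the client that registered first.
 * Returns false if the walk would deadlock.
 */
static bool model_unregister(const enum client_kind *clients, size_t n,
			     int refs)
{
	struct model m = { .user_refs = refs };

	for (size_t i = n; i-- > 0; ) {
		if (!model_remove(clients[i], &m))
			return false;
	}
	return true;
}
```

With { CLIENT_RDMA_CM, CLIENT_UVERBS } as the registration order and one
outstanding reference, the model gets stuck in the uverbs step; with
{ CLIENT_UVERBS, CLIENT_RDMA_CM } it completes.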

 > I would think ib_uverbs should actually blow away the kernel parts of
 > the user's handles allowing the device to be removed.  Then the user
 > app will discover things went south on the next down call into the
 > uverbs code -or- by the DEVICE_REMOVAL rdma-cm event.

Yes, but that's not that easy (e.g., we would need to shoot down mappings
of PCI memory in all userspace processes, etc.)... we punted on it when
adding device removal support to uverbs.

 - R.


