[ofa-general] RE: potential device removal deadlock
Sean Hefty
sean.hefty at intel.com
Mon Jan 26 10:30:01 PST 2009
>I'm looking at the rdma_[u]cm modules and how they generate
>DEVICE_REMOVAL events to user applications, and I see a potential
>deadlock. ib_unregister_device() calls the ib_client remove() functions
>in the reverse order from which the ib_clients were registered. And if
>you look at ib_uverbs_remove_one(), you'll see it will block until all
>references from user apps are released. So if ib_uverbs remove() gets
>called _before_ the rdma_cm remove() function, then the unregister
>process will deadlock since applications don't get notification of the
>device removal.
You want the remove device functions called in the reverse order of
registration.
>I would think ib_uverbs should actually blow away the kernel parts of
>the user's handles allowing the device to be removed. Then the user app
>will discover things went south on the next down call into the uverbs
>code -or- by the DEVICE_REMOVAL rdma-cm event.
The ib_ucm and rdma_ucm modules should also blow away the kernel parts of any user handles.
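
To make the ordering issue concrete, here is a rough userspace model of it, not kernel code. All names (struct client, enum remove_policy, unregister_device) are made up for illustration and are not the ib_core API. It assumes a single shared count of user references, and that a notified application drops its references right away, which is a simplification.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical policies a client's remove() callback might follow. */
enum remove_policy {
    WAIT_FOR_USER_REFS,  /* block until apps drop their references */
    NOTIFY_USERS,        /* deliver a removal event; apps then drop refs (assumed) */
    INVALIDATE_HANDLES   /* tear down the kernel side of user handles */
};

/* Illustrative stand-in for a registered client, not struct ib_client. */
struct client {
    const char *name;
    enum remove_policy policy;
};

/*
 * clients[] is in registration order; removal walks it in reverse.
 * Returns true if removal completes, false if some client would wait
 * forever on user references that nobody has been told to release.
 */
bool unregister_device(const struct client *clients, int n, int user_refs)
{
    for (int i = n - 1; i >= 0; i--) {
        const struct client *c = &clients[i];

        switch (c->policy) {
        case NOTIFY_USERS:
        case INVALIDATE_HANDLES:
            /* Either way, no user references remain afterwards. */
            user_refs = 0;
            break;
        case WAIT_FOR_USER_REFS:
            if (user_refs > 0) {
                printf("%s: would block, %d refs held\n", c->name, user_refs);
                return false;
            }
            break;
        }
        printf("%s: removed\n", c->name);
    }
    return true;
}
```

In this model, registering a NOTIFY_USERS client before a WAIT_FOR_USER_REFS client makes removal stall at the waiting client, since it is visited first. Switching that client to INVALIDATE_HANDLES lets removal finish regardless of registration order.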
- Sean