[ofa-general] RE: potential device removal deadlock

Sean Hefty sean.hefty at intel.com
Mon Jan 26 10:30:01 PST 2009


>I'm looking at the rdma_[u]cm modules and how they generate
>DEVICE_REMOVAL events to user applications, and I see a potential
>deadlock.  ib_unregister_device() calls the ib_client remove() functions
>in the reverse order from which the ib_clients were registered.  And if
>you look at ib_uverbs_remove_one(), you'll see it will block until all
>references from user apps are released.  So if ib_uverbs remove() gets
>called _before_ the rdma_cm remove() function, then the unregister
>process will deadlock since applications don't get notification of the
>device removal.

You want the remove device functions called in the reverse order of
registration.
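
The ordering issue discussed above can be illustrated with a toy model. This is only a sketch in Python, not kernel code: the `Client` class, its two flags, and `unregister_device()` are hypothetical names invented here, and the model assumes that applications drop their references as soon as they are notified. It shows that whether removal completes depends on which client registered last, since removal walks the list in reverse.

```python
from dataclasses import dataclass


@dataclass
class Client:
    """Hypothetical stand-in for an ib_client; not the kernel structure."""
    name: str
    notifies_users: bool = False   # remove() tells apps the device is going away
    waits_for_users: bool = False  # remove() blocks until user references drop


def unregister_device(registered, user_refs):
    """Call remove() on each client in reverse registration order.

    Returns (removed_names, blocked_on). blocked_on is None when removal
    completes, otherwise the name of the client that would block forever.
    """
    removed = []
    for client in reversed(registered):
        if client.notifies_users:
            # Assumption: notified apps release their handles promptly.
            user_refs = 0
        if client.waits_for_users and user_refs > 0:
            # A real remove() would sleep here; the model just reports it.
            return removed, client.name
        removed.append(client.name)
    return removed, None


notifier = Client("rdma_cm", notifies_users=True)
waiter = Client("ib_uverbs", waits_for_users=True)

# The waiter registered last, so it is removed first, before any app is told.
print(unregister_device([notifier, waiter], user_refs=2))
# The notifier registered last, so apps are told before the waiter runs.
print(unregister_device([waiter, notifier], user_refs=2))
```

In this model the first call stops at `ib_uverbs` with nothing removed, while the second removes both clients.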

>I would think ib_uverbs should actually blow away the kernel parts of
>the user's handles allowing the device to be removed.  Then the user app
>will discover things went south on the next down call into the uverbs
>code -or- by the DEVICE_REMOVAL rdma-cm event.

The ib_ucm and rdma_ucm modules should also blow away the kernel parts of any user handles.
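
The idea of blowing away the kernel side of a user handle, so that removal never waits on the application, might look roughly like the following. Again this is a Python sketch under stated assumptions: `UserHandle`, `revoke()`, `call()`, and `remove_device()` are hypothetical names, and the choice of `ENODEV` as the error is an assumption made for illustration, not something the thread specifies.

```python
import errno


class UserHandle:
    """Hypothetical user-visible handle backed by a kernel-side object."""

    def __init__(self, name):
        self.name = name
        self.kernel_obj = object()  # stands in for the kernel resource

    def revoke(self):
        # Drop the kernel part; the user still holds the (now dead) handle.
        self.kernel_obj = None

    def call(self):
        # Any later call on a revoked handle fails instead of touching the device.
        if self.kernel_obj is None:
            raise OSError(errno.ENODEV, "device removed")  # assumed error code
        return "ok"


def remove_device(handles):
    """Revoke every handle without waiting for the application to close it."""
    for handle in handles:
        handle.revoke()
    return len(handles)


handles = [UserHandle("qp0"), UserHandle("cq0")]
print(handles[0].call())
remove_device(handles)
try:
    handles[0].call()
except OSError as exc:
    print("call failed with errno", exc.errno)
```

The point of the design is that removal proceeds immediately and the application learns of it on its next call, rather than the removal path sleeping until the application cooperates.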

- Sean




