[ofa-general] Re: [PATCH 2.6.24] rdma/cm: fix deadlock destroyinglisten requests

Sean Hefty sean.hefty at intel.com
Wed Oct 10 14:59:43 PDT 2007


>* in your patch, I suggest taking out the warning printk from
>cma_listen_on_dev() when the listener create attempt fails; it might be
>that the device is out of resources etc. Since the code takes care of
>this situation pretty well, I don't see a need for the printk.

That's easy enough to do.

>* I don't see a reason for the internal_id and the device listeners
>getting a refcount on the wildcard listener. Because, even without
>these, it is guaranteed that the wildcard listener will exist at least
>as long as any of the children device listener's are around, by looking
>at the logic in rdma_destroy_id(). Can you provide some logic for
>requring this then?

There are 2 ways to destroy an internal_id: destroying its parent (the wildcard
listen) or removing its device.  When a device is removed, the internal_id is
removed from its parent list to ensure that it is only destroyed once.  If the
parent were to be destroyed at this point, it would destroy any remaining
children, then be freed.  The internal_id still exists however, and could be
generating connection request events, which expects to fine the parent.  The
reference ensures that the parent stays around as long as any children remain.

>* not that I am very worried (and I suggesting resolving this thru
>another subsequent patch if it is really a problem), but I think device
>removal is still racy wrt non wildcard listeners. Here's the sequence:
>cma_process_remove()->cma_remove_id_dev() decides it will
>rdma_destroy_id() the listener id, and at the same time a process
>context rdma_destroy_id() decides it is going to do the same. There are
>probably various ways to take care of this, the simple one might be for
>rdma_destroy_id() to look at the "state" and make a decision about who
>gets to destroy.

A user cannot both return non-zero from their callback (indicating that the
rdma_cm should destroy the id) and call rdma_destroy_id() on the same id.  This
is equivalent to call rdma_destroy_id() twice.  It's not too difficult for the
user to avoid this.

- Sean



More information about the general mailing list