[ofa-general] Re: [PATCH 2.6.24] rdma/cm: fix deadlock destroyinglisten requests

Wed Oct 10 16:17:10 PDT 2007

Sean Hefty wrote:

>>* in your patch, I suggest taking out the warning printk from
>>cma_listen_on_dev() when the listener create attempt fails; it might be
>>that the device is out of resources etc. Since the code takes care of
>>this situation pretty well, I don't see a need for the printk.
>>    
>>
>
>That's easy enough to do.
>
>  
>
>>* I don't see a reason for the internal_id and the device listeners
>>getting a refcount on the wildcard listener. Because, even without
>>these, it is guaranteed that the wildcard listener will exist at least
>>as long as any of the children device listener's are around, by looking
>>at the logic in rdma_destroy_id(). Can you provide some logic for
>>requring this then?
>>    
>>
>
>There are 2 ways to destroy an internal_id: destroying its parent (the wildcard
>listen) or removing its device.  When a device is removed, the internal_id is
>removed from its parent list to ensure that it is only destroyed once.  If the
>parent were to be destroyed at this point, it would destroy any remaining
>children, then be freed.  The internal_id still exists however, and could be
>generating connection request events, which expects to fine the parent.  The
>reference ensures that the parent stays around as long as any children remain.
>  
>

Ok, makes sense.

>  
>
>>* not that I am very worried (and I suggesting resolving this thru
>>another subsequent patch if it is really a problem), but I think device
>>removal is still racy wrt non wildcard listeners. Here's the sequence:
>>cma_process_remove()->cma_remove_id_dev() decides it will
>>rdma_destroy_id() the listener id, and at the same time a process
>>context rdma_destroy_id() decides it is going to do the same. There are
>>probably various ways to take care of this, the simple one might be for
>>rdma_destroy_id() to look at the "state" and make a decision about who
>>gets to destroy.
>>    
>>
>
>A user cannot both return non-zero from their callback (indicating that the
>rdma_cm should destroy the id) and call rdma_destroy_id() on the same id.  This
>is equivalent to call rdma_destroy_id() twice.  It's not too difficult for the
>user to avoid this.
>
>- Sean
>
>  
>
I don't understand your response. ucma.c for example can call 
rdma_create_id() and rdma_destroy_id(), correct? What says that when 
ucma.c does a rdma_destroy_id() on a nonwildcard listener, a device 
removal is not attempting to do the same on the listener? If this is 
possible, the code paths I mentioned above can still trigger a double 
destruct on a listener, correct?

Kanoj