[openib-general] ib_cancel_mad and the CM

Sean Hefty mshefty at ichips.intel.com
Thu Feb 24 09:40:49 PST 2005


Sean Hefty wrote:
> Because ib_cancel_mad might invoke a callback that acquires the cm_id 
> lock, the lock cannot be held when ib_cancel_mad is invoked.
> 
{snip}
> 
> To fix this in the CM, the call to ib_cancel_mad just needs to move 
> inside the cm_id lock.  Alternatively, it may be possible to change 
> ib_cancel_mad to cancel MADs based on a second set of criteria.  Both of 
> these would require changes to the MAD layer.

After studying this further, I believe the problem exists in both the 
CM and the SA query code.  Unless there is an objection, I'll submit a 
patch so that, after a MAD has been canceled, the user's send callback 
is invoked from one of the MAD threads rather than directly from the 
user's thread (the one calling ib_cancel_mad).  This is similar to how 
the process-local MAD functionality is implemented.

This will allow the caller to hold a lock around the cancel routine, 
which should fix the problem for the CM code.  However, I don't think 
that locking around the cancel routine eliminates the issue in the SA 
query code, and I don't see a simple fix for that case.
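
To make the idea concrete, here is a rough, single-threaded toy model. 
It is only a sketch: every name in it (toy_lock, toy_send, 
toy_cancel_inline, toy_cancel_deferred, toy_mad_thread_run) is made up 
for illustration and is not the ib_mad API or the eventual patch.  The 
toy lock just counts attempts to re-acquire it, where a real 
non-recursive lock would hang.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for a non-recursive lock (assumption, not a kernel lock). */
struct toy_lock {
    bool held;
    int  self_deadlocks;   /* times a holder tried to take it again */
};

bool toy_lock_acquire(struct toy_lock *l)
{
    if (l->held) {         /* a real spinlock would hang here */
        l->self_deadlocks++;
        return false;
    }
    l->held = true;
    return true;
}

void toy_lock_release(struct toy_lock *l)
{
    assert(l->held);       /* releasing an unheld lock is a bug */
    l->held = false;
}

/* Hypothetical send request; not the real MAD send work request. */
struct toy_send {
    void (*send_handler)(struct toy_send *s);
    struct toy_lock *owner_lock;   /* lock the handler takes (cf. cm_id lock) */
    bool canceled;
    int  callbacks_run;
    struct toy_send *next;         /* link on the canceled list */
};

/* Canceled sends waiting for their callback; drained by the "MAD thread". */
struct toy_send *canceled_list;

/* Inline variant: the callback runs in the canceling caller's context. */
void toy_cancel_inline(struct toy_send *s)
{
    s->canceled = true;
    s->send_handler(s);
}

/* Deferred variant: only mark and queue; the callback runs later. */
void toy_cancel_deferred(struct toy_send *s)
{
    s->canceled = true;
    s->next = canceled_list;
    canceled_list = s;
}

/* Stands in for the work a MAD thread would do after a cancel. */
void toy_mad_thread_run(void)
{
    while (canceled_list) {
        struct toy_send *s = canceled_list;

        canceled_list = s->next;
        s->next = NULL;
        s->send_handler(s);
    }
}

/* User's send callback: takes the owner lock, as the CM's handler would. */
void toy_send_handler(struct toy_send *s)
{
    if (toy_lock_acquire(s->owner_lock)) {
        s->callbacks_run++;
        toy_lock_release(s->owner_lock);
    }
}
```

Calling toy_cancel_inline() while holding the owner lock makes the 
handler run into the held lock (counted in self_deadlocks).  Calling 
toy_cancel_deferred() under the same lock only queues the request; the 
handler runs once toy_mad_thread_run() is called after the lock has 
been released.  A real implementation would also have to protect the 
canceled list against concurrent access, which this model leaves out.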

- Sean


