[openib-general] ib_cancel_mad and the CM
Sean Hefty
mshefty at ichips.intel.com
Thu Feb 24 09:40:49 PST 2005
Sean Hefty wrote:
> Because ib_cancel_mad might invoke a callback that acquires the cm_id
> lock, the lock cannot be held when ib_cancel_mad is invoked.
>
{snip}
>
> To fix this in the CM, the call to ib_cancel_mad just needs to move
> inside the cm_id lock. Alternatively, it may be possible to change
> ib_cancel_mad to cancel MADs based on a second set of criteria. Both of
> these would require changes to the MAD layer.
After studying this further, I believe the problem exists in both the
CM and the SA query code. Unless there is an objection, I'll submit a
patch so that, after a MAD has been canceled, the user's send callback
is invoked from one of the MAD threads rather than directly from the
user's thread. (This is similar to how the process local MAD
functionality is implemented.)

This will allow locking around the cancel routine, which should fix the
problem for the CM code. However, I don't think that locking around
the cancel routine eliminates the issue in the SA query code, and I
don't see a simple fix in that case.
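
To make the idea concrete, here is a rough user-space sketch of the
deferred callback, written with pthreads rather than kernel primitives.
Every name in it (demo_cm_id, demo_send, demo_worker, demo_cancel_mad,
demo_run) is hypothetical and only stands in for the real CM and MAD
structures; it is not the actual ib_cancel_mad implementation or the
patch itself. The cancel routine only queues the canceled send, and a
worker thread (playing the role of a MAD thread) invokes the send
callback, so the caller can hold its own lock across the cancel.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a cm_id and its lock. */
struct demo_cm_id {
    pthread_mutex_t lock;
    int state;
    int callbacks_seen;
};

struct demo_send;
typedef void (*demo_send_handler_t)(struct demo_send *send);

/* Hypothetical stand-in for an outstanding send MAD. */
struct demo_send {
    struct demo_cm_id *owner;
    bool canceled;
    demo_send_handler_t send_handler;
};

/* Hypothetical "MAD thread" with a single-slot completion queue. */
struct demo_worker {
    pthread_mutex_t lock;
    pthread_cond_t wake;
    struct demo_send *pending;
    bool stop;
};

/* User's send callback: acquires the cm_id lock, as the CM does. */
static void demo_send_handler(struct demo_send *send)
{
    pthread_mutex_lock(&send->owner->lock);
    send->owner->callbacks_seen++;
    pthread_mutex_unlock(&send->owner->lock);
}

/* Cancel only marks and queues the send; it never runs the callback. */
static void demo_cancel_mad(struct demo_worker *w, struct demo_send *send)
{
    pthread_mutex_lock(&w->lock);
    send->canceled = true;
    w->pending = send;
    pthread_cond_signal(&w->wake);
    pthread_mutex_unlock(&w->lock);
}

/* Worker loop: reports canceled sends from its own context. */
static void *demo_worker_main(void *arg)
{
    struct demo_worker *w = arg;

    pthread_mutex_lock(&w->lock);
    for (;;) {
        while (!w->pending && !w->stop)
            pthread_cond_wait(&w->wake, &w->lock);
        if (!w->pending)
            break;                      /* stop requested, queue empty */
        struct demo_send *send = w->pending;
        w->pending = NULL;
        pthread_mutex_unlock(&w->lock); /* never call back under w->lock */
        send->send_handler(send);
        pthread_mutex_lock(&w->lock);
    }
    pthread_mutex_unlock(&w->lock);
    return NULL;
}

/* Cancels while holding the cm_id lock; returns callbacks delivered. */
int demo_run(void)
{
    struct demo_cm_id cm = { .state = 0, .callbacks_seen = 0 };
    struct demo_worker w = { .pending = NULL, .stop = false };
    struct demo_send send = { .owner = &cm, .canceled = false,
                              .send_handler = demo_send_handler };
    pthread_t tid;

    pthread_mutex_init(&cm.lock, NULL);
    pthread_mutex_init(&w.lock, NULL);
    pthread_cond_init(&w.wake, NULL);
    if (pthread_create(&tid, NULL, demo_worker_main, &w) != 0)
        return -1;

    pthread_mutex_lock(&cm.lock);
    demo_cancel_mad(&w, &send);   /* safe: callback is deferred */
    cm.state = 1;                 /* state change and cancel are atomic */
    pthread_mutex_unlock(&cm.lock);

    pthread_mutex_lock(&w.lock);
    w.stop = true;
    pthread_cond_signal(&w.wake);
    pthread_mutex_unlock(&w.lock);
    pthread_join(tid, NULL);

    pthread_cond_destroy(&w.wake);
    pthread_mutex_destroy(&w.lock);
    pthread_mutex_destroy(&cm.lock);
    assert(send.canceled);
    return cm.callbacks_seen;
}
```

If demo_cancel_mad called the handler directly instead of queueing it,
the handler would try to take cm.lock while demo_run still holds it,
which is the deadlock described in the quoted text. The sketch models
only the CM-style case; it does not attempt to address the SA query
code.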
- Sean