[openib-general] ib_cancel_mad and the CM
Sean Hefty
mshefty at ichips.intel.com
Wed Feb 23 12:03:16 PST 2005
During a code review, I've discovered what appears to be an issue
between the CM and a call to ib_cancel_mad, and I think that it's
possible for other MAD clients to run into a similar issue.
Here's CM pseudo-code that leads to the problem:
    lock cm_id
    wr_id = (unsigned long) current_msg
    unlock cm_id
    ib_cancel_mad(agent, wr_id)
Because ib_cancel_mad might invoke a callback that acquires the cm_id
lock, the lock cannot be held when ib_cancel_mad is invoked.
The problem comes from using the current_msg pointer as the wr_id.
If the MAD completes after the cm_id lock is released but before
ib_cancel_mad is invoked, the message memory could be freed,
reallocated, and reused for a second cm_id. The call to ib_cancel_mad
above would then cancel the wrong MAD. This seems highly unlikely to
happen in the CM in practice, but I believe that it is possible.
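To make the race concrete, here is a minimal user-space sketch of the reuse problem. Everything in it is a hypothetical stand-in and not the real MAD layer: struct msg, the one-slot pool, msg_alloc, msg_complete, and cancel_by_wr_id are made up for illustration. The one-slot pool just forces the "freed and reallocated at the same address" case that would be rare in practice.

```c
#include <assert.h>

/* Hypothetical stand-ins for illustration; not the real MAD layer API. */
struct msg {
    int owner;     /* which cm_id this message belongs to */
    int pending;   /* 1 while the send is outstanding */
};

/* One-slot pool, so a freed message is always reused by the next alloc. */
static struct msg slot;
static int slot_in_use;

static struct msg *msg_alloc(int owner)
{
    assert(!slot_in_use);
    slot_in_use = 1;
    slot.owner = owner;
    slot.pending = 1;
    return &slot;
}

/* The send completes and its memory is released for reuse. */
static void msg_complete(struct msg *m)
{
    m->pending = 0;
    slot_in_use = 0;
}

/* Cancel by wr_id alone. Returns the owner of the canceled message, or -1. */
static int cancel_by_wr_id(unsigned long wr_id)
{
    struct msg *m = (struct msg *) wr_id;

    if (!slot_in_use || m != &slot || !m->pending)
        return -1;
    m->pending = 0;
    slot_in_use = 0;
    return m->owner;
}

/* Returns the owner whose message is canceled by cm_id 1's stale wr_id. */
static int demo_stale_cancel(void)
{
    struct msg *first = msg_alloc(1);
    unsigned long wr_id = (unsigned long) first; /* read under the lock */

    msg_complete(first);            /* MAD completes before the cancel */
    msg_alloc(2);                   /* same memory reused by a second cm_id */
    return cancel_by_wr_id(wr_id);  /* matches the second cm_id's message */
}
```

In this sketch demo_stale_cancel() ends up canceling the message owned by the second cm_id, because the pointer value is the only thing the cancel can match on.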
To fix this in the CM, the call to ib_cancel_mad just needs to move
inside the cm_id lock, which would require that ib_cancel_mad not
invoke a callback that acquires that lock. Alternatively, it may be
possible to change ib_cancel_mad to cancel MADs based on a second set
of criteria. Both of these would require changes to the MAD layer.
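As one possible reading of "a second set of criteria", the sketch below matches on a sequence number in addition to the pointer value. This is only an assumption about what such criteria could look like; struct tagged_msg, the seq field, and cancel_by_wr_id_and_seq are hypothetical and are not part of the MAD layer.

```c
#include <assert.h>

/* Hypothetical stand-ins for illustration; not the real MAD layer API. */
struct tagged_msg {
    int owner;           /* which cm_id this message belongs to */
    int pending;         /* 1 while the send is outstanding */
    unsigned long seq;   /* assumed extra criterion: unique per allocation */
};

/* One-slot pool, so a freed message is always reused by the next alloc. */
static struct tagged_msg tslot;
static int tslot_in_use;
static unsigned long next_seq = 1;

static struct tagged_msg *tagged_alloc(int owner)
{
    assert(!tslot_in_use);
    tslot_in_use = 1;
    tslot.owner = owner;
    tslot.pending = 1;
    tslot.seq = next_seq++;
    return &tslot;
}

/* The send completes and its memory is released for reuse. */
static void tagged_complete(struct tagged_msg *m)
{
    m->pending = 0;
    tslot_in_use = 0;
}

/* Cancel only if both the pointer and the sequence number match. */
static int cancel_by_wr_id_and_seq(unsigned long wr_id, unsigned long seq)
{
    struct tagged_msg *m = (struct tagged_msg *) wr_id;

    if (!tslot_in_use || m != &tslot || !m->pending || m->seq != seq)
        return -1;
    m->pending = 0;
    tslot_in_use = 0;
    return m->owner;
}

/* Returns -1: the stale (wr_id, seq) pair no longer matches anything. */
static int demo_checked_cancel(void)
{
    struct tagged_msg *first = tagged_alloc(1);
    unsigned long wr_id = (unsigned long) first; /* read under the lock */
    unsigned long seq = first->seq;              /* read under the lock */

    tagged_complete(first);   /* MAD completes before the cancel */
    tagged_alloc(2);          /* same memory reused, but with a new seq */
    return cancel_by_wr_id_and_seq(wr_id, seq);
}
```

Here the stale cancel finds the reused memory but fails the sequence check, so the second cm_id's message is left alone.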
If we think that this issue is unique to the CM, then I can try to
figure out some other way of handling this.
- Sean