[openib-general] ib_cancel_mad and the CM

Sean Hefty mshefty at ichips.intel.com
Wed Feb 23 12:03:16 PST 2005


During a code review, I've discovered what appears to be an issue 
between the CM and a call to ib_cancel_mad, and I think that it's 
possible for other MAD clients to run into a similar issue.

Here's CM pseudo-code that leads to the problem:

lock cm_id
wr_id = (unsigned long) current_msg
unlock cm_id
ib_cancel_mad(agent, wr_id)

Because ib_cancel_mad might invoke a callback that acquires the cm_id 
lock, the lock cannot be held when ib_cancel_mad is invoked.

The problem comes from using the current_msg pointer as the wr_id.  If 
the MAD completes before ib_cancel_mad is invoked, the memory could be 
freed, reallocated, and reused for a second cm_id.  The call to 
ib_cancel_mad above would then cancel the wrong MAD.  This seems highly 
unlikely to happen in the CM in practice, but I believe that it is 
possible.
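
To make the interleaving concrete, here is a user-space sketch that 
replays it.  Nothing in it is the real MAD layer: toy_msg, toy_alloc, 
toy_free, toy_cancel and replay_race are hypothetical names, and the 
one-slot pool is only there to make the address reuse deterministic 
(with a real allocator the reuse is possible, not guaranteed).

```c
#include <stddef.h>

/* Toy stand-in for a CM message; not the real MAD layer. */
struct toy_msg {
    int owner;      /* which cm_id the message belongs to */
    int in_use;
    int cancelled;
};

/* One-slot pool, so a freed message is always reused at the same address. */
static struct toy_msg pool[1];

static struct toy_msg *toy_alloc(int owner)
{
    if (pool[0].in_use)
        return NULL;
    pool[0].owner = owner;
    pool[0].in_use = 1;
    pool[0].cancelled = 0;
    return &pool[0];
}

static void toy_free(struct toy_msg *msg)
{
    msg->in_use = 0;
}

/* Cancel the outstanding send whose address matches wr_id.
 * Returns the owner that was hit, or -1 if nothing matched. */
static int toy_cancel(unsigned long wr_id)
{
    struct toy_msg *msg = &pool[0];

    if (msg->in_use && (unsigned long) msg == wr_id) {
        msg->cancelled = 1;
        return msg->owner;
    }
    return -1;
}

/* Replays the race; returns the owner whose message got cancelled. */
static int replay_race(void)
{
    struct toy_msg *first = toy_alloc(1);         /* cm_id 1 sends */
    unsigned long wr_id = (unsigned long) first;  /* read under the lock */

    /* Lock dropped here; the MAD completes before the cancel runs. */
    toy_free(first);
    toy_alloc(2);                /* cm_id 2 reuses the same memory */

    return toy_cancel(wr_id);    /* cm_id 1's cancel, now stale */
}
```

In this sketch replay_race() reports owner 2: the cancel issued on 
behalf of the first cm_id lands on the second cm_id's message, because 
the address is the only thing being compared.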

To fix this in the CM, the call to ib_cancel_mad would need to move 
inside the cm_id lock, which is only safe if ib_cancel_mad never invokes 
a callback that takes that lock from the caller's context.  
Alternatively, it may be possible to change ib_cancel_mad to cancel MADs 
based on a second set of criteria.  Both of these would require changes 
to the MAD layer.
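
As a rough illustration of the "second set of criteria" idea, the 
user-space sketch below cancels a send only when both the wr_id and an 
owner context match.  This is not a proposed ib_cancel_mad signature: 
sim_send, sim_post, sim_cancel and sim_replay are hypothetical names, 
and using a context pointer as the second criterion is just one 
assumption about what such criteria could be.

```c
#include <stddef.h>

#define SIM_MAX_SENDS 4

/* Hypothetical record of one outstanding send; not the real MAD layer. */
struct sim_send {
    unsigned long wr_id;
    const void *context;  /* assumed second criterion, e.g. the owning cm_id */
    int active;
};

static struct sim_send sim_sends[SIM_MAX_SENDS];

/* Record an outstanding send; returns 0 on success, -1 if the table is full. */
static int sim_post(unsigned long wr_id, const void *context)
{
    size_t i;

    for (i = 0; i < SIM_MAX_SENDS; i++) {
        if (!sim_sends[i].active) {
            sim_sends[i].wr_id = wr_id;
            sim_sends[i].context = context;
            sim_sends[i].active = 1;
            return 0;
        }
    }
    return -1;
}

/* Cancel only when both wr_id and context match; returns 1 if cancelled. */
static int sim_cancel(unsigned long wr_id, const void *context)
{
    size_t i;

    for (i = 0; i < SIM_MAX_SENDS; i++) {
        if (sim_sends[i].active &&
            sim_sends[i].wr_id == wr_id &&
            sim_sends[i].context == context) {
            sim_sends[i].active = 0;
            return 1;
        }
    }
    return 0;
}

/* Replays the stale-cancel case: cm_id B owns a send whose wr_id equals
 * the one cm_id A remembered.  Returns 1 if B's send survives A's
 * cancel and can still be cancelled by B. */
static int sim_replay(void)
{
    static int cm_id_a, cm_id_b;            /* stand-ins for two cm_ids */
    unsigned long reused_wr_id = 0x1000UL;  /* arbitrary reused value */

    if (sim_post(reused_wr_id, &cm_id_b) != 0)
        return 0;
    if (sim_cancel(reused_wr_id, &cm_id_a) != 0)  /* stale cancel from A */
        return 0;
    return sim_cancel(reused_wr_id, &cm_id_b);    /* B's own cancel */
}
```

With the extra check, a stale cancel that carries a recycled wr_id no 
longer matches another owner's send.  Whether something like this fits 
the MAD layer's existing interfaces is a separate question.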

If we think that this issue is unique to the CM, then I can try to 
figure out some other way of handling this.

- Sean


