[openib-general] RFC on CM error handling

Fri Jan 21 11:28:13 PST 2005

Libor Michalek wrote:

>>I'm not seeing an easy way to handle this condition.  I think that I need to
>>serialize all callbacks for a given cm_id.
> 
>   I would think lack of cm_id serialization would be a bigger problem
> for the CM itself, since it needs to update the cm_id state in a sane
> manner. The consumer could serialize callbacks, I know I do, but it
> would be a bigger problem if the state transitions didn't make sense.
> The older CM used a model where the connection identifier was checked-out
> from the table, and only one thread of control had it at a given time. This
> I think would solve a lot of issues.

The state transitions should be handled properly in the CM for 
multithreaded receive handling.  The state transitions are serialized 
and should follow the state diagrams in the spec, and reference 
counting is maintained to block destruction while a callback is 
outstanding.
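
To make the reference counting point concrete, here is a minimal user-space sketch of how a destroy path can block while a callback is outstanding. It is not the CM code: the sketch_ref names are hypothetical, and pthreads stand in for the kernel primitives.

```c
#include <pthread.h>

/* Hypothetical per-connection reference state; not the real CM structure. */
struct sketch_ref {
    pthread_mutex_t lock;
    pthread_cond_t  idle;      /* signalled when refcount drops to zero */
    int             refcount;  /* one reference per outstanding callback */
};

static void sketch_ref_init(struct sketch_ref *r)
{
    pthread_mutex_init(&r->lock, NULL);
    pthread_cond_init(&r->idle, NULL);
    r->refcount = 0;
}

/* Taken by the receive path before it invokes the user's callback. */
static void sketch_ref_get(struct sketch_ref *r)
{
    pthread_mutex_lock(&r->lock);
    r->refcount++;
    pthread_mutex_unlock(&r->lock);
}

/* Dropped by the receive path after the callback returns. */
static void sketch_ref_put(struct sketch_ref *r)
{
    pthread_mutex_lock(&r->lock);
    if (--r->refcount == 0)
        pthread_cond_broadcast(&r->idle);
    pthread_mutex_unlock(&r->lock);
}

/* Called from the destroy path: blocks until no callback is outstanding. */
static void sketch_ref_wait_idle(struct sketch_ref *r)
{
    pthread_mutex_lock(&r->lock);
    while (r->refcount > 0)
        pthread_cond_wait(&r->idle, &r->lock);
    pthread_mutex_unlock(&r->lock);
}
```

In this sketch the destroy path must not call sketch_ref_wait_idle() while it holds a reference of its own, or it would wait on itself.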

An example of the issue I have in mind: a user gets a reply callback. 
A reject is then received by the CM, and a second callback to the 
user is initiated.  If the user tries to send an RTU from the reply 
callback, the call will fail since the cm_id is no longer in a valid 
state for it.  If the user then returns -1 from the reply callback, 
the CM will destroy the cm_id.  The destruction will block until the 
reject callback completes.  Since the user returned -1 from the reply 
callback, they may not be ready to handle another callback.

The fix that I'm working on should still allow multithreaded operation 
inside the CM, but callbacks to the user will be serialized.  If a user 
returns a non-zero value from a callback, no additional callbacks will 
be generated.
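
A rough user-space sketch of the behavior described above, assuming a per-cm_id lock held across each callback and a flag that is set once a handler returns non-zero. All of the sketch_ names and event types are hypothetical, and this is an illustration of the idea rather than the actual fix.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical event types; not the real CM definitions. */
enum sketch_event { SKETCH_REP_RECEIVED, SKETCH_REJ_RECEIVED };

struct sketch_cm_id;
typedef int (*sketch_handler)(struct sketch_cm_id *id, enum sketch_event ev);

struct sketch_cm_id {
    pthread_mutex_t cb_lock;           /* serializes callbacks for this id */
    int             callbacks_stopped; /* set once a handler returns non-zero */
    sketch_handler  handler;
    void           *context;
};

static void sketch_init(struct sketch_cm_id *id, sketch_handler h, void *ctx)
{
    pthread_mutex_init(&id->cb_lock, NULL);
    id->callbacks_stopped = 0;
    id->handler = h;
    id->context = ctx;
}

/*
 * Deliver one event to the user. Returns 1 if the handler was invoked,
 * 0 if the event was suppressed because an earlier callback returned
 * non-zero. Receive threads may call this concurrently; the lock ensures
 * only one callback per id runs at a time.
 */
static int sketch_deliver(struct sketch_cm_id *id, enum sketch_event ev)
{
    int delivered = 0;

    assert(id->handler != NULL);
    pthread_mutex_lock(&id->cb_lock);
    if (!id->callbacks_stopped) {
        delivered = 1;
        if (id->handler(id, ev) != 0)
            id->callbacks_stopped = 1;  /* no further callbacks */
    }
    pthread_mutex_unlock(&id->cb_lock);
    return delivered;
}

/* Example handler: counts invocations and gives up on the first event. */
static int sketch_failing_handler(struct sketch_cm_id *id, enum sketch_event ev)
{
    int *calls = id->context;

    (void)ev;
    (*calls)++;
    return -1;
}
```

With sketch_failing_handler installed, a reply followed by a reject results in exactly one callback: the reject is dropped because the reply handler returned -1. One limitation of holding the lock across the callback is that a handler which re-enters sketch_deliver() for the same id would deadlock.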

If you see issues with the current state handling in the CM, please let 
me know.

- Sean