[openib-general] RFC on CM error handling
Sean Hefty
mshefty at ichips.intel.com
Fri Jan 21 11:28:13 PST 2005
Libor Michalek wrote:
>>I'm not seeing an easy way to handle this condition. I think that I need to
>>serialize all callbacks for a given cm_id.
>
> I would think lack of cm_id serialization would be a bigger problem
> for the CM itself, since it needs to update the cm_id state in a sane
> manner. The consumer could serialize callbacks, I know I do, but it
> would be a bigger problem if the state transitions didn't make sense.
> The older CM used a model where the connection identifier was checked out
> from the table, and only one thread of control had it at a given time.
> This, I think, would solve a lot of issues.
The state transitions should be handled properly in the CM for
multithreaded receive handling. The state transitions are serialized
and should follow the state diagrams in the spec, and reference
counting is maintained to block destruction while a callback is
outstanding.
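To make the reference counting point concrete, here is a rough sketch in
plain C with pthreads. It is not the CM source. The struct and every
sketch_* name are hypothetical, and it only shows the idea that destroy
waits until no callback is outstanding.

```c
#include <assert.h>
#include <stddef.h>
#include <pthread.h>

/* Hypothetical stand-in for a CM connection identifier; not the real struct. */
struct cm_id_sketch {
    pthread_mutex_t lock;
    pthread_cond_t  idle;      /* signalled when refcount drops to zero */
    int             refcount;  /* callbacks currently outstanding */
    int             destroyed; /* set once teardown has completed */
};

void sketch_init(struct cm_id_sketch *id)
{
    pthread_mutex_init(&id->lock, NULL);
    pthread_cond_init(&id->idle, NULL);
    id->refcount = 0;
    id->destroyed = 0;
}

/* Take a reference before invoking a user callback. */
void sketch_get(struct cm_id_sketch *id)
{
    pthread_mutex_lock(&id->lock);
    id->refcount++;
    pthread_mutex_unlock(&id->lock);
}

/* Drop the reference once the callback returns; wake any waiting destroyer. */
void sketch_put(struct cm_id_sketch *id)
{
    pthread_mutex_lock(&id->lock);
    assert(id->refcount > 0);
    if (--id->refcount == 0)
        pthread_cond_broadcast(&id->idle);
    pthread_mutex_unlock(&id->lock);
}

/* Block until no callback is outstanding, then mark the id destroyed. */
void sketch_destroy(struct cm_id_sketch *id)
{
    pthread_mutex_lock(&id->lock);
    while (id->refcount > 0)
        pthread_cond_wait(&id->idle, &id->lock);
    id->destroyed = 1;
    pthread_mutex_unlock(&id->lock);
}
```

In this sketch a receive thread would call sketch_get() before the user
callback and sketch_put() after it returns, so a concurrent
sketch_destroy() cannot finish in between.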
An example of the issue I'm thinking of: a user gets a reply callback.
While that callback is running, the CM receives a reject and initiates
a second callback to the user. If the user then tries to send an RTU,
the call will fail since the cm_id is in an invalid state. If the user
then returns -1 from the reply callback, the CM will destroy the cm_id,
and the destruction will block until the reject callback completes.
Since the user returned -1 from the reply callback, they may not be
ready to handle another callback.
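The state side of that sequence can be modeled with a toy state machine.
This is only an illustration under simplifying assumptions: the state
names are a reduced, hypothetical subset rather than the spec's full
diagram, and the sk_* functions are not CM calls.

```c
#include <assert.h>
#include <stddef.h>

/* Reduced, hypothetical set of connection states for illustration only. */
enum sk_state { SK_REQ_SENT, SK_REP_RCVD, SK_ESTABLISHED, SK_IDLE };

struct sk_conn {
    enum sk_state state;
};

/* A REP arrives for an outstanding REQ. */
void sk_recv_rep(struct sk_conn *c)
{
    if (c->state == SK_REQ_SENT)
        c->state = SK_REP_RCVD;
}

/* A REJ arrives; in this simplified model it always returns to idle. */
void sk_recv_rej(struct sk_conn *c)
{
    c->state = SK_IDLE;
}

/* Sending an RTU is only valid after a REP; otherwise fail with -1. */
int sk_send_rtu(struct sk_conn *c)
{
    assert(c != NULL);
    if (c->state != SK_REP_RCVD)
        return -1;
    c->state = SK_ESTABLISHED;
    return 0;
}
```

Calling sk_recv_rep(), then sk_recv_rej(), then sk_send_rtu() on a
connection that started in SK_REQ_SENT makes the last call return -1,
which is the failure the user sees inside the reply callback.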
The fix that I'm working on should still allow multithreaded operation
inside the CM, but callbacks to the user will be serialized. If a user
returns a non-zero value from a callback, no additional callbacks will
be generated.
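One way the serialization could look, as a sketch only: events are
queued per cm_id, whichever thread finds no callback in progress becomes
the dispatcher, and a non-zero return discards anything still queued.
This is not the actual fix. The fixed-size queue, the integer events and
all sd_* names are assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <pthread.h>

#define SD_MAX_EVENTS 8

typedef int (*sd_handler)(int event, void *ctx);

/* Hypothetical per-connection dispatch state. */
struct sd_dispatch {
    pthread_mutex_t lock;
    int        queue[SD_MAX_EVENTS]; /* ring buffer of pending events */
    int        head, count;
    int        dispatching;          /* a callback is currently running */
    int        stopped;              /* a callback returned non-zero */
    sd_handler handler;
    void      *ctx;
};

void sd_init(struct sd_dispatch *d, sd_handler handler, void *ctx)
{
    assert(handler != NULL);
    pthread_mutex_init(&d->lock, NULL);
    d->head = d->count = d->dispatching = d->stopped = 0;
    d->handler = handler;
    d->ctx = ctx;
}

/* Called by any receive thread. Returns -1 if the event was dropped. */
int sd_post(struct sd_dispatch *d, int event)
{
    pthread_mutex_lock(&d->lock);
    if (d->stopped || d->count == SD_MAX_EVENTS) {
        pthread_mutex_unlock(&d->lock);
        return -1;
    }
    d->queue[(d->head + d->count) % SD_MAX_EVENTS] = event;
    d->count++;
    if (d->dispatching) {
        /* Another thread is inside a callback; it will deliver this event. */
        pthread_mutex_unlock(&d->lock);
        return 0;
    }
    d->dispatching = 1;
    while (d->count > 0 && !d->stopped) {
        int ev = d->queue[d->head];
        d->head = (d->head + 1) % SD_MAX_EVENTS;
        d->count--;
        pthread_mutex_unlock(&d->lock);
        int ret = d->handler(ev, d->ctx); /* user callback runs unlocked */
        pthread_mutex_lock(&d->lock);
        if (ret != 0) {
            d->stopped = 1; /* no further callbacks for this connection */
            d->count = 0;   /* discard anything queued in the meantime */
        }
    }
    d->dispatching = 0;
    pthread_mutex_unlock(&d->lock);
    return 0;
}

/* Example user context and handler for the reply/reject case. */
struct sd_user {
    struct sd_dispatch *d;
    int calls;
};

/* While handling the first event a second one arrives; then return -1. */
int sd_example_handler(int event, void *ctx)
{
    struct sd_user *u = ctx;
    (void)event;
    u->calls++;
    sd_post(u->d, 2); /* simulates a reject arriving mid-callback */
    return -1;
}
```

With sd_example_handler installed, posting one event runs the handler
exactly once: the event posted from inside the callback is queued rather
than delivered concurrently, and is then discarded because the handler
returned -1.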
If you see issues with the current state handling in the CM, please let
me know.
- Sean