[openib-general] RFC on CM error handling
Sean Hefty
mshefty at ichips.intel.com
Fri Jan 21 11:55:25 PST 2005
Libor Michalek wrote:
>>An example issue that I'm thinking of is that a user gets a reply
>>callback. A reject is then received by the CM, and a second callback
>>to the user is initiated. If the user tries to send an RTU, the call
>>will fail since the cm_id is in an invalid state. If the user then
>>returns -1 from the callback, the CM will destroy the cm_id. The
>>destruction will block until the reject callback completes. Since the
>>user returned -1 from the reply callback, they may not be ready to
>>handle another callback.
>>
>>The fix that I'm working on should still allow multithreaded operation
>>inside the CM, but callbacks to the user will be serialized. If a user
>>returns a non-zero value from a callback, no additional callbacks will
>>be generated.
>
>
> OK, that's the behaviour I would expect. However, in the example, even
> if the user returns 0 from the REP callback, I wouldn't expect to see
> the REJ after the REP has been processed (or after the RTU has been
> sent). The CM state updates for a connection and the resulting
> callbacks would be serialized, so a REJ arriving after the REP would
> be discarded, since it was received in a CM state that does not allow
> rejects. Or is this incorrect?
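The serialization rule proposed in the quoted message (callbacks for a
given cm_id are delivered one at a time, and once a callback returns a
non-zero value no further callbacks are generated) can be sketched in C
as below. This is a minimal, hypothetical sketch, not the OpenIB CM
code: `struct cm_id_sketch`, `cm_deliver_sketch()`, the event names and
the example handler are all invented for illustration, and a per-id
pthread mutex stands in for whatever locking the real CM uses.

```c
#include <pthread.h>
#include <stddef.h>

/* Hypothetical event types; these are NOT the OpenIB CM API. */
enum cm_event_sketch { EV_REP_RECEIVED, EV_REJ_RECEIVED };

struct cm_id_sketch;
typedef int (*cm_handler_sketch)(struct cm_id_sketch *id,
                                 enum cm_event_sketch ev);

/* Hypothetical per-connection object. */
struct cm_id_sketch {
    pthread_mutex_t cb_lock;   /* serializes user callbacks for this id */
    int callbacks_disabled;    /* set once a callback returns non-zero */
    int delivered;             /* number of callbacks actually invoked */
    cm_handler_sketch handler; /* user-supplied event handler */
};

static void cm_id_init_sketch(struct cm_id_sketch *id, cm_handler_sketch h)
{
    pthread_mutex_init(&id->cb_lock, NULL);
    id->callbacks_disabled = 0;
    id->delivered = 0;
    id->handler = h;
}

/*
 * May be called from any CM thread that has an event for this id.
 * The lock ensures at most one user callback runs at a time; after a
 * callback returns non-zero, later events are dropped, not reported.
 * Returns 1 if the callback ran, 0 if it was suppressed.
 */
static int cm_deliver_sketch(struct cm_id_sketch *id, enum cm_event_sketch ev)
{
    int ran = 0;

    pthread_mutex_lock(&id->cb_lock);
    if (!id->callbacks_disabled) {
        ran = 1;
        id->delivered++;
        if (id->handler(id, ev) != 0)
            id->callbacks_disabled = 1; /* no additional callbacks */
    }
    pthread_mutex_unlock(&id->cb_lock);
    return ran;
}

/* Hypothetical user handler: sending the RTU from the REP callback
 * failed, so it returns -1 to ask the CM to destroy the id. */
static int example_handler_sketch(struct cm_id_sketch *id,
                                  enum cm_event_sketch ev)
{
    (void)id;
    return ev == EV_REP_RECEIVED ? -1 : 0;
}
```

With `example_handler_sketch`, the REP callback returns -1, so a REJ
delivered afterwards is suppressed instead of reaching a user who is no
longer prepared for it. Holding `cb_lock` across the user callback is
what serializes delivery in this sketch; the trade-off is that the
callback must not itself block on that lock, for example by
synchronously destroying its own id.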
From the REP callback, even if the call to send an RTU succeeds, a REJ
could still be received if the remote side timed out waiting for the
RTU. In that case the local cm_id state goes from REP_RCVD to
ESTABLISHED to TIMEWAIT. Given this, it seems that the spec is missing
state transitions for handling a REJ received in the REP_RCVD or
MRA_REP_SENT states, which would drive the state back to IDLE.
- Sean
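The transitions discussed in this message can be sketched as a small C
state function, restricted to the states named here. The enum and
function names are hypothetical. The REP_RCVD and MRA_REP_SENT to IDLE
entries are the transitions proposed above, not ones taken from the
spec; ESTABLISHED to TIMEWAIT reflects the local behaviour described;
every other state is simply left unchanged, which is a simplification
and not a claim about what the spec requires.

```c
/* Subset of CM states named in the thread; names are illustrative. */
enum cm_state_sketch {
    CM_IDLE,
    CM_REP_RCVD,
    CM_MRA_REP_SENT,
    CM_ESTABLISHED,
    CM_TIMEWAIT
};

/* State after the user successfully sends an RTU from the REP callback. */
static enum cm_state_sketch cm_on_rtu_sent_sketch(enum cm_state_sketch s)
{
    if (s == CM_REP_RCVD)
        return CM_ESTABLISHED;
    return s; /* other states left unchanged in this sketch */
}

/* State after a REJ is received. */
static enum cm_state_sketch cm_on_rej_sketch(enum cm_state_sketch s)
{
    switch (s) {
    case CM_REP_RCVD:
    case CM_MRA_REP_SENT:
        /* Proposed: a REJ before the RTU drives the state back to IDLE. */
        return CM_IDLE;
    case CM_ESTABLISHED:
        /* Described above: a REJ after the RTU was sent -> TIMEWAIT. */
        return CM_TIMEWAIT;
    default:
        /* Simplification: all other states are left unchanged. */
        return s;
    }
}
```

Applying `cm_on_rtu_sent_sketch` and then `cm_on_rej_sketch` to
CM_REP_RCVD reproduces the REP_RCVD, ESTABLISHED, TIMEWAIT sequence,
while a REJ that arrives before the RTU is sent returns the id to
CM_IDLE.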