[openib-general] RFC on CM error handling

Fri Jan 21 11:55:25 PST 2005

Libor Michalek wrote:
>>An example issue that I'm thinking of is a user gets a reply callback. 
>>  A reject is then received by the CM, and a second callback to the 
>>user is initiated.  If the user tries to send an RTU, the call will 
>>fail since the cm_id is in an invalid state.  If the user then returns 
>>-1 from the callback, the CM will destroy the cm_id.  The destruction 
>>will block while the reject callback completes.  Since the user 
>>returned -1 from the reply callback, they may not be ready to handle 
>>another callback.
>>
>>The fix that I'm working on should still allow multithreaded operation 
>>inside the CM, but callbacks to the user will be serialized.  If a user 
>>returns a non-zero value from a callback, no additional callbacks will 
>>be generated.
> 
> 
>   OK, that's the behaviour I would expect. However, in the example, even
> if the user returns 0 from the REP callback, I wouldn't expect to see
> the REJ after the REP has been processed. (or after the RTU has been sent)
> The CM state updates for a connection and resulting callbacks would be
> serialized, so the REJ after the REP would be discarded since it was
> received in a CM state which does not allow rejects. Or is this incorrect?

From the REP callback, even if the call to send an RTU succeeds, a REJ
could still be received, for example because the remote side timed out
waiting for the RTU.  Locally, the cm_id state then goes from REP_RCVD to
ESTABLISHED to TIMEWAIT.  Given this, it seems that the spec is missing
state transitions for handling a REJ in the REP_RCVD or MRA_REP_SENT
states, which would drive the state back to IDLE.

- Sean