[openib-general] CM and REP handling

Sean Hefty mshefty at ichips.intel.com
Fri Jun 30 14:52:07 PDT 2006


Rimmer, Todd wrote:
> I would recommend implementing the state machine as defined in the spec
> for the following reasons:

Technically, I believe that this follows the state machine.  After receiving a 
duplicate REQ, a REP will be resent.  The only difference is that there is a 
delay in resending the REP.

> 1. it will be necessary to pass any future IBTA CIWG compliance tests
> for the CM

I don't believe that a compliance test would detect any issue.

> 2. I would need to think about it, but the lost REP case may not be the
> only situation where a duplicate REQ can be received.

Note that the IB CM handles duplicate REQs differently based on the current state.

> 3. depending on RTU timeout on the passive side as the only means for
> resending the REP reduces the retries attempted in a "lossy" fabric for
> REP and RTU loss (e.g. if you have 8 RTU timeout retries on passive side,
> and many REPs are lost followed by many RTUs, you get a total of 8 lost
> REPs+RTUs before you give up; managing the counters separately will tend
> to allow for more retries).

The number of retries cannot exceed the maximum number of CM retries specified 
in the REQ.  An implementation that resends a REP immediately after receiving a 
duplicate REQ would need to check against this limit and increment the count of 
REPs that have been sent.  The result is that the connection timeout actually 
decreases for every duplicate REQ that is received.
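
To illustrate the retry accounting, here is a toy model in Python.  It is only 
a sketch and is not the ib_cm code: the class, the function, and the field 
names are all hypothetical.  It assumes that every resent REP, whether 
triggered by the timer or by a duplicate REQ, draws from the single retry 
budget carried in the REQ, and that the passive side gives up on the first 
timeout that finds the budget exhausted.

```python
from dataclasses import dataclass


@dataclass
class PassiveSide:
    """Toy passive side that has already sent its first REP (hypothetical)."""
    max_cm_retries: int        # retry limit carried in the REQ
    resend_on_dup_req: bool    # True: resend the REP at once on a duplicate REQ
    retries_used: int = 0      # resent REPs charged against the limit
    timeouts_seen: int = 0     # timer expirations spent waiting for the RTU
    gave_up: bool = False

    def _try_resend_rep(self) -> bool:
        # Every resend is checked against, and counted toward, the one limit.
        if self.retries_used >= self.max_cm_retries:
            return False
        self.retries_used += 1
        return True

    def on_duplicate_req(self) -> None:
        # Immediate-resend policy: the duplicate REQ consumes a retry now.
        # Delayed policy: do nothing here; the timer resends the REP later.
        if self.resend_on_dup_req:
            self._try_resend_rep()

    def on_timeout(self) -> None:
        if self.gave_up:
            return
        self.timeouts_seen += 1
        if not self._try_resend_rep():
            self.gave_up = True


def timeouts_before_giving_up(max_retries: int, dup_reqs: int,
                              resend_on_dup_req: bool) -> int:
    """Count timer periods that elapse before the passive side gives up,
    assuming dup_reqs duplicate REQs arrive first and no RTU ever arrives."""
    side = PassiveSide(max_retries, resend_on_dup_req)
    for _ in range(dup_reqs):
        side.on_duplicate_req()
    while not side.gave_up:
        side.on_timeout()
    return side.timeouts_seen
```

In this model, with 8 retries and 3 duplicate REQs, the delayed policy waits 
through 9 timer periods before giving up, while the immediate policy waits 
through only 6: each duplicate REQ costs one timer period of total connection 
timeout.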

- Sean



