[openib-general] [PATCH 0/4] Dispatch communication related events to the IB CM

Rimmer, Todd trimmer at silverstorm.com
Thu Jul 27 08:09:25 PDT 2006


> From: Or Gerlitz [mailto:ogerlitz at voltaire.com]
> 
> Rimmer, Todd wrote:
> > This approach will not work.  If the QP is in RTS the Communication
> > established event will never be generated.  Hence the lost RTU case
> > would not be properly handled and the ULP would need to take on the
> > burden.  It's much better to isolate the solution to the CM and let the
> > ULP post to the send Q in RTR.
> 
> I may be misunderstanding you entirely, but is there a chance you did not
> read the patches with enough attention?
> 
> Let's first agree that you are not referring to CMA consumers, for which the
> CMA does the state transitions.  For them the CM will always get the
> COMM_EST async event and will emulate an RTU reception; that is, it will
> transition the cm id state and generate a CM_USER_ESTABLISHED event for
> the CMA, which will modify the QP state to RTS and generate an
> RDMA_ESTABLISHED event to the ULP.
> 
> So you might mean other types of CM/CMA consumers; please provide the
> details, specifically what makes you state "if the QP is in RTS".
> 
> Or.

My comment was in response to Sean's comment:
> I think it would be simpler to transition the QP to RTS after sending a
> REP, with the restriction that a user may not post sends until an RTU is
> received, a communication establish event occurs, or a receive message
> completes on the QP.

Hence, this was not in the patches; it was something he was proposing as
an alternative.  My point is that if the CMA moved the QP to RTS, the
CMA would not get an HCA Communication Established Async Event, in which
case the CMA would have no vehicle to generate the communication
established event to the CMA consumers.

It seems burdensome for all CMA consumers to need to implement an
alternate Tx queue which will only be used for this one rare situation.
The result would be that few CMA consumers would implement it and it
would be difficult to test.  Hence it is best for the CMA and stack to
address the race itself.

The particular rare race is the case where:
1. The active side CMA consumer completes the connection process (and the
   CMA sends the RTU).
2. The active side immediately sends a message.
3. The passive side CQ callback occurs before the CMA gets the RTU or the
   Communication Established Async event (and hence before the CMA has
   moved the QP to RTS).

While this race sounds rare, it's the kind of thing which will happen in
some large cluster under heavy stress, in which case it will be hard to
debug.  So it's better to design out the race from the start.

In this rare case, the passive side needs to queue any response TX it
may want to do until it gets Communication Established.  This sidebar
queue would not be required after the communication established callback.
However, to avoid CMA consumer protocol errors, the CMA consumer would
have to make sure the messages on this TX queue were unconditionally
sent before any future sends.
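
To make the burden concrete, here is a minimal sketch of the sidebar queue
each CMA consumer would have to carry.  Everything in it is hypothetical:
struct ulp_conn, ulp_send, ulp_on_established and the wire[] array (a
stand-in for the real Send Q) are names invented for illustration, not
CMA or verbs interfaces.

```c
#include <stdbool.h>
#include <stddef.h>

#define SIDEBAR_MAX 16   /* arbitrary bound chosen for this sketch */
#define WIRE_MAX    64   /* arbitrary bound chosen for this sketch */

/* Hypothetical consumer-side connection state; not a real CMA structure. */
struct ulp_conn {
    bool   established;          /* set once communication is established */
    int    sidebar[SIDEBAR_MAX]; /* message ids held back before that point */
    size_t sidebar_len;
    int    wire[WIRE_MAX];       /* stand-in for the Send Q, in posting order */
    size_t wire_len;
};

/* Stand-in for posting one message to the real Send Q. */
static int ulp_post(struct ulp_conn *c, int msg_id)
{
    if (c->wire_len == WIRE_MAX)
        return -1;
    c->wire[c->wire_len++] = msg_id;
    return 0;
}

/* Send path: before establishment, park the message on the sidebar queue. */
int ulp_send(struct ulp_conn *c, int msg_id)
{
    if (!c->established) {
        if (c->sidebar_len == SIDEBAR_MAX)
            return -1;
        c->sidebar[c->sidebar_len++] = msg_id;
        return 0;
    }
    return ulp_post(c, msg_id);
}

/*
 * Communication established callback: flush the sidebar queue in order
 * before allowing any later send, so message ordering is preserved.
 */
int ulp_on_established(struct ulp_conn *c)
{
    for (size_t i = 0; i < c->sidebar_len; i++)
        if (ulp_post(c, c->sidebar[i]) != 0)
            return -1;   /* error recovery omitted in this sketch */
    c->sidebar_len = 0;
    c->established = true;
    return 0;
}
```

Even this toy version needs a second queue, a state flag checked on every
send, and an ordered flush, all for a path that almost never runs.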

As it turns out, we already have such a Q: the Send Q.  The Send Q was
created previously, and the only true limitation is that per IBTA the HCA
hardware may not accept send doorbells until the QP is in RTS.  So one
possible approach internal to the stack would be to allow CMA consumers to
post to the Send Q when the QP is in RTR, but not inform the HCA of these
WQEs until the QP is moved to RTS.  The HCA driver could keep track of how
many Send Q posts occurred while in RTR; then, upon movement to RTS, it
could issue the appropriate doorbells to the hardware.

The above approach would solve the race in a way that is completely
transparent to CMA consumers.
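
A rough model of the deferred doorbell idea, as a sketch only.  The names
(struct model_qp, model_post_send, model_modify_to_rts) and the counters
are assumptions made up for illustration; a real HCA driver keeps far more
state and rings doorbells through hardware-specific registers.

```c
#include <stddef.h>

enum qp_state { QP_INIT, QP_RTR, QP_RTS };

/* Hypothetical driver-side model of one QP. */
struct model_qp {
    enum qp_state state;
    size_t posted;          /* WQEs written to the Send Q */
    size_t deferred;        /* WQEs posted in RTR, not yet announced to the HCA */
    size_t doorbells_rung;  /* WQEs the simulated HCA has been told about */
};

/* Stand-in for the hardware doorbell: announce n WQEs to the HCA. */
static void ring_doorbell(struct model_qp *qp, size_t n)
{
    qp->doorbells_rung += n;
}

/* Accept sends in RTR as well as RTS, but only ring the doorbell in RTS. */
int model_post_send(struct model_qp *qp)
{
    switch (qp->state) {
    case QP_RTS:
        qp->posted++;
        ring_doorbell(qp, 1);
        return 0;
    case QP_RTR:
        qp->posted++;
        qp->deferred++;     /* remember it; the HCA is not told yet */
        return 0;
    default:
        return -1;          /* posting sends is not allowed in other states */
    }
}

/* On the RTR to RTS transition, announce everything posted while in RTR. */
int model_modify_to_rts(struct model_qp *qp)
{
    if (qp->state != QP_RTR)
        return -1;
    qp->state = QP_RTS;
    if (qp->deferred) {
        ring_doorbell(qp, qp->deferred);
        qp->deferred = 0;
    }
    return 0;
}
```

In this model the consumer's send path is identical in RTR and RTS; the
only extra work is the counter and the catch-up doorbell at the transition.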

Todd Rimmer



