[openib-general] design for communication established affiliated asynchronous event handling

Fri Jun 16 13:02:47 PDT 2006

> -----Original Message-----
> From: Or Gerlitz; openib-general
> > In most cases, I would expect that the IB CM will eventually receive the
> > RTU, which will generate an event to the RDMA CM to transition the QP
> > into RTS.
> 
> But we want an IB stack and set of ULPs which would work in production, so
> they also need to handle irregular cases... e.g. when the RTU is lost over
> and over.

Agreed.  The missing RTU case must be handled for a few reasons:
1. The RTU could genuinely be lost (GSI QPs are UD, so their receive
queues can overflow, the fabric can lose the packet, etc.).
2. The RC send could beat the processing of the RTU.  Packets on the wire
may arrive out of order if the GSI QP and the application QP use
different SLs/VLs.  It is also possible for the CM to be slow getting to
its queue of packets (for example, when bombarded by many connection
requests) while the application/ULP gets its RC send quickly.  [I have
observed this situation in various real-world stress tests.]

This problem is quite simple to handle (I did it a few years ago in the
SilverStorm stack), and the IB spec fully covers it:

CM - have a hook so the CM can get the async events for all CAs.  On
getting the Communication Established async event (first packet received
while the QP is in RTR), the CM should treat it exactly like an RTU with
no private data.  The CM will need to cross-reference the CA/QP the event
was reported for to identify the applicable connection endpoint.  If you
check the IBTA spec and its CM state machines, you will see the CM is
supposed to handle this event.  Also, if the RTU does arrive later, the
CM state machine handles that correctly by discarding the RTU as a
duplicate.  Note: this is why applications should not depend on private
data in the RTU.
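The CM-side handling described above can be sketched as follows.  This is
a toy model, not any real CM: the state names, the connection table and
the functions cm_add_conn, cm_on_rtu, cm_on_comm_est and cm_establish are
all hypothetical.  In a real stack the trigger would be the verbs
Communication Established async event (IBV_EVENT_COMM_EST from
ibv_get_async_event in userspace, IB_EVENT_COMM_EST in the kernel); here
the caller just passes the QP number the event was reported for.  Both
the RTU and the async event funnel into one establish path, so whichever
arrives second is dropped as a duplicate.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified passive-side CM states; a real CM tracks more. */
enum cm_state { CM_REP_SENT, CM_ESTABLISHED };

struct cm_conn {
    uint32_t qp_num;     /* local QP owned by this connection endpoint */
    enum cm_state state;
    int qp_in_rts;       /* stands in for modifying the QP from RTR to RTS */
};

#define MAX_CONNS 64
static struct cm_conn conns[MAX_CONNS];
static size_t nconns;

/* Register an endpoint that has sent its REP and is waiting for the RTU. */
struct cm_conn *cm_add_conn(uint32_t qp_num)
{
    assert(nconns < MAX_CONNS);
    conns[nconns] = (struct cm_conn){ .qp_num = qp_num, .state = CM_REP_SENT };
    return &conns[nconns++];
}

/* Cross-reference the QP named in an event to its connection endpoint. */
static struct cm_conn *cm_find_by_qp(uint32_t qp_num)
{
    for (size_t i = 0; i < nconns; i++)
        if (conns[i].qp_num == qp_num)
            return &conns[i];
    return NULL;
}

/* Shared establish path.  Returns 1 if this event established the
 * connection, 0 if it was discarded as a duplicate, -1 for an unknown QP. */
static int cm_establish(uint32_t qp_num)
{
    struct cm_conn *c = cm_find_by_qp(qp_num);
    if (c == NULL)
        return -1;
    if (c->state != CM_REP_SENT)
        return 0;              /* already established: treat as duplicate */
    c->qp_in_rts = 1;          /* real code would modify the QP to RTS here */
    c->state = CM_ESTABLISHED;
    return 1;
}

/* An RTU arrived (any private data it carries is ignored in this sketch). */
int cm_on_rtu(uint32_t qp_num)
{
    return cm_establish(qp_num);
}

/* Communication Established async event: handled exactly like an RTU
 * that carries no private data. */
int cm_on_comm_est(uint32_t qp_num)
{
    return cm_establish(qp_num);
}
```

For example, if the async event for QP 0x42 arrives first, cm_on_comm_est
returns 1 and moves the endpoint to CM_ESTABLISHED; the late RTU then makes
cm_on_rtu return 0 and is simply discarded.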

ULPs - all ULPs should be written so they are fully ready to process
inbound data before they tell the CM to send the REP.  It is very likely
the ULP will get a CQ completion for inbound RQ data before the CM has
completed its processing.  In general, IB handles this situation quite
nicely.  The ULP can process the inbound data normally and queue any
resulting sends on the Send Q.  Putting data on a Send Q is permitted in
RTR, but the QP will not initiate sending until it is moved to RTS.  As
such, the ULP can let the CM's RTU processing (which will race with the
RQ data completion) do its normal thing and move the QP to RTS.
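The ULP-side ordering can be illustrated with a toy QP model.  Everything
here is hypothetical (toy_qp, toy_post_send, toy_modify_to_rts,
ulp_on_recv are not libibverbs calls); the model simply encodes the
behavior described above: sends posted while the QP is in RTR are
accepted and held, and are transmitted only once the CM moves the QP to
RTS.  The ULP's receive handler replies immediately, without checking
whether the RTU has been processed yet.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical toy QP model; names are illustrative, not libibverbs. */
enum qp_state { QPS_RTR, QPS_RTS };

#define SQ_DEPTH 16

struct toy_qp {
    enum qp_state state;
    int sq[SQ_DEPTH];        /* queued sends (ints stand in for WRs);   */
    size_t sq_head, sq_tail; /* a linear buffer, not a ring, for brevity */
    int wire[SQ_DEPTH];      /* what has actually been transmitted */
    size_t nwire;
};

/* Drain the send queue onto the "wire"; this only happens in RTS. */
static void qp_process_sq(struct toy_qp *qp)
{
    if (qp->state != QPS_RTS)
        return;
    while (qp->sq_head < qp->sq_tail)
        qp->wire[qp->nwire++] = qp->sq[qp->sq_head++];
}

/* Posting is accepted in RTR as well as RTS; returns -1 if the SQ is full. */
int toy_post_send(struct toy_qp *qp, int payload)
{
    if (qp->sq_tail == SQ_DEPTH)
        return -1;
    qp->sq[qp->sq_tail++] = payload;
    qp_process_sq(qp);
    return 0;
}

/* CM side: RTU (or Communication Established) processing moves QP to RTS. */
void toy_modify_to_rts(struct toy_qp *qp)
{
    qp->state = QPS_RTS;
    qp_process_sq(qp);
}

/* ULP receive-completion handler: respond right away, whatever the QP
 * state, relying on the QP to hold the send until it reaches RTS. */
void ulp_on_recv(struct toy_qp *qp, int request)
{
    if (toy_post_send(qp, request + 1) != 0)
        fprintf(stderr, "send queue full\n");
}
```

In this model, two requests handled while the QP is still in RTR leave
nothing on the wire; the later toy_modify_to_rts call flushes both replies
in order, and replies after that go out immediately.  The ULP needs no
special case for the race.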

Todd Rimmer