[ofiwg] A question on FI_DELIVERY_COMPLETE

Mon Oct 26 16:40:44 PDT 2015

> FI_DELIVERY_COMPLETE is intended only to apply to the initiator of an
> operation.
> [PG] I suspected as much.
> 
> The generation of a notification at the target is assumed to occur after
> the operation has completed -- i.e. any transferred data is available.
> This holds whether the completion is an entry placed into a CQ, or a
> completion counter has been incremented.
> 
> [PG] Using one of today's well-known networks as an example, there is no
> way to guarantee that data is actually visible to the responder before
> posting a completion to the CQ.  I assume that for backward compatibility
> reasons we would want to maintain that behavior.  That implies that it
> would be desirable to define some other behavior in which the responder
> side provider, through some internal mechanism, guarantees that the data
> is visible to the consumer before signaling the completion to the consumer
> (whether it is via a completion event or a counter increment).  It would
> be the moral equivalent of FI_DELIVERY_COMPLETE, but on the responder
> side.

Which CQ are you referring to?  If a completion is written at the target, then the data associated with it had better be visible to the target process.  Otherwise the completion is meaningless.  Consider a CQ entry for a received message.
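To make the ordering requirement concrete, here is a minimal sketch in plain C11 of "data is visible before the completion is". It is a toy model, not libfabric code and not how any particular provider is implemented: `toy_cq`, `toy_deliver`, and `toy_poll` are hypothetical names, and the one-slot "queue" only stands in for a real CQ.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical one-slot completion queue; not a libfabric type. */
struct toy_cq {
    atomic_size_t completed_len;   /* 0 means "no completion yet" */
};

/* "Provider" side: place the payload first, then publish the completion. */
static void toy_deliver(struct toy_cq *cq, char *rx_buf,
                        const char *payload, size_t len)
{
    assert(len > 0);
    memcpy(rx_buf, payload, len);
    /* Release store: the payload writes above are ordered before it. */
    atomic_store_explicit(&cq->completed_len, len, memory_order_release);
}

/* "Consumer" side: returns the completed length, or 0 if nothing is done.
 * The acquire load pairs with the release store, so a non-zero result
 * means the payload in rx_buf can be read safely. */
static size_t toy_poll(struct toy_cq *cq)
{
    return atomic_load_explicit(&cq->completed_len, memory_order_acquire);
}
```

A consumer that sees a non-zero length from `toy_poll` can read the receive buffer right away. If the two steps in `toy_deliver` were swapped, the completion could be observed before the data, which is exactly the case that would make it meaningless.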

>  The FI_REMOTE_CQ_DATA flag is somewhat independent of this.  That flag
> just means that application data was written into a CQ entry.
> [PG] I don't quite understand this.  As I read it, remote cq data is the
> moral equivalent of immediate data, and FI_REMOTE_CQ_DATA is the mechanism
> that causes the requester to send immediate data.  The presence of this
> Remote CQ Data, in turn, causes a completion event on the remote side,
> which might be either an event posted to the completion queue, or the
> increment of a counter. (In the case of IB, it also causes the consumption
> of a RECV WQE, but that isn't the case with libfabric.)

Nit: FI_REMOTE_CQ_DATA only applies to CQ entries, not counters.

With libfabric, the use of FI_REMOTE_CQ_DATA is not required in order to generate a completion at the target.  E.g. an RMA write operation can increment a completion counter or generate a CQ entry at the target without remote CQ data present.  Similarly, an RMA read operation can increment a completion counter or generate a CQ entry.  *If* FI_REMOTE_CQ_DATA is present, then a CQ entry will always be generated at the target for a successful operation.  This is the behavior that applications requested.

> Although IB cannot generate target notification without FI_REMOTE_CQ_DATA
> (i.e. immediate data), libfabric does not require this.
> [PG] Other than a subsequent send message, how can libfabric generate a
> notification on the target side other than using FI_REMOTE_CQ_DATA?

This is provider specific.  But there's nothing special about generating a CQ entry.  For example, I can't think of any reason why IB hardware couldn't easily be adapted to generate a CQ entry in response to receiving an RMA write operation (without immediate data), other than that the spec doesn't define it.

> The generation of a completion entry at the target is independent of the
> completion mode selected by the initiator.
> [PG] Agreed.  I am suggesting a mechanism that controls the generation of
> a completion entry at the responder side.

The target side controls whether a completion entry is generated through the use of completion flags (e.g. FI_REMOTE_WRITE, FI_REMOTE_READ) when binding a CQ or counter to an endpoint.  The target side does not have the same range of 'completion modes' as the initiator.  A completion entry indicates that the operation is done.  There is no notification for operations that are in progress.
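The target-side rules described in this message can be sketched as a small decision model. This is only my reading of the text above, written as self-contained C. The `TOY_*` constants and `toy_*` functions are hypothetical and are not the libfabric definitions. In real code the binding would go through fi_ep_bind(), and the exact flags accepted for a CQ versus a counter should be checked against the man pages.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy flag values, named after the libfabric flags but NOT the real ones. */
enum {
    TOY_REMOTE_READ  = 1u << 0,
    TOY_REMOTE_WRITE = 1u << 1,
};

/* Hypothetical target endpoint state; not a libfabric structure. */
struct toy_target_ep {
    bool     has_cq;      /* a CQ has been bound at all */
    uint64_t cq_flags;    /* remote ops that generate a CQ entry */
    uint64_t cntr_flags;  /* remote ops that increment the counter */
    unsigned cq_entries;  /* CQ entries generated so far */
    uint64_t cntr_value;  /* current counter value */
};

static void toy_bind_cq(struct toy_target_ep *ep, uint64_t flags)
{
    ep->has_cq = true;
    ep->cq_flags |= flags;
}

static void toy_bind_cntr(struct toy_target_ep *ep, uint64_t flags)
{
    ep->cntr_flags |= flags;
}

/* Called once an incoming RMA operation has fully completed at the target.
 * There is deliberately no hook for operations still in progress. */
static void toy_rma_done(struct toy_target_ep *ep, uint64_t op,
                         bool remote_cq_data)
{
    assert(op == TOY_REMOTE_READ || op == TOY_REMOTE_WRITE);
    /* Assumption in this model: only writes carry remote CQ data. */
    assert(!(remote_cq_data && op == TOY_REMOTE_READ));

    /* Counters depend only on the bind flags; remote CQ data never
     * affects them. */
    if (ep->cntr_flags & op)
        ep->cntr_value++;

    /* A CQ entry is generated if the op was requested at bind time, or
     * unconditionally when remote CQ data is present. */
    if (ep->has_cq && ((ep->cq_flags & op) || remote_cq_data))
        ep->cq_entries++;
}
```

For example, with a counter bound for TOY_REMOTE_WRITE and a CQ bound with no remote flags, a plain write bumps only the counter, while a write carrying remote CQ data bumps the counter and also produces a CQ entry.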

- Sean