[ofiwg] completion flags as actually defined by OFI
Bernard Metzler
BMT at zurich.ibm.com
Tue Apr 14 14:37:43 PDT 2015
"Hefty, Sean" <sean.hefty at intel.com> wrote on 04/14/2015 10:35:48 PM:
> From: "Hefty, Sean" <sean.hefty at intel.com>
> To: Bernard Metzler <BMT at zurich.ibm.com>
> Cc: Jason Gunthorpe <jgunthorpe at obsidianresearch.com>,
> "ofiwg at lists.openfabrics.org" <ofiwg at lists.openfabrics.org>
> Date: 04/14/2015 10:35 PM
> Subject: RE: [ofiwg] completion flags as actually defined by OFI
>
> > > - The sockets and psm providers do not generate a completion until
> > > the remote side has processed the request and acknowledged the data.
> > > - Cisco needs to confirm the usnic provider behavior, but it's UD
> > > anyway. I believe it adheres to the description given for
> > > completions on unreliable endpoints.
> > > - Verbs does not generate a completion until the data has been
> > > acked by the remote side, unless I'm remembering it wrong.
> > >
> >
> > This is RDMA transport dependent. It might be true for IB but is
> > definitely not for iWarp.
>
> iWarp states:
>
> "DDP Message transfer is considered completed when the reliable, in-
> order transport LLP has indicated that the transfer will occur reliably."
>
You skipped the important second half of that paragraph of RFC 5041:

    At the Data Source, DDP Message transfer is considered completed when
    the reliable, in-order transport LLP has indicated that the transfer
    will occur reliably. Note that this in no way restricts the LLP from
    buffering the data at either the Data Source or Data Sink. Thus, at
    the Data Source, completion of a DDP Message does not necessarily
    mean that the Data Sink has received the message.

Reliability is a property of the LLP, but that does not translate into
a guarantee of delivery.
Think of a TCP socket: it may buffer transmit data (after copying it
for potential retransmission) and immediately give control over the send
buffer back to the caller. At that point in time, the data may not even
have hit the wire for the first time. iWARP is specified in a way that
explicitly allows split-stack implementations in which the iWARP part
has only very limited LLP state information.
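
To make the TCP analogy concrete, here is a minimal sketch in C. It
assumes a POSIX system with a working loopback interface; the function
name send_before_peer_reads is made up for illustration. It connects two
sockets over 127.0.0.1 and calls send() while the accepting side has not
called recv() at all. For a small payload, send() typically returns the
full length anyway, because the kernel has copied the data into its own
buffers.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: send up to 4096 bytes over a loopback TCP
 * connection whose receiving application never calls recv().
 * Returns the byte count accepted by send(), or -1 on any error.
 */
ssize_t send_before_peer_reads(size_t len)
{
    char buf[4096];
    struct sockaddr_in addr;
    socklen_t alen = sizeof(addr);
    ssize_t sent = -1;
    int lfd = -1, cfd = -1, sfd = -1;

    if (len > sizeof(buf))
        len = sizeof(buf);
    memset(buf, 'x', sizeof(buf));

    /* Listener on 127.0.0.1 with a kernel-chosen port. */
    lfd = socket(AF_INET, SOCK_STREAM, 0);
    if (lfd < 0)
        goto out;
    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = 0;
    if (bind(lfd, (struct sockaddr *)&addr, sizeof(addr)) < 0)
        goto out;
    if (listen(lfd, 1) < 0)
        goto out;
    if (getsockname(lfd, (struct sockaddr *)&addr, &alen) < 0)
        goto out;

    /* Connect and accept so that both endpoints exist. */
    cfd = socket(AF_INET, SOCK_STREAM, 0);
    if (cfd < 0)
        goto out;
    if (connect(cfd, (struct sockaddr *)&addr, sizeof(addr)) < 0)
        goto out;
    sfd = accept(lfd, NULL, NULL);
    if (sfd < 0)
        goto out;

    /*
     * The peer application (sfd) has read nothing. A successful return
     * here only means the local stack took ownership of the bytes.
     */
    sent = send(cfd, buf, len, 0);

out:
    if (sfd >= 0)
        close(sfd);
    if (cfd >= 0)
        close(cfd);
    if (lfd >= 0)
        close(lfd);
    return sent;
}
```

The return value tells the sender that the buffer may be reused, and
nothing about whether the receiving application has seen the data.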
Using the completion semantic terms we discussed at today's call, iWARP
MUST support type 1 only. It MAY support stronger guarantees, but
the current verbs API offers no way to signal that. To deal
with that issue, a typical RDMA transport-aware application detecting
iWARP will bundle the (then unsignalled) SEND or WRITE with a signalled
zero-length READ.
> This is why the man pages intentionally try to define the behavior
> as viewed by the application, and do not define a requirement on
> the provider or any part of the implementation.
>
I would rather see the API be more verbose. That would not
impose requirements on the provider.