[ofiwg] respecting message boundaries in recv

Fri Aug 7 10:04:13 PDT 2015

> I'm implementing a libfabric provider and I'm unsure about the
> implications of the statement 'Message boundaries are maintained' in

This statement is in contrast to a data stream (e.g. TCP).  Each call to send on the transmit side results in a single message being transmitted and received.
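
As a rough analogy only, here is the same distinction using plain POSIX sockets rather than libfabric.  The helper name first_recv_len is made up for this sketch.  It sends a 5-byte and then a 3-byte message over a local socketpair and reports how many bytes the first recv() returns.

```c
#include <assert.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: send two messages ("hello", then "abc") over a
 * local socketpair of the given type and return the number of bytes
 * handed back by the first recv(), or -1 on failure. */
static ssize_t first_recv_len(int sock_type)
{
    int sv[2];
    char buf[64];
    ssize_t n;

    assert(sock_type == SOCK_DGRAM || sock_type == SOCK_STREAM);

    if (socketpair(AF_UNIX, sock_type, 0, sv) != 0)
        return -1;

    if (send(sv[0], "hello", 5, 0) != 5 ||
        send(sv[0], "abc", 3, 0) != 3) {
        close(sv[0]);
        close(sv[1]);
        return -1;
    }

    /* The buffer is large enough to hold both messages at once. */
    n = recv(sv[1], buf, sizeof(buf), 0);

    close(sv[0]);
    close(sv[1]);
    return n;
}
```

With SOCK_DGRAM the first recv() returns exactly the 5 bytes of the first send, even though the buffer has room for more.  With SOCK_STREAM the receiver may get the bytes of both sends merged into one read, since a stream carries no record of where one send ended.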

> fi_recv* operations. In particular, what should happen if a receive
> buffer posted with fi_recv (the same applies to fi_recvv or fi_recvmsg)
> is smaller than the size of the message to be received?

This ends up being provider specific, and may depend on the endpoint type and other attributes.

Note that there is an open GitHub issue to address this in more detail, so I expect that a clearer definition of this behavior will eventually make it into the API.

> I see some options:
> 1. Nothing is received, and an error is inserted into the completion
> queue (FI_EMSGSIZE seems to be the right error code, with the len field
> containing the message length).
> 2. Posted buffer is filled, and a completion is generated. If completion
> format is at least FI_CQ_FORMAT_MSG, the len field of the completion is
> set to the size of the message to be received (which means bigger than
> the buffer size; this way the application is informed that the message
> is bigger than expected)
> 3. Posted buffer is filled, and a completion is generated. If completion
> format is at least FI_CQ_FORMAT_MSG, the len field of the completion is
> set to the size of the buffer. The application must have its own way to
> know that the message is not completely received.
> 4. Posted buffer is filled, but no completion is generated until a new
> buffer is posted which is big enough to contain the remainder of the
> message (this seems quite bad, since I don't see a practical way for the
> application to know when this happens).

The short answer is that this is an error.

I would separate out the options based on endpoint type, whether FI_MULTI_RECV is supported, and whether or not flow control is enabled.  Based on the requirements that were gathered during the design of libfabric, apps want both flow control and FI_MULTI_RECV.  Hopefully providers move toward providing both, even if additional software support is required.

For DGRAM endpoints, the best options are either for the provider to drop the message or report a truncated completion.  I'm guessing that dropping the message would be the most likely option chosen by providers for 'msg' operations, but truncation for 'tagged' operations.

For MSG endpoints, this is fatal to existing HW.  The message will be dropped, and the endpoint will be moved into an error state.  This assumes that there's not a higher-level software transport running over the HW.

For RDM endpoints, truncation seems the best option.

If a message is truncated, a completion error is generated (fi_cq_err_entry).  The len reported is the size of the received data.  The olen is the amount of data that was discarded.
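
To make the len/olen arithmetic concrete, here is a small model of a truncating receive.  It is only a sketch: struct trunc_report is a made-up stand-in for the two fi_cq_err_entry fields mentioned above (the real structure has more members), and truncating_recv and original_size are hypothetical helpers, not libfabric calls.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the two fi_cq_err_entry fields discussed
 * above; this is not the real libfabric structure. */
struct trunc_report {
    size_t len;   /* bytes actually placed in the receive buffer */
    size_t olen;  /* bytes discarded because the buffer was too small */
};

/* Model of a truncating receive: copy what fits into the posted buffer
 * and report the remainder as overflow. */
static struct trunc_report truncating_recv(void *buf, size_t buf_size,
                                           const void *msg, size_t msg_size)
{
    struct trunc_report r;

    assert(buf != NULL && msg != NULL);

    r.len = msg_size < buf_size ? msg_size : buf_size;
    r.olen = msg_size - r.len;
    memcpy(buf, msg, r.len);
    return r;
}

/* Size of the message as it was sent, recovered from the report. */
static size_t original_size(const struct trunc_report *r)
{
    return r->len + r->olen;
}
```

For example, a 10-byte message arriving into a 4-byte buffer would be reported with len = 4 and olen = 6, so the application can work out that the sender transmitted len + olen = 10 bytes.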

> Really sorry for bothering you if the answer is in the manpages, but I
> was unable to spot it. Also, I hope it is fine I posted here despite not
> being a libfabric developer myself.

Posting here is fine and provides good feedback regarding the state of the man pages.

- Sean