[ofiwg] EP_RDM question

Thu Jun 30 10:24:43 PDT 2016

> EP_RDM is described thusly in fi_endpoint(3):
> 
> -----
> FI_EP_RDM
> Reliable datagram message. Provides a reliable, unconnected data
> transfer service with flow control that maintains message boundaries.
> -----
> 
> Consider this scenario:
> 
> 1. sender sends message A on EP_RDM endpoint at time X
> 2. when message A arrives at the target, there are no receive buffers
> posted
> 
> What happens?

Provider specific  :)

> 1. The lack of receive buffers at the target should trigger an error at
> the sender indicating that message A was not delivered.
> 
> 2. Or:
> 
> 2a. If receive buffers are eventually posted at the target, message A
> will be delivered successfully.
> 2b. If the target endpoint is closed before receive buffers are
> available at the target for message A, an error is triggered at the
> sender indicating that message A was not delivered
> 2c. If message A has not been delivered within a given timeout (for any
> reason -- to include lack of buffers at the target), an error is
> triggered at the sender indicating that message A was not delivered
> 
> In short: assuming a receiver a) continually posts receive buffers, and
> b) doesn't close its endpoint, do senders need to worry about credits
> with RDM endpoints?

IMO, the ideal case is option 2.  This relates to the resource management (RM) enabled attribute.  With RM disabled, option 1 is a valid way for a provider to handle the issue.  With RM enabled, I would expect the provider to implement something closer to option 2.
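
To make the difference between the two options concrete, here is a minimal toy model in C.  It is only a sketch, not libfabric code: every name in it (toy_target, toy_send, toy_post_recv, toy_close, RM_DISABLED, RM_ENABLED) is hypothetical and merely mirrors the idea behind the domain attribute resource_mgmt with its FI_RM_DISABLED and FI_RM_ENABLED values.  It assumes a single sender, does not model the timeout in 2c, and does not model how a real provider pushes back on the sender.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the RM disabled / RM enabled settings. */
enum rm_mode { RM_DISABLED, RM_ENABLED };

/* Outcome of one send as seen by the sender. */
enum send_status { SEND_DELIVERED, SEND_PENDING, SEND_ERROR };

/* Toy state of the target endpoint. */
struct toy_target {
    enum rm_mode mode;
    bool closed;          /* target endpoint has been closed */
    size_t posted_recvs;  /* receive buffers currently posted */
    size_t pending;       /* messages held until a buffer shows up */
    size_t delivered;     /* messages delivered so far */
    size_t failed;        /* messages reported to the sender as failed */
};

/* One message arrives at the target. */
enum send_status toy_send(struct toy_target *t)
{
    if (t->closed) {
        t->failed++;
        return SEND_ERROR;
    }
    if (t->posted_recvs > 0) {
        t->posted_recvs--;
        t->delivered++;
        return SEND_DELIVERED;
    }
    if (t->mode == RM_DISABLED) {
        /* Option 1: no buffer posted, so the sender sees an error. */
        t->failed++;
        return SEND_ERROR;
    }
    /* Option 2: hold the message until a buffer is posted. */
    t->pending++;
    return SEND_PENDING;
}

/* The target posts one receive buffer.
 * Returns how many held messages this buffer completed (0 or 1). */
size_t toy_post_recv(struct toy_target *t)
{
    if (t->pending > 0) {
        /* Option 2a: a held message is delivered after all. */
        t->pending--;
        t->delivered++;
        return 1;
    }
    t->posted_recvs++;
    return 0;
}

/* The target closes its endpoint.
 * Returns how many held messages now fail (option 2b). */
size_t toy_close(struct toy_target *t)
{
    size_t n = t->pending;
    t->failed += n;
    t->pending = 0;
    t->closed = true;
    return n;
}
```

In this model a send to a target with no posted buffers fails right away under RM_DISABLED, while under RM_ENABLED it stays pending and either completes on the next toy_post_recv() or fails when toy_close() is called.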

> This question obviously only makes sense when only a single sender is
> sending to a RDM target endpoint.
> 
> But I also ask because a popular technique for IB SRQ with multiple
> senders is to set the sender retry to an infinite value. I.e., since
> there are multiple senders, credit schemes don't help, and the most
> hardware-offload-ish scheme is to just have the hardware keep re-
> sending until a message gets through.  Does RDM offer that capability?

RM enabled is similar to that infinite-retry approach, but it is defined in broader terms than just posted receive buffers.  For example, with RM enabled, the provider is also supposed to prevent a CQ overflow from causing the message to be lost.