[ofiwg] [libfabric-users] Two-stage completion

Thu Sep 15 15:02:36 PDT 2016

Hi Sean!

Thanks for your help so far. I think I'm getting somewhere! I have some responses and more details for you:

> those flags only apply to the fi_sendmsg call.  Other send operations do not take flags, and they do not apply to the fi_recvmsg call.

Understood. I'm using connectionless OFI with tags, so the send call I've been using is fi_tsendmsg. I'm using the flags to specify what kind of completion I want.

> Libfabric does not define blocking operations.  All operations are asynchronous.

Ah, I understand that, but I can see how what I said would be ambiguous.

It might help if I explain that I'm providing a wrapper layer on OFI. It must manage multiple fabric providers in parallel, and provide an interface that includes a send function that could, in the face of a network failure, use alternative providers. I would like *that* send function to be a blocking call, which I have been implementing in prototype using the completion queue events.

I would also like to support a message timeout, so I know when to retry the send with a different provider.

This is what led me to the idea that I needed two completion events.
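
To make that concrete, here is a rough sketch of the wrapper-level blocking send with timeout and failover. It is only an illustration under stated assumptions: `struct provider`, `post_send`, `poll_delivered`, the stub callbacks, and the poll-count "timeout" are all hypothetical names of mine and not libfabric calls, and the sketch models only the second (delivery) event, not the buffer-safe one.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical wrapper-level types; none of these are libfabric names. */
enum send_status { SEND_OK = 0, SEND_TIMEOUT = 1, SEND_FAILED = 2 };

struct provider {
    const char *name;
    /* Post the send asynchronously; returns 0 if it was queued. */
    int (*post_send)(struct provider *p, const void *buf, size_t len);
    /* Poll once; returns true once the delivery completion has arrived. */
    bool (*poll_delivered)(struct provider *p);
    /* Stub state: polls left before delivery, or -1 for "never delivers". */
    int polls_until_delivered;
};

/* Block on one provider until delivery is confirmed or max_polls is spent.
 * A real wrapper would compare against a clock instead of counting polls. */
enum send_status blocking_send_one(struct provider *p, const void *buf,
                                   size_t len, int max_polls)
{
    assert(p != NULL);
    if (p->post_send(p, buf, len) != 0)
        return SEND_FAILED;
    for (int i = 0; i < max_polls; i++) {
        if (p->poll_delivered(p))
            return SEND_OK;
    }
    return SEND_TIMEOUT;
}

/* Try each provider in order; return the index that delivered, or -1. */
int blocking_send(struct provider *provs, size_t nprov,
                  const void *buf, size_t len, int max_polls)
{
    assert(provs != NULL);
    for (size_t i = 0; i < nprov; i++) {
        if (blocking_send_one(&provs[i], buf, len, max_polls) == SEND_OK)
            return (int)i;
    }
    return -1;
}

/* Stub callbacks standing in for a real fabric provider. */
int stub_post(struct provider *p, const void *buf, size_t len)
{
    (void)p; (void)buf; (void)len;
    return 0;                       /* pretend the send was queued */
}

bool stub_poll(struct provider *p)
{
    if (p->polls_until_delivered < 0)
        return false;               /* simulated dead link */
    if (p->polls_until_delivered == 0)
        return true;                /* delivery completion observed */
    p->polls_until_delivered--;
    return false;
}
```

With a first provider whose stub never acknowledges (`polls_until_delivered = -1`) and a second one that does, `blocking_send` falls through to the second and returns its index, which is the failover behaviour I'm after.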

> You need to clarify what it means for the destination to receive the message.  Is the destination the peer process?  Peer node?  Peer NIC?

The aforementioned timeout is the reason I need some kind of ACK. I need verification that the peer process would get the message if it were to post a receive buffer. My understanding is that it's OK if I post a send and the peer posts a recv buffer much later, since the providers have some notion of a queue of received messages that haven't been handed to the process yet. This may mean that the message got to the peer NIC, but I'm not sure.

> The inject and tagged message calls will work with unconnected (RDM - reliable datagram) endpoints.

Great! That helps a lot. It might be wise to reword the man page section on the inject and send calls, because send claims it only works with connected endpoints, and inject claims it's an optimized version of send.

I'll end on a single question:
If I start using the inject call to block until the buffer is safe, how do I get the kind of completion I'd need for my timeout, if there's no flags argument in the fi_tinject call?

Thanks!
Jonathan

-----Original Message-----
From: Hefty, Sean 
Sent: Thursday, September 15, 2016 1:18 PM
To: Smith, Jonathan D <jonathan.d.smith at intel.com>; Jeff Hammond <jeff.science at gmail.com>
Cc: libfabric-users at lists.openfabrics.org; ofiwg at lists.openfabrics.org
Subject: RE: [libfabric-users] Two-stage completion

> ...oh? I thought that FI_TRANSMIT_COMPLETE was the local completion 
> and FI_DELIVERY_COMPLETE was the remote completion. What does this 
> mean, then?
> 
> FI_TRANSMIT_COMPLETE
> 	Applies to fi_sendmsg. Indicates that a completion should not be 
> generated until the operation has been successfully transmitted and is 
> no longer being tracked by the provider.
> FI_DELIVERY_COMPLETE
> 	Applies to fi_sendmsg. Indicates that a completion should be 
> generated when the operation has been processed by the destination.

You are looking at the flags discussion for the send/receive operations.  This is calling out that those flags only apply to the fi_sendmsg call.  Other send operations do not take flags, and they do not apply to the fi_recvmsg call.

These flags can set the completion model for a specific send operation.  The documentation here does not try to re-state the full meaning of the completion mode, however.

> Anyway, I realize that I'm trying to have my cake and eat it too, but 
> in general, I'm looking for:
> 	1. Blocking send semantics over unconnected endpoints

Libfabric does not define blocking operations.  All operations are asynchronous.

> 	2. You get to send again as soon as the buffer is safe to write to 
> (currently my use case for the cq),

The buffer may be re-used either after you get a completion or immediately if an inject call is used.

> 	3. You also get some kind of event when we're sure the destination 
> received the event,

You need to clarify what it means for the destination to receive the message.  Is the destination the peer process?  Peer node?  Peer NIC?

> 	4. The application doesn't perform extra copy operations on the 
> message unless it's completely unavoidable.

Well, what exactly is an 'extra copy'?  :)

The API does not dictate how a provider implements the various completion semantics.  Providers may copy the message buffers into outbound message queues, some internal buffer, or whatnot.  You just hope that the provider selects the best performing option.

> This doesn't mean "return from the send call with the buffer safe 
> ASAP," as in that case I would just use the memcpy strategy.
> 
> So is fi_tinject with FI_INJECT_COMPLETE what I want? It seems that 
> that's probably the case, but I do need unconnected endpoints. Since 
> the man pages say that it's an "optimized version of fi_tsend" it 
> leads me to believe I cannot use it without establishing a connection.

The inject and tagged message calls will work with unconnected (RDM - reliable datagram) endpoints.