[ofiwg] [libfabric-users] Two-stage completion

Fri Sep 16 07:53:56 PDT 2016

I'm not sure that using fi_tsendmsg with the FI_INJECT flag would meet Jonathan's requirements - if I'm reading this email chain correctly. I'll reorder his requirements and add comments:

>	4. The application doesn't perform extra copy operations on the message unless it's completely unavoidable.

You also don't want the *provider* to do any internal memory copies. In my mind, using FI_INJECT means "release ownership of the source buffer and return from this function ASAP". Maybe the provider doesn't have hardware support for this and needs to memcpy, or maybe the hardware is busy and the provider needs to memcpy. Also, maybe the provider doesn't support a large enough "inject size" transmit attribute, in which case the application would have to memcpy. Or maybe the provider implements an arbitrarily large "inject size" using an internal memcpy?

>	1. Blocking send semantics over unconnected endpoints

This is a blocking send from the perspective of his wrapper library, right? The library implementation would be to spin on fi_cq_read() until an FI_INJECT_COMPLETE event arrives and then return to the application.
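
A minimal sketch of that spin loop, in C. To keep it self-contained it does not include the libfabric headers: struct cq_entry and cq_read_fn are hypothetical stand-ins for struct fi_cq_entry and fi_cq_read(), and fake_cq is only a test double. It assumes the read call returns -EAGAIN while the queue is empty (as fi_cq_read() does with -FI_EAGAIN) and that the completion carries back the context pointer given to the send.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>

/* Hypothetical stand-in for struct fi_cq_entry: op_context echoes the
 * context pointer that was passed to the send call. */
struct cq_entry {
    void *op_context;
};

/* Hypothetical stand-in for fi_cq_read(): returns the number of entries
 * read, -EAGAIN when the queue is empty, or another negative error. */
typedef ssize_t (*cq_read_fn)(void *cq, struct cq_entry *buf, size_t count);

/* Spin until the completion tagged with `ctx` arrives.
 * Returns 0 on success or the negative error from the read function. */
static int wait_for_completion(cq_read_fn read_fn, void *cq, void *ctx)
{
    struct cq_entry entry;

    assert(read_fn != NULL);
    for (;;) {
        ssize_t n = read_fn(cq, &entry, 1);
        if (n == -EAGAIN)
            continue;           /* nothing yet: keep polling */
        if (n < 0)
            return (int)n;      /* real error: give up */
        if (n == 1 && entry.op_context == ctx)
            return 0;           /* this is our send's completion */
        /* A completion for some other operation; a real wrapper would
         * record it rather than drop it. */
    }
}

/* Test double: reports "empty" a few times, then yields one completion. */
struct fake_cq {
    int empty_polls;    /* how many -EAGAIN results to return first */
    void *ctx;          /* context to report in the completion */
};

static ssize_t fake_cq_read(void *cq, struct cq_entry *buf, size_t count)
{
    struct fake_cq *f = cq;

    (void)count;
    if (f->empty_polls > 0) {
        f->empty_polls--;
        return -EAGAIN;
    }
    buf->op_context = f->ctx;
    return 1;
}
```

With libfabric itself, the wrapper would post the send with the FI_INJECT_COMPLETE flag and then run this loop against the endpoint's transmit CQ before returning to the application.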

>	2. You get to send again as soon as the buffer is safe to write to (currently my use case for the cq),

Same as waiting for the FI_INJECT_COMPLETE event.

>	3. You also get some kind of event when we're sure the destination received the event

This is just FI_DELIVERY_COMPLETE|FI_TRANSMIT_COMPLETE.

The real question is if two events can be generated for one operation. The documentation for the FI_*_COMPLETE flags in the fi_endpoint man page all start like this:

	"Indicates that a completion should be generated when ..."

"... a completion ..." sounds like there could be more than one completion event. It should say "... the completion ..." if only one event can be generated for each operation.

Earlier, Sean said "The timing difference between the two completion models seems minimal, especially compared to the time needed to generate, transfer, and process an additional ack." But this is comparing FI_TRANSMIT_COMPLETE and FI_DELIVERY_COMPLETE. The timing difference is quite noticeable between FI_INJECT_COMPLETE (local operation) and FI_TRANSMIT_COMPLETE|FI_DELIVERY_COMPLETE (remote operation). As Jeff Hammond pointed out, several different middleware layers could use, and in fact need, this distinction for best performance.

---
ALTERNATIVELY, could Jonathan rely on FI_ORDER_WAS and issue a local atomic operation (to increment a counter) after the send operation? Or could he rely on FI_ORDER_SAS and issue a "throw away" send using FI_INJECT_COMPLETE in order to get the local completion event (the data movement send would use FI_DELIVERY_COMPLETE)?
---
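
If the "throw away" send idea were used, the wrapper would see two completions per message and would have to tell them apart. Below is a rough C sketch of that bookkeeping only. Every name in it (two_stage_op, stage_ctx, two_stage_on_completion) is hypothetical, and it assumes each send is posted with a distinct context pointer that comes back in the completion entry. It says nothing about whether FI_ORDER_SAS actually makes the buffer safe to reuse; that is still the open question.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct two_stage_op;

/* Which stage a completion context belongs to. */
enum stage { STAGE_BUFFER_FREE, STAGE_DELIVERED };

/* Hypothetical per-send context: its address would be passed as the
 * context of one send and returned in that send's completion entry. */
struct stage_ctx {
    enum stage stage;
    struct two_stage_op *op;
};

/* One logical message, tracked through both stages. */
struct two_stage_op {
    struct stage_ctx local;     /* context for the throw-away send */
    struct stage_ctx remote;    /* context for the data movement send */
    bool buffer_free;           /* stage 1: source buffer may be reused */
    bool delivered;             /* stage 2: destination has the data */
};

static void two_stage_init(struct two_stage_op *op)
{
    op->local.stage = STAGE_BUFFER_FREE;
    op->local.op = op;
    op->remote.stage = STAGE_DELIVERED;
    op->remote.op = op;
    op->buffer_free = false;
    op->delivered = false;
}

/* Call with the context pointer taken from each completion entry. */
static void two_stage_on_completion(void *op_context)
{
    struct stage_ctx *ctx = op_context;

    assert(ctx != NULL && ctx->op != NULL);
    if (ctx->stage == STAGE_BUFFER_FREE)
        ctx->op->buffer_free = true;
    else
        ctx->op->delivered = true;
}
```

The blocking send would return once buffer_free is set, and the timeout logic would watch delivered.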

Sorry for the long email!

Mike

-----Original Message-----
From: Libfabric-users [mailto:libfabric-users-bounces at lists.openfabrics.org] On Behalf Of Hefty, Sean
Sent: Thursday, September 15, 2016 5:36 PM
To: Smith, Jonathan D <jonathan.d.smith at intel.com>; Jeff Hammond <jeff.science at gmail.com>
Cc: ofiwg at lists.openfabrics.org; libfabric-users at lists.openfabrics.org
Subject: Re: [libfabric-users] Two-stage completion

> If I start using the inject call to block until the buffer is safe, how
> do I get the kind of completion I'd need for my timeout, if there's no
> flags argument in the fi_tinject call?

To reuse the buffer immediately but still get a completion, you should call fi_tsendmsg with the FI_INJECT flag.  The default completion mode will then be used.