[libfabric-users] Two-stage completion

Wed Sep 14 17:08:20 PDT 2016

On Sep 14, 2016, at 4:59 PM, Hefty, Sean <sean.hefty at intel.com> wrote:

>> In my application, I could benefit greatly if I could force generation
>> of two completion events per fi_tsendmsg – one generated on
>> FI_TRANSMIT_COMPLETE and the other on FI_DELIVERY_COMPLETE, so I can
>> block until the buffer is safe to mangle (post-transmit), and also
>> throw an event in my app if a timeout occurs before the delivery
>> completion.
> 
> Libfabric does not define a mechanism to generate multiple completion events for the same message.  Asynchronous completions are hard enough for most users to deal with.  Making them deal with progress notifications on the way to the operation really completing seems unfriendly.
> 

Local and remote completion events are not complicated. I have always generated both when possible, and I am a pretty dumb user of low-level networking software. It's hard to imagine folks dumber than me using OFI.

MPI-3 explicitly uses both completions in the definitions of MPI_Win_flush (remote completion) and MPI_Win_flush_local (local completion). OpenSHMEM does the same with blocking put and quiet. UPC needs both, and I suspect most other PGAS models do as well.

> I would need to understand the benefit here.  The timing difference between the two completion models seems minimal, especially compared to the time needed to generate, transfer, and process an additional ack.
> 

What if your provider copies to a local buffer, then transmits from there? Local completion happens at the rate of memcpy, which is usually faster than RDMA.
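
To make the bounce-buffer case concrete, here is a toy model in plain C. It is only a sketch and not libfabric code: struct toy_send, toy_post_send, and toy_deliver are hypothetical names invented for this illustration, and the "delivery" step is a second memcpy standing in for the wire transfer plus ack that a real provider would perform.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical toy model of a provider that stages sends in a bounce
 * buffer. None of these names exist in libfabric. */
enum { TOY_BOUNCE_SIZE = 64 };

struct toy_send {
    char   bounce[TOY_BOUNCE_SIZE]; /* provider-owned staging copy */
    size_t len;                     /* number of staged bytes */
    bool   local_done;              /* user buffer may be reused */
    bool   remote_done;             /* data has reached the target */
};

/* Stage the payload. Local completion is reached as soon as the memcpy
 * returns. Returns 0 on success, -1 if the payload does not fit. */
static int toy_post_send(struct toy_send *op, const void *buf, size_t len)
{
    if (len > TOY_BOUNCE_SIZE)
        return -1;
    memcpy(op->bounce, buf, len);
    op->len = len;
    op->local_done = true;
    op->remote_done = false;
    return 0;
}

/* Stand-in for the later transfer and ack: the target receives the staged
 * bytes, and only then is remote completion reached. */
static void toy_deliver(struct toy_send *op, void *target)
{
    memcpy(target, op->bounce, op->len);
    op->remote_done = true;
}
```

In this model the caller can overwrite its source buffer right after toy_post_send returns, well before toy_deliver runs, and the target still receives the original bytes. That gap between the two flags is the window in which a separate local completion event would be useful.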

If you have link-level reliability, as on Blue Gene, local and remote completion times can differ by a noticeable amount, especially at scale and under contention.

Jeff 

> - Sean