[ofiwg] completion flags as actually defined by OFI

Weiny, Ira ira.weiny at intel.com
Tue Apr 14 17:35:32 PDT 2015

> In this case, yes, it is valuable, and the semantic is obvious:
> with the additional guarantee that if the peer stored the message into non-
> volatile memory, then that memory will retain the whole message uncorrupted
> across a peer local failure. NVM will survive a peer local failure including OS
> failure and loss of power; shared memory (anon or file backed) will survive a
> peer local failure including process kill.
> To me, the only way the above makes sense is with dedicated hardware
> support, which doesn't exist today.

Even then, I think the provider can return FI_COMMIT_COMPLETE, which indicates an application-level completion.  Only the application knows that the memory is NVM and that storing in NVM is what is important.  Libfabric can't make that determination on its own.

> Until it does, the CPU is involved and you
> are better to use FI_COMMIT_COMPLETE and have the app signal commit
> once it has done whatever sync is needed for the memory type it is working
> with. That is clearly more flexible and gets libfabric out of the messy business
> of WTF does 'persistent' mean for memory.

Exactly, it is not libfabric's place to define "persistent" or whatever.  Better to define generic completion events which make sense for the fabric problem space.
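
To make the "app signals commit once it has done whatever sync is needed" idea concrete, here is a rough sketch for one memory type only: a file-backed shared mapping, flushed with POSIX msync() before the ack goes out.  Everything named here (commit_then_ack, ack_fn, record_ack, demo_commit) is made up for illustration and is not libfabric API; the ack callback stands in for whatever send the application would use to tell the peer the data is committed.  Real NVM would presumably need a different flush step.

```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical callback: how the application tells the peer "committed".
 * In a real system this would be an ordinary send over the fabric. */
typedef int (*ack_fn)(void *ctx, uint64_t msg_id);

/* Flush the pages covering [buf, buf + len) of a file-backed MAP_SHARED
 * mapping, and only then signal commit.  Returns 0 on success, -1 if the
 * flush failed (in which case no ack is sent). */
int commit_then_ack(void *buf, size_t len, uint64_t msg_id,
                    ack_fn ack, void *ctx)
{
    assert(ack != NULL);

    long page = sysconf(_SC_PAGESIZE);
    if (page <= 0)
        return -1;

    /* msync() wants a page-aligned start address, so round down. */
    uintptr_t start = (uintptr_t)buf & ~((uintptr_t)page - 1);
    size_t span = (size_t)((uintptr_t)buf - start) + len;

    if (msync((void *)start, span, MS_SYNC) != 0)
        return -1;              /* not known to be on stable storage */

    return ack(ctx, msg_id);
}

/* Stand-in ack for the demo: just records the committed message id. */
static int record_ack(void *ctx, uint64_t msg_id)
{
    *(uint64_t *)ctx = msg_id;
    return 0;
}

/* Demo: "receive" a message into a file-backed shared mapping, commit it,
 * and ack.  Returns the acked message id, or 0 on any failure. */
uint64_t demo_commit(const char *path, uint64_t msg_id)
{
    const char msg[] = "payload";
    const size_t len = 4096;
    uint64_t acked = 0;

    int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);
    if (fd < 0)
        return 0;
    if (ftruncate(fd, (off_t)len) != 0) {
        close(fd);
        unlink(path);
        return 0;
    }

    char *map = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd);                  /* the mapping stays valid after close */
    if (map == MAP_FAILED) {
        unlink(path);
        return 0;
    }

    memcpy(map + 100, msg, sizeof msg);   /* deliberately not page-aligned */
    if (commit_then_ack(map + 100, sizeof msg, msg_id, record_ack, &acked) != 0)
        acked = 0;

    munmap(map, len);
    unlink(path);
    return acked;
}
```

The point of the sketch is only the ordering: the sync that matches the memory type happens in the application, and the commit signal is sent afterwards, so libfabric never has to know what "persistent" means for that buffer.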

> So, that's my rationale for picking these points and not others.

Fair enough.  I was just thinking that a completion which indicates that libfabric has performed the requested transfer, regardless of application failures, would be useful.  That, in my mind, is the definition of a "reliable" fabric completion.  Perhaps without application involvement this type of completion is useless?

> > FWIW, I'm not sure if this is a good idea or not.  IMO it muddies the
> > water between libfabric and the application.
> There may be cases where libfabric can piggyback the application ack on its
> own low level messaging and gain efficiency.

Yes, efficiency would be a reason to do this...  and sometimes efficiency requires a bit of mud...  <shrug>  :-/

