[ofiwg] Proposal for enhancement to support additional Persistent Memory use cases (ofiwg/libfabric#5874)

Mon May 4 08:59:47 PDT 2020

> I understand.  But I don’t think we should move forward with any pmem additions until
> we at least talk about it.  Have the IBTA and IETF drafts been taken into account in
> what's being proposed?

Yes.  The status of the drafts is the reason the discussion is being restarted.  And I really don't understand the argument that we need to have discussions, when this thread was started specifically for discussion purposes.

James opened a GitHub issue to track this.  He posted his comments to the email list to increase their visibility.  There are no active patches against libfabric, but it is much easier for discussions to drive toward something tangible when there are concrete ideas being presented.  James (guessing, with others) has spent some time considering this.

Along those lines, I added an alternative proposal to GitHub on Friday, which is copied below.

- Sean

---

I'm proposing the following API changes to expand persistent memory support.  This is in addition to the existing API definitions.

```
#define FI_SAVE    (1ULL << 32)
```

We can work on the name, but this is basically it.  :)

This is an operational flag that can be passed into fi_writemsg.  When specified, it indicates that the target memory region(s) should be updated to reflect all prior data transfers, such that those transfers have the same completion semantic as the save operation.

E.g. fi_writemsg(..., FI_SAVE | FI_COMMIT_COMPLETE) to a persistent memory region behaves the same as a lower-level flush operation.
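As a rough sketch of how the flag combination might be recognized, assuming the proposed bit position stays as written: SKETCH_COMMIT_COMPLETE is a placeholder bit standing in for FI_COMMIT_COMPLETE (real code would take that value from <rdma/fabric.h>), and sketch_is_flush is a hypothetical helper, not part of libfabric.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Proposed in this thread; the name and bit position are not final. */
#define FI_SAVE (1ULL << 32)

/* Assumption: placeholder bit standing in for FI_COMMIT_COMPLETE.
 * Real code would include <rdma/fabric.h> rather than define this. */
#define SKETCH_COMMIT_COMPLETE (1ULL << 30)

/* Hypothetical helper: true only when both flags are set, i.e. the
 * request asks for the flush-like behavior described above. */
static bool sketch_is_flush(uint64_t flags)
{
    const uint64_t need = FI_SAVE | SKETCH_COMMIT_COMPLETE;

    return (flags & need) == need;
}
```

FI_SAVE on its own is not treated as a flush here, since the completion level is what determines how far the prior transfers must have progressed.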

An FI_SAVE operation does not transfer data to the target region.  It acts as a limited fencing operation for operations of the same type to the same region.  E.g. a save write command does not complete until all previous writes to the same region have completed.
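To make the limited fencing rule concrete, here is a toy model of it in plain C.  None of this is libfabric code: the sketch_* types and functions are made up for illustration, and a provider would track this state very differently.  It only shows the ordering rule, that a save write waits on earlier writes to the same region and ignores writes to other regions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_MAX_OPS 16

enum sketch_kind { SKETCH_WRITE, SKETCH_SAVE };

/* One submitted operation (hypothetical bookkeeping). */
struct sketch_op {
    enum sketch_kind kind;
    int region;   /* hypothetical target region id */
    bool done;    /* set once the operation has completed */
};

/* Operations kept in submission order. */
struct sketch_queue {
    struct sketch_op ops[SKETCH_MAX_OPS];
    size_t count;
};

/* Append an operation and return its index in the queue. */
static size_t sketch_post(struct sketch_queue *q, enum sketch_kind kind,
                          int region)
{
    assert(q->count < SKETCH_MAX_OPS);
    q->ops[q->count] = (struct sketch_op){ kind, region, false };
    return q->count++;
}

/* A save may complete only after every earlier write to the same
 * region has completed.  Writes to other regions are not considered. */
static bool sketch_save_ready(const struct sketch_queue *q, size_t save_idx)
{
    const struct sketch_op *save = &q->ops[save_idx];

    for (size_t i = 0; i < save_idx; i++) {
        const struct sketch_op *op = &q->ops[i];

        if (op->kind == SKETCH_WRITE && op->region == save->region &&
            !op->done)
            return false;
    }
    return true;
}
```

Posting a write to region 1, a write to region 2, and then a save to region 1 leaves the save blocked until the first write is marked done; the outstanding write to region 2 has no effect on it.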

CQ data may be included as part of the operation.  Data and message ordering is unchanged.

The flag can be added to fi_readmsg using a similar approach, but that can be deferred.

Likewise, we can extend this to fi_atomicmsg and fi_fetch_atomicmsg by defining FI_ATOMIC_PMEM.  When used with an atomic operation, the data is 'saved' atomically using the data type specified with the save command.  Updates to the data still use the data type specified in the previous atomic calls.  This too can be deferred.

The flag can also apply to msg and tagged operations by defining that all prior messages to the specified peer reach the same completion semantic.