[ofiwg] Proposal for enhancement to support additional Persistent Memory use cases (ofiwg/libfabric#5874)

Hefty, Sean sean.hefty at intel.com
Tue May 5 12:50:03 PDT 2020

> I believe the current proposals do take into account the IBTA/IETF drafts.  But in any
> case, none of this is going anywhere yet until it's been thoroughly discussed and
> reviewed, beginning at tomorrow's OFIWG meeting.  As of now, you are first on
> tomorrow's agenda.

Based on the ofiwg call today, I added a list to GitHub of the low-level features that applications should be able to access, along with how well the current API maps to each of them.  The list is copied below.

If we can ensure that we have the right feature list, and agree on the current status of how those features are or are not being met, it should make it easier to design the right solution to cover the gaps.

- Sean


There are four main lower-level functions that the API needs to map to:

1. **8-byte atomic write ordered with RDMA writes**
OFI defines a more generic atomic write.  Message ordering is controlled through fi_tx_attr::msg_order flags.  Data ordering is controlled through fi_ep_attr::max_order_waw_size.  The existing API should be sufficient.

2. **flush data for persistency**
The low-level flush operation ensures that previous RDMA and atomic write operations to a given target region are persistent before the flush completes.  The target region may be accessible through multiple endpoints and NIC ports.  Also, low-level transports require write-after-write message and data ordering, which the flush operation assumes.
OFI defines FI_COMMIT_COMPLETE for persistent completion semantics.  This provides limited support, handling only the following mapping: RMA write followed by a matching flush.  A more generic mechanism needs to be defined, one that allows a less strict completion on the RMA writes, with the persistent command following them.  This is possible today through the FI_FENCE flag, but that could stall messaging.

3. **flush data for global visibility**
This is similar to 2, with application and fabric visibility replacing persistency.
OFI defines FI_DELIVERY_COMPLETE as a visibility completion semantic.  It has limits similar to those described for FI_COMMIT_COMPLETE above.

4. **Data verify**
There is no equivalent existing functionality, but it is aligned with discussions around SmartNIC and FPGA support, which define generic offload functionality.
