[ofiwg] Proposal for enhancement to support additional Persistent Memory use cases (ofiwg/libfabric#5874)
Grun, Paul
paul.grun at hpe.com
Fri May 1 11:13:54 PDT 2020
Keep in mind that the libfabric API doesn't necessarily directly mimic what is implemented in verbs. The requirement is that the verbs semantics are implementable via a libfabric implementation. I think of libfabric as being a slightly more abstract interface than verbs, hence the libfabric APIs don't necessarily expose the gritty details described in the current IBTA Annex.
> -----Original Message-----
> From: ofiwg <ofiwg-bounces at lists.openfabrics.org> On Behalf Of Douglas,
> Chet R
> Sent: Friday, May 1, 2020 7:51 AM
> To: Rupert Dance - SFI <rsdance at soft-forge.com>; Swaro, James E
> <james.swaro at hpe.com>; Hefty, Sean <sean.hefty at intel.com>;
> ofiwg at lists.openfabrics.org
> Subject: Re: [ofiwg] Proposal for enhancement to support additional Persistent
> Memory use cases (ofiwg/libfabric#5874)
>
> It matters! Eventually (now?) we want full RDMA extension support in libfabric,
> libibverbs, and the verbs spec. This appears to be based on Intel's original
> libfabric proposal? "Commit" is not a valid term. Complete RDMA memory
> placement extension support looks different than it did in that original proposal.
> We need to architect the complete solution. Don't we? Does it support RDMA
> Flush, Write Atomic, and Verify? How do you register cached vs. uncached
> pmem? Is this already in the wild? If not, we shouldn't release it without further
> consideration.
>
> -----Original Message-----
> From: ofiwg <ofiwg-bounces at lists.openfabrics.org> On Behalf Of Rupert Dance
> - SFI
> Sent: Friday, May 01, 2020 8:31 AM
> To: 'Swaro, James E' <james.swaro at hpe.com>; Hefty, Sean
> <sean.hefty at intel.com>; ofiwg at lists.openfabrics.org
> Subject: Re: [ofiwg] Proposal for enhancement to support additional Persistent
> Memory use cases (ofiwg/libfabric#5874)
>
> Is this team aware of what the IBTA is doing with PME or does it not matter
> since it is libfabric?
>
> -----Original Message-----
> From: ofiwg <ofiwg-bounces at lists.openfabrics.org> On Behalf Of Swaro, James
> E
> Sent: Friday, May 01, 2020 9:41 AM
> To: Hefty, Sean <sean.hefty at intel.com>; ofiwg at lists.openfabrics.org
> Subject: Re: [ofiwg] Proposal for enhancement to support additional Persistent
> Memory use cases (ofiwg/libfabric#5874)
>
> > > * This allows a memory region to be registered as being capable
> of
> > > persistence. This has already been introduced into the upstream libfabric
> GITHUB, but
> > > should be reviewed to ensure it matches use case requirements.
>
> > FI_RMA_PMEM is defined as a MR flag. Note that this definition
> intentionally limits non-RMA transfers from taking advantage of persistent
> memory semantics.
>
> > The intent of this flag is to give providers implementation flexibility,
> specifically based on hardware/software differences.
>
> Understood. The intent of this section of the proposal was to outline potential
> areas for change. Any questions posed here were historical and meant to
> provoke discussion. They might even be a little dated. Those changes and the
> rationale are discussed below.
>
>
> > > every operation. That type of operation would make the delivery of
> completion events
> > > take longer than necessary for most operations, so SHMEM would need
> finer control over
> > > commit flushing behavior.
>
> > OFI does not require that an event be generated for every transfer. It also
> allows transfers to report completions using 'lower' completion semantics, such
> as FI_TRANSMIT_COMPLETE. Completion events at the target of an RMA write
> requires the FI_RMA_EVENT capability, and is independent from PMEM.
>
> Understood. This paragraph was intended to address a complication that was
> raised in one of the meetings.
>
> It was discussed that with some applications, all or most data would be required
> to be persistent. The solution at the time was to provide
> FI_COMMIT_COMPLETE as part of the default TX op_flags, which
> would incur a higher cost to provide that level of completion. The goal with this
> proposal would be to allow upper layers to set a less strict completion model,
> such as delivery or transmit complete as part of the default op_flag, or per-
> operation flag and address persistence as a batch operation via the fi_commit
> API.
>
>
> > > * A single request to fi_commit should generate a control
> message to target
> > > hardware or software emulation environment to flush the contents of
> memory targets.
>
> > This needs to be defined in terms of application level semantics, not
> implementation details. fi_commit could be a no-op based on the provider
> implementation. (It actually would be for the socket and tcp providers, which
> act at the target based on the MR flag.)
>
> Completely agree. Rereading this proposal, I meant to change some of these
> discussion points away from implementation to a discussion on behavior and
> semantics. How fi_commit behaves w.r.t. implementation specifics isn't within
> the scope of this proposal. Implementation details are something I'd prefer to
> stay away from so we can define how we expect it to behave.
>
> > flexibility in the API design to future-proof against options we might not
> > conceive of until after the prototype is complete. The context is available
> > for the user and returned with the completion.
>
> > The proposed definition is limited to RMA (and atomic) writes. There is no
> mechanism for handling RMA reads into persistent memory, for example. That
> should be included. Message transfers may need a separate mechanism for this.
> That can be deferred (left undefined by the man pages), but ideally we should
> have an idea of how to support it.
>
> > The best existing API definition for an fi_commit call would be the
> fi_readmsg/fi_writemsg() calls. We could even re-use those calls by adding a
> flag.
>
> The proposed definition is limited to RMA and AMO because we didn't have a
> strong use case for messaging, but I'd like to go the route that allows messaging
> to be easily included if that changes later down the road.
>
>
> > > * Since this API behaves like a data transfer API, it is expected that
> this
> > > API would generate a completion event to the local completion queue
> associated with the
> > EP from which the transaction was initiated.
>
> > The generation of a *CQ* event makes sense. We need to define if and how
> counters, locally and remote, are updated. EQ events are not the right API
> match.
>
> Agreed on the CQ aspect. As a note, EQs are not being discussed for the
> initiator, only the target, so I'll put my EQ comments in the next comment. As a
> general comment, I think that this could be a good candidate for discussion at
> the next OFIWG because it is a strange grey area to me.
>
> > > * At the target, this should generate an event to the target's
> event queue –
> > > if and only if the provider supports software emulated events. If a provider
> is capable
> > > of hardware level commits to persistent memory, the transaction should be
> consumed
> > > transparently by the hardware, and does not need to generate an event at
> the target.
> > > This will require an additional event definition in libfabric (See definition for
> > > fi_eq_commit_entry)
>
> > This too needs to be defined based on the application level semantics, not
> implementation. The app should not be aware of implementation differences,
> except where mode bits dictate for performance reasons. (And I can say that
> developers hate dealing with those differences, so we need to eliminate them.)
>
> > If we limit commit to RMA transfers, it makes sense for it to act as an RMA
> call for most purposes (i.e. fi_readmsg/fi_writemsg). For example, the ability to
> carry CQ data and generate remote events (FI_RMA_EVENTS) on the target CQ
> and counters. We also need to consider if there's any impact on counters
> associated with the MR.
>
> I agree that this needs to be defined in terms of application-level behavior.
> However, I do think we need to talk about if and how applications should be
> expected to facilitate the desired functionality if the hardware is not capable of
> it. How a provider like sockets implements the functionality isn't
> important to define here, but if the provider needs the application to
> interact/configure in a specific way then I think that should be covered here. If
> there isn’t hardware support for FI_COMMIT_COMPLETE, then it seems to
> become a much more difficult problem. Libfabric could provide events to the
> application through EQ or CQ events, or go a similar route as HMEM is going
> now. I'd prefer to provide events to the application rather than attempt to
> support every PMEM library/hardware when handling the software emulation
> case.
>
> > > * A new EQ event definition (fi_eq_commit_entry) to support software-
> emulated
> > > persistence for devices that cannot provide hardware support
> > >
> > > * The iov, and count variables mirror the original iov, and count
> contents of
> > > the originating request.
> > > * The flags may be a diminished set of flags from the original
> transaction
> > > under the assumption that only some flags would have meaning at the
> target and sending
> > > originator-only flags to the target would have little value to the target
> process.
>
> > If any events are generated, they need to be CQ related, not EQ.
>
> This is where I believe it becomes a grey area. I could see using FI_RMA_EVENT
> or something similar to provoke a CQ event generated at the target, but it
> doesn't feel like fi_commit is a data transfer operation. It seems like a control
> operation, which is why it was defined as generating an EQ event; a
> commit/"flush" is control-like, so it feels aligned with EQ.
>
>
> > > * Additional flags or capabilities
> > >
> > > * A provider should be able to indicate whether they support
> software
> > > emulated notifications of fi_commit, or whether they can handle hardware
> requests for
> > > commits to persistent memory
>
> > The implementation of hardware vs software should not be exposed. Hybrid
> solutions (e.g. RxM or large transfers over verbs devices) are also possible.
>
> If libfabric provides an event to the upper layer, I believe libfabric can support
> many more persistent memory models and devices by propagating events to the
> upper layer than if we attempt to put that capability into libfabric and support it
> transparently for the user. It's just my view, but application writers have asked
> us to optimize data transfers over the network with the abstraction we provide,
> I think. This could be another complicated topic to discuss at the
> next OFIWG.
>
>
> > The FI_RMA_PMEM capability should be sufficient to indicate support for
> RMA reads and writes to persistent memory. That should be an inclusive flag
> (along with the API version) indicating that all related operations are supported.
>
> Something like this?
>
> #define FI_PMEM (FI_RMA_PMEM | FI_AMO_PMEM | FI_MSG_PMEM)
>
>
> > Support for messaging requires additional definitions. Part of the discussion
> is figuring out the scope of what should be defined in the short term. As
> mentioned above, FI_FENCE | FI_COMMIT_COMPLETE can be used to commit
> message transfers. I can't think of a better alternative here. However, I'm not
> sure if the proposed IBTA and IETF specifications will result in hardware capable
> of supporting the FI_FENCE | FI_COMMIT_COMPLETE semantic. :/
>
>
> Agreed on messaging, but it lacks a good use case yet so I haven't been as
> concerned.
>
> I'm not yet convinced on FI_COMMIT_COMPLETE|FI_FENCE. If libfabric
> suggested the use of that, does that imply that providers must support 0-length
> sends and/or control messaging on behalf of the application? Does the data
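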
> transfer itself provide any context to the region being flushed? What happens in
> the case of multiple persistent memory domains or devices? How would that
> data transfer provide the context necessary to flush a specific region, memory
> domain, or device? This seems more complicated than the initial suggestion
> indicates.
>
> > > * Addition of an event handler registration for handling event queue
> entries within
> > > the provider context (See Definition: fi_eq_event_handler)
> > >
> > > * Essentially, this becomes a registered callback for the target
> application
> > > to handle specific event types. We can use this mechanism with the target
> application
> > > to allow the provider to handle events internally using a function provided
> by the
> > > application. The function would contain the logic necessary to handle the
> event
>
> > Callbacks are to be avoided. They present difficult locking scenarios with
> severe restrictions on what the application can do from the callback, and
> present challenging object destruction situations. Those restrictions can be
> difficult for an application to enforce, since calls outside the app to other
> libraries may violate them.
>
> It's a good argument, and generally I feel the same way. What do you suggest as
> an alternative? Callbacks were suggested as a way for the provider to do some
> behavior on behalf of the application upon the receipt of the associated event.
> This would have allowed the provider to issue the commit/flush to device and
> then return the ACK back to the initiator that the commit had succeeded/data
> was flushed as requested. Without a callback, I do not see a clean way for
> libfabric to coordinate flush and acknowledgement back to the initiator.
>
> > To be clear, the proposal only supports RMA writes, and maybe atomics, to
> the target memory. That is likely sufficient for now, but I'd like to ensure that
> we have a way to extend pmem support beyond the limited use cases being
> discussed.
>
> RMA, and atomics -- with the intent not to exclude messaging. This is why the
> naming change from FI_RMA_PMEM to FI_PMEM was suggested.
>
>
> > > * Previous functionality allows for a commit for every message as
> is the case
> > > for FI_COMMIT_COMPLETE, or the use of FI_COMMIT on a per-
> transaction basis. The need in
> > > ...
> > > delivery model, and provides a mechanism to ensure that those data
> transfers are
> > > eventually persisted.
>
> > Unless the app has set FI_COMMIT_COMPLETE as the default completion
> model, it only applies to the operation on which it was set. The main gap I'm
> aware of with proposed specifications is support of a 'flush' type semantic.
>
> The flush mechanic is the primary gap that the proposal is attempting to identify.
> However, I believe the software emulation elements of the proposal are
> valuable for prototyping efforts.
>
> --
> James Swaro
> P: +1 (651) 605-9000
>
> On 4/27/20, 9:38 PM, "Hefty, Sean" <sean.hefty at intel.com> wrote:
>
> Top-posting main discussion point. Other comments further down:
>
> Conceptually, what's being proposed is specifying a data transfer as a 2-step
> process.
>
> 1. identify the data source and target
> 2. specify the completion semantic
>
> Theoretically, the actual data transfer can occur any time after step 1 and
> before step 2 completes. As an additional optimization, step 2 can apply to
> multiple step 1s.
>
> We need to decide:
>
> A. What completion semantic applies to step 1?
> B. What operations do we support for step 1?
> C. What completion semantics are supported for step 2?
>
> The current answers are:
>
> A. All completion levels are supported. It's possible that none of them are
> desirable here, and we need to introduce a new mode:
> FI_UNDEFINED_COMPLETE. This would indicate that the buffer cannot be re-
> used, and the data is not visible at the target, until a step 2 that covers the
> same target memory range completes.
>
> B. RMA reads and writes are supported. It shouldn't be difficult to support
> atomics through the same APIs as well. Message transfers are more difficult to
> specify in step 2, making them harder to support.
>
> C. The proposal only supports FI_COMMIT_COMPLETE. Other levels could be
> added, though that may only make sense if we define something like
> FI_UNDEFINED_COMPLETE.
>
> I'm throwing FI_UNDEFINED_COMPLETE out for discussion. There would be
> issues trying to define it, since data transfers issued at step 1 could generate
> completions locally and remotely prior to step 2 being invoked. Those
> completions just wouldn't mean anything until step 2 completes. The provider
> would select the best completion option for step 1.
>
>
> > Libfabric requires modifications to support RMA and atomic operations
> targeted at
> > remote memory registrations backed by persistent memory devices. These
> modifications
> > should be made with the intent to drive support for persistent memory
> usage by
> > applications that rely on communications middleware such as SHMEM in a
> manner that is
> > consistent with byte-based/stream-based addressable memory formats.
> Existing proposals
> (initial proposal) support NVMe/PMoF approaches, whereas this approach
> should also support flat memory and non-block-addressed memory structures
> and devices.
> >
> > Changes may be required in as many as three areas:
> >
> > * Memory registration calls
> >
> > * This allows a memory region to be registered as being capable
> of
> > persistence. This has already been introduced into the upstream libfabric
> GITHUB, but
> > should be reviewed to ensure it matches use case requirements.
>
> FI_RMA_PMEM is defined as a MR flag. Note that this definition intentionally
> limits non-RMA transfers from taking advantage of persistent memory
> semantics.
>
> The intent of this flag is to give providers implementation flexibility,
> specifically based on hardware/software differences.
>
>
> > * Completion semantics
> >
> > * These changes allow a completion event or notification to be
> deferred until
> > the referenced data has reached the persistence domain at the target. This
> has already
> > been introduced into the upstream libfabric GITHUB, but should be reviewed
> to ensure it
> > matches use case requirements.
>
> Completion semantics may be adjusted on a per transfer basis. The
> FI_COMMIT_COMPLETE semantic applies to both the initiator and target.
> Completion semantics are a minimal guarantee from a provider. The provider
> can do more.
>
> > * Consumer control of persistence
> >
> > * As presently implemented in the upstream libfabric GITHUB,
> persistence is
> > determined on a transaction-by-transaction basis. It was acknowledged at
> the time that
> > this is a simplistic implementation. We need to reach consensus on the
> following:
> >
> > * Should persistence be signaled on the basis of the
> target memory
> > region? For example, one can imagine a scheme where data targeted at a
> particular
> > memory region is automatically pushed into the persistence domain by the
> target,
> > obviating the need for any sort of commit operation.
>
> In cases where a commit operation is not needed, it can become a no-op, but
> it may be required functionality for some providers.
>
>
> > * Is an explicit 'commit' operation of some type required,
> and if so,
> > what is the scope of that commit operation? Is there a persistence fence
> defined such
> > that every operation prior to the fence is made persistent by a commit
> operation?
>
> With the current API, persistence can be achieved by issuing a 0-length RMA
> with FI_COMMIT_COMPLETE | FI_FENCE semantics. The fence requires that
> *all* prior transfers over that endpoint meet the requested completion
> semantic.
>
> This may not be ideal, but may be the best way to handle message transfers to
> persistent memory.
>
>
> > Proposal
> >
> > The experimental work in the OFIWG/libfabric branch is sufficient for the
> needs of
> > SHMEM, with exception to the granularity of event generation. When the
> current
> > implementation generates events, it would generate commit-level
> completion events with
> > every operation. That type of operation would make the delivery of
> completion events
> > take longer than necessary for most operations, so SHMEM would need
> finer control over
> > commit flushing behavior.
>
> OFI does not require that an event be generated for every transfer. It also
> allows transfers to report completions using 'lower' completion semantics, such
> as FI_TRANSMIT_COMPLETE. Completion events at the target of an RMA write
> requires the FI_RMA_EVENT capability, and is independent from PMEM.
>
> > To satisfy this, the following is being proposed:
> >
> > * A new API: fi_commit (See definitions: fi_commit)
> > The new API would be used to generate a commit instruction to a target
> peer. The
> > instruction would be defined by a set of memory registration keys, or
> regions by which
> > the target could issue a commit to persistent memory.
>
> See discussion at the top.
>
>
> > * A single request to fi_commit should generate a control
> message to target
> > hardware or software emulation environment to flush the contents of
> memory targets.
>
> This needs to be defined in terms of application level semantics, not
> implementation details. fi_commit could be a no-op based on the provider
> implementation. (It actually would be for the socket and tcp providers, which
> act at the target based on the MR flag.)
>
> > Memory targets are defined by the iov structures and key fields, and the
> > number of memory targets is defined by the count field. The destination
> > address is
> handled by
> > the dest_addr field. The flags field is held reserved at this time to allow for
> flexibility in the API design to future-proof against options we might not
> conceive of until after the prototype is complete. The context is available for
> the user and returned with the completion.
>
> The proposed definition is limited to RMA (and atomic) writes. There is no
> mechanism for handling RMA reads into persistent memory, for example. That
> should be included. Message transfers may need a separate mechanism for this.
> That can be deferred (left undefined by the man pages), but ideally we should
> have an idea of how to support it.
>
> The best existing API definition for an fi_commit call would be the
> fi_readmsg/fi_writemsg() calls. We could even re-use those calls by adding a
> flag.
>
> > * Since this API behaves like a data transfer API, it is expected that
> this
> > API would generate a completion event to the local completion queue
> associated with the
> > EP from which the transaction was initiated.
>
> The generation of a *CQ* event makes sense. We need to define if and how
> counters, locally and remote, are updated. EQ events are not the right API
> match.
>
>
> > * At the target, this should generate an event to the target's
> event queue –
> > if and only if the provider supports software emulated events. If a provider is
> capable
> > of hardware level commits to persistent memory, the transaction should be
> consumed
> > transparently by the hardware, and does not need to generate an event at
> the target.
> > This will require an additional event definition in libfabric (See definition for
> > fi_eq_commit_entry)
>
> This too needs to be defined based on the application level semantics, not
> implementation. The app should not be aware of implementation differences,
> except where mode bits dictate for performance reasons. (And I can say that
> developers hate dealing with those differences, so we need to eliminate them.)
>
> If we limit commit to RMA transfers, it makes sense for it to act as an RMA
> call for most purposes (i.e. fi_readmsg/fi_writemsg). For example, the ability to
> carry CQ data and generate remote events (FI_RMA_EVENTS) on the target CQ
> and counters. We also need to consider if there's any impact on counters
> associated with the MR.
>
>
> > * A new EQ event definition (fi_eq_commit_entry) to support software-
> emulated
> > persistence for devices that cannot provide hardware support
> >
> > * The iov, and count variables mirror the original iov, and count
> contents of
> > the originating request.
> > * The flags may be a diminished set of flags from the original
> transaction
> > under the assumption that only some flags would have meaning at the
> target and sending
> > originator-only flags to the target would have little value to the target
> process.
>
> If any events are generated, they need to be CQ related, not EQ.
>
>
> > * Additional flags or capabilities
> >
> > * A provider should be able to indicate whether they support
> software
> > emulated notifications of fi_commit, or whether they can handle hardware
> requests for
> > commits to persistent memory
>
> The implementation of hardware vs software should not be exposed. Hybrid
> solutions (e.g. RxM or large transfers over verbs devices) are also possible.
>
>
> > * An additional flag should be introduced to the fi_info
> structure
> > under modes: FI_COMMIT_MANUAL (or something else)
>
> The FI_RMA_PMEM capability should be sufficient to indicate support for RMA
> reads and writes to persistent memory. That should be an inclusive flag (along
> with the API version) indicating that all related operations are supported.
>
>
> > * This flag would indicate to the application that
> events may be
> > generated to the event queue for consumption by the application. Commit
> events would be
> > generated upon receipt of a commit message from a remote peer, and the
> application
> > would be responsible for handling the event.
> > * Lack of the FI_COMMIT_MANUAL flag, and the
> presence of the
> > FI_RMA_PMEM (or FI_PMEM) flag in the info structure should imply that the
> hardware is
> > capable of handling the commit requests to persistent memory and the
> application does
> > not need to read the event queue for commit events.
> >
> > * Change of flag definition
> >
> > * The FI_RMA_PMEM flag should be changed to FI_PMEM to
> indicate that the
> > provider is PMEM aware, and supports RMA/AMO/MSG operations to and
> from persistent
> > memory.
> > * There may be little value in supporting messaging interfaces,
> but it is
> > something that could be supported.
>
> Support for messaging requires additional definitions. Part of the discussion is
> figuring out the scope of what should be defined in the short term. As
> mentioned above, FI_FENCE | FI_COMMIT_COMPLETE can be used to commit
> message transfers. I can't think of a better alternative here. However, I'm not
> sure if the proposed IBTA and IETF specifications will result in hardware capable
> of supporting the FI_FENCE | FI_COMMIT_COMPLETE semantic. :/
>
>
> > * Addition of an event handler registration for handling event queue
> entries within
> > the provider context (See Definition: fi_eq_event_handler)
> >
> > * Essentially, this becomes a registered callback for the target
> application
> > to handle specific event types. We can use this mechanism with the target
> application
> > to allow the provider to handle events internally using a function provided by
> the
> > application. The function would contain the logic necessary to handle the
> event
>
> Callbacks are to be avoided. They present difficult locking scenarios with
> severe restrictions on what the application can do from the callback, and
> present challenging object destruction situations. Those restrictions can be
> difficult for an application to enforce, since calls outside the app to other
> libraries may violate them.
>
>
> > * Specific to PMEM, a function handler would be used by the
> target
> > application to handle commits to persistent memory as they were delivered
> without
> > requiring a fi_eq_read and some form of acknowledgement around the
> commit action. With
> > the handler, the commit could be handled entirely by the function provided
> by the
> > application, and the return code from the application provided call-back
> would be
> > sufficient for a software emulation in the provider to produce the return
> message to
> > the sender that the commit transaction is fully complete. The use of a
> handler allows
> > us to make the commit transaction as light-weight or heavy-weight as
> necessary.
> >
> > Definitions:
> >
> > fi_commit
> >
> > ssize_t fi_commit(struct fid_ep *ep,
> >
> > const struct fi_rma_iov *iov,
> >
> > size_t count,
> >
> > fi_addr_t dest_addr,
> >
> > uint64_t flags,
> >
> > void *context);
> >
> > fi_eq_commit_entry
> >
> > struct fi_eq_commit_entry {
> >
> > fid_t fid; /* fid associated with request */
> >
> > const struct fi_rma_iov *iov; /* iovec of memory regions to be
> > committed to persistent memory */
> >
> > size_t count; /* number of iovec/key entries */
> >
> > uint64_t flags; /* operation-specific flags */
> >
> > };
> >
> > fi_eq_event_handler
> >
> > typedef ssize_t (*fi_eq_event_handler_t)(struct fid_eq *eq,
> >
> > uint64_t event_type,
> >
> > void *event_data,
> >
> > uint64_t len,
> >
> > void *context);
> >
> > ssize_t fi_eq_register_handler(struct fid_eq *eq,
> >
> > uint64_t event_type,
> >
> > fi_eq_event_handler_t handler,
> >
> > void *context);
> >
> > Use cases supported by this proposal:
> >
> > * As an application writer, I need to commit multiple previously-sent data
> > transfers to the persistence domain
>
> To be clear, the proposal only supports RMA writes, and maybe atomics, to
> the target memory. That is likely sufficient for now, but I'd like to ensure that
> we have a way to extend pmem support beyond the limited use cases being
> discussed.
>
>
> > * Previous functionality allows for a commit for every message as
> is the case
> > for FI_COMMIT_COMPLETE, or the use of FI_COMMIT on a per-transaction
> basis. The need in
> this use case is performance-oriented: to allow a less strict delivery model to
> the NIC
> > for most messages followed up with a 'flush' of the NIC to the persistence
> domain. This
> > allows most messages targeted to the persistence domain to complete with
> a less strict
> > delivery model, and provides a mechanism to ensure that those data
> transfers are
> > eventually persisted.
>
> Unless the app has set FI_COMMIT_COMPLETE as the default completion
> model, it only applies to the operation on which it was set. The main gap I'm
> aware of with proposed specifications is support of a 'flush' type semantic.
>
>
> - Sean
>
>
> _______________________________________________
> ofiwg mailing list
> ofiwg at lists.openfabrics.org
> https://lists.openfabrics.org/mailman/listinfo/ofiwg