[ofiwg] Proposal for enhancement to support additional Persistent Memory use cases (ofiwg/libfabric#5874)
Douglas, Chet R
chet.r.douglas at intel.com
Mon May 4 07:44:23 PDT 2020
I understand. But I don’t think we should move forward with any pmem additions until we at least talk about it. Have the IBTA and IETF drafts been taken into account in what’s being proposed?
-----Original Message-----
From: Grun, Paul <paul.grun at hpe.com>
Sent: Friday, May 01, 2020 12:14 PM
To: Douglas, Chet R <chet.r.douglas at intel.com>; Rupert Dance - SFI <rsdance at soft-forge.com>; Swaro, James E <james.swaro at hpe.com>; Hefty, Sean <sean.hefty at intel.com>; ofiwg at lists.openfabrics.org
Subject: RE: [ofiwg] Proposal for enhancement to support additional Persistent Memory use cases (ofiwg/libfabric#5874)
Keep in mind that the libfabric API doesn't necessarily directly mimic what is implemented in verbs. The requirement is that the verbs semantics are implementable via a libfabric implementation. I think of libfabric as being a slightly more abstract interface than verbs, hence the libfabric APIs don't necessarily expose the gritty details described in the current IBTA Annex.
> -----Original Message-----
> From: ofiwg <ofiwg-bounces at lists.openfabrics.org> On Behalf Of
> Douglas, Chet R
> Sent: Friday, May 1, 2020 7:51 AM
> To: Rupert Dance - SFI <rsdance at soft-forge.com>; Swaro, James E
> <james.swaro at hpe.com>; Hefty, Sean <sean.hefty at intel.com>;
> ofiwg at lists.openfabrics.org
> Subject: Re: [ofiwg] Proposal for enhancement to support additional
> Persistent Memory use cases (ofiwg/libfabric#5874)
>
> It matters! Eventually (now?) we want full RDMA extension support in
> libfabric, libibverbs, and the verbs spec. This appears to be based
> on Intel's original libfabric proposal? Commit is not a valid term.
> Complete RDMA memory placement extension support looks different than it did in that original proposal.
> We need to architect the complete solution. Don’t we? Does it
> support RDMA Flush, Write Atomic and Verify? How do you register
> cached vs uncached pmem? Is this already in the wild? If not, we
> shouldn’t release it without further consideration.
>
> -----Original Message-----
> From: ofiwg <ofiwg-bounces at lists.openfabrics.org> On Behalf Of Rupert
> Dance
> - SFI
> Sent: Friday, May 01, 2020 8:31 AM
> To: 'Swaro, James E' <james.swaro at hpe.com>; Hefty, Sean
> <sean.hefty at intel.com>; ofiwg at lists.openfabrics.org
> Subject: Re: [ofiwg] Proposal for enhancement to support additional
> Persistent Memory use cases (ofiwg/libfabric#5874)
>
> Is this team aware of what the IBTA is doing with PME or does it not
> matter since it is libfabric?
>
> -----Original Message-----
> From: ofiwg <ofiwg-bounces at lists.openfabrics.org> On Behalf Of Swaro,
> James E
> Sent: Friday, May 01, 2020 9:41 AM
> To: Hefty, Sean <sean.hefty at intel.com>; ofiwg at lists.openfabrics.org
> Subject: Re: [ofiwg] Proposal for enhancement to support additional
> Persistent Memory use cases (ofiwg/libfabric#5874)
>
> > > *This allows a memory region to be registered as being capable of
> > > persistence. This has already been introduced into the upstream
> > > libfabric GitHub, but should be reviewed to ensure it matches use
> > > case requirements.
>
> > FI_RMA_PMEM is defined as a MR flag. Note that this definition
> > intentionally prevents non-RMA transfers from taking advantage of
> > persistent memory semantics.
>
> > The intent of this flag is to give providers implementation
> > flexibility, specifically based on hardware/software differences.
>
> Understood. The intent of this section of the proposal was to outline
> potential areas for change. Any questions posed here were historical
> and meant to provoke discussion. They might even be a little dated.
> Those changes and the rationale are discussed below.
>
>
> > > every operation. That type of operation would make the delivery of
> > > completion events take longer than necessary for most operations,
> > > so SHMEM would need finer control over commit flushing behavior.
>
> > OFI does not require that an event be generated for every transfer.
> > It also allows transfers to report completions using 'lower'
> > completion semantics, such as FI_TRANSMIT_COMPLETE. Completion events
> > at the target of an RMA write require the FI_RMA_EVENT capability and
> > are independent of PMEM.
>
> Understood. This paragraph was intended to address a complication that
> was raised in one of the meetings.
>
> It was discussed that with some applications, all or most data would
> be required to be persistent. The solution at the time was to provide
> FI_COMMIT_COMPLETE as part of the default TX op_flags, which would
> incur a higher cost to provide that level of completion. The goal with
> this proposal would be to allow upper layers to set a less strict
> completion model, such as delivery or transmit complete, as part of
> the default op_flag or per-operation flags, and to address persistence
> as a batch operation via the fi_commit API.
>
>
> > > *A single request to fi_commit should generate a control message
> > > to target hardware or software emulation environment to flush the
> > > contents of memory targets.
>
> > This needs to be defined in terms of application level semantics, not
> > implementation details. fi_commit could be a no-op based on the
> > provider implementation. (It actually would be for the socket and tcp
> > providers, which act at the target based on the MR flag.)
>
> Completely agree. Rereading this proposal, I meant to change some of
> these discussion points away from implementation toward a discussion
> of behavior and semantics. How fi_commit behaves with respect to
> implementation specifics isn't within the scope of this proposal.
> Implementation details are something I'd prefer to stay away from so
> we can define how we expect it to behave.
>
> > > flexibility in the API design to future proof against options we
> > > might not conceive of until after the prototype is complete, and
> > > the context available for the user and returned with the completion
>
> > The proposed definition is limited to RMA (and atomic) writes. There
> > is no mechanism for handling RMA reads into persistent memory, for
> > example. That should be included. Message transfers may need a
> > separate mechanism for this. That can be deferred (left undefined by
> > the man pages), but ideally we should have an idea for how to support
> > it.
>
> > The best existing API definition for an fi_commit call would be the
> > fi_readmsg/fi_writemsg() calls. We could even re-use those calls by
> > adding a flag.
>
> The proposed definition is limited to RMA and AMO because we didn't
> have a strong use case for messaging, but I'd like to go the route
> that allows messaging to be easily included if that changes down the
> road.
>
>
> > > *Since this API behaves like a data transfer API, it is expected
> > > that this API would generate a completion event to the local
> > > completion queue associated with the EP from which the transaction
> > > was initiated.
>
> > The generation of a *CQ* event makes sense. We need to define if and
> > how counters, locally and remotely, are updated. EQ events are not
> > the right API match.
>
> Agreed on the CQ aspect. As a note, EQs are not being discussed for
> the initiator, only the target, so I'll put my EQ comments in the next
> comment. As a general comment, I think that this could be a good
> candidate for discussion at the next OFIWG because it is a strange grey area to me.
>
> > > *At the target, this should generate an event to the target's
> > > event queue – if and only if the provider supports software
> > > emulated events. If a provider is capable of hardware level commits
> > > to persistent memory, the transaction should be consumed
> > > transparently by the hardware, and does not need to generate an
> > > event at the target. This will require an additional event
> > > definition in libfabric (See definition for fi_eq_commit_entry)
>
> > This too needs to be defined based on the application level
> > semantics, not implementation. The app should not be aware of
> > implementation differences, except where mode bits dictate for
> > performance reasons. (And I can say that developers hate dealing with
> > those differences, so we need to eliminate them.)
>
> > If we limit commit to RMA transfers, it makes sense for it to act as
> > an RMA call for most purposes (i.e. fi_readmsg/fi_writemsg). For
> > example, the ability to carry CQ data and generate remote events
> > (FI_RMA_EVENTS) on the target CQ and counters. We also need to
> > consider if there's any impact on counters associated with the MR.
>
> I agree that this needs to be defined in terms of application-level
> behavior. However, I do think we need to talk about if and how
> applications should be expected to facilitate the desired
> functionality if the hardware is not capable of it. How a provider
> like sockets implements the functionality isn't important to define
> here, but if the provider needs the application to interact or
> configure things in a specific way, then I think that should be
> covered here. If there isn’t hardware support for FI_COMMIT_COMPLETE,
> then it seems to become a much more difficult problem. Libfabric could
> provide events to the application through EQ or CQ events, or go a
> similar route as HMEM is going now. I'd prefer to provide events to
> the application rather than attempt to support every PMEM
> library/hardware when handling the software emulation case.
>
> > > *A new EQ event definition (fi_eq_commit_entry) to support
> > > software-emulated persistence for devices that cannot provide
> > > hardware support
> > >
> > > *The iov, and count variables mirror the original iov, and count
> > > contents of the originating request.
> > > *The flags may be a diminished set of flags from the original
> > > transaction under the assumption that only some flags would have
> > > meaning at the target and sending originator-only flags to the
> > > target would have little value to the target process.
>
> > If any events are generated, they need to be CQ related, not EQ.
>
> This is where I believe it becomes a grey area. I could see using
> FI_RMA_EVENT or something similar to provoke a CQ event generated at
> the target, but fi_commit doesn't feel like a data transfer operation.
> The commit/"flush" seems like a control operation, which is another
> reason why it was defined as generating an EQ event.
>
>
> > > *Additional flags or capabilities
> > >
> > > *A provider should be able to indicate whether they support
> > > software emulated notifications of fi_commit, or whether they can
> > > handle hardware requests for commits to persistent memory
>
> > The implementation of hardware vs software should not be exposed.
> > Hybrid solutions (e.g. RxM or large transfers over verbs devices) are
> > also possible.
>
> If libfabric provides an event to the upper layer, I believe libfabric
> can support many more persistent memory models and devices by
> propagating events to the upper layer than if we attempt to put that
> capability into libfabric and support it transparently for the user.
> It's just my view, but I think application writers have asked us to
> optimize data transfers over the network with the abstraction we
> provide. This could be another complicated topic and we could discuss
> it at the next OFIWG.
>
>
> > The FI_RMA_PMEM capability should be sufficient to indicate support
> > for RMA reads and writes to persistent memory. That should be an
> > inclusive flag (along with the API version) indicating that all
> > related operations are supported.
>
> Something like this?
>
> #define FI_PMEM (FI_RMA_PMEM | FI_AMO_PMEM | FI_MSG_PMEM)
>
>
> > Support for messaging requires additional definitions. Part of the
> > discussion is figuring out the scope of what should be defined in the
> > short term. As mentioned above, FI_FENCE | FI_COMMIT_COMPLETE can be
> > used to commit message transfers. I can't think of a better
> > alternative here. However, I'm not sure if the proposed IBTA and IETF
> > specifications will result in hardware capable of supporting the
> > FI_FENCE | FI_COMMIT_COMPLETE semantic. :/
>
>
> Agreed on messaging, but it lacks a good use case yet so I haven't
> been as concerned.
>
> I'm not yet convinced on FI_COMMIT_COMPLETE|FI_FENCE. If libfabric
> suggested the use of that, does that imply that providers must support
> 0-length sends and/or control messaging on behalf of the application?
> Does the data transfer itself provide any context to the region being
> flushed? What happens in the case of multiple persistent memory
> domains or devices? How would that data transfer provide the context
> necessary to flush a specific region, memory domain, or device? This
> seems more complicated than the initial suggestion indicates.
>
> > > *Addition of an event handler registration for handling event
> > > queue entries within the provider context (See Definition:
> > > fi_eq_event_handler)
> > >
> > > *Essentially, this becomes a registered callback for the target
> > > application to handle specific event types. We can use this
> > > mechanism with the target application to allow the provider to
> > > handle events internally using a function provided by the
> > > application. The function would contain the logic necessary to
> > > handle the event
>
> > Callbacks are to be avoided. They present difficult locking scenarios
> > with severe restrictions on what the application can do from the
> > callback, and present challenging object destruction situations.
> > Those restrictions can be difficult for an application to enforce,
> > since calls outside the app to other libraries may violate them.
>
> It's a good argument, and generally I feel the same way. What do you
> suggest as an alternative? Callbacks were suggested as a way for the
> provider to perform some behavior on behalf of the application upon
> receipt of the associated event. This would have allowed the provider
> to issue the commit/flush to the device and then return the ACK back
> to the initiator that the commit had succeeded/data was flushed as
> requested. Without a callback, I do not see a clean way for libfabric
> to coordinate the flush and acknowledgement back to the initiator.
>
> > To be clear, the proposal only supports RMA writes, and maybe
> > atomics, to the target memory. That is likely sufficient for now, but
> > I'd like to ensure that we have a way to extend pmem support beyond
> > the limited use cases being discussed.
>
> RMA, and atomics -- with the intent not to exclude messaging. This is
> why the naming change from FI_RMA_PMEM to FI_PMEM was suggested.
>
>
> > > *Previous functionality allows for a commit for every message as
> > > is the case for FI_COMMIT_COMPLETE, or the use of FI_COMMIT on a
> > > per-transaction basis. The need in
> > > ...
> > > delivery model, and provides a mechanism to ensure that those data
> > > transfers are eventually persisted.
>
> > Unless the app has set FI_COMMIT_COMPLETE as the default completion
> > model, it only applies to the operation on which it was set. The main
> > gap I'm aware of with proposed specifications is support of a 'flush'
> > type semantic.
>
> The flush mechanic is the primary gap that the proposal is attempting to identify.
> However, I believe the software emulation elements of the proposal are
> valuable for prototyping efforts.
>
> --
> James Swaro
> P: +1 (651) 605-9000
>
> On 4/27/20, 9:38 PM, "Hefty, Sean" <sean.hefty at intel.com> wrote:
>
> Top-posting main discussion point. Other comments further down:
>
> Conceptually, what's being proposed is specifying a data transfer
> as a 2-step process.
>
> 1. identify the data source and target
> 2. specify the completion semantic
>
> Theoretically, the actual data transfer can occur any time after
> step 1 and before step 2 completes. As an additional optimization,
> step 2 can apply to multiple step 1s.
>
> We need to decide:
>
> A. What completion semantic applies to step 1?
> B. What operations do we support for step 1?
> C. What completion semantics are supported for step 2?
>
> The current answers are:
>
> A. All completion levels are supported. It's possible that none
> of them are desirable here, and we need to introduce a new mode:
> FI_UNDEFINED_COMPLETE. This would indicate that the buffer cannot be
re-used, and the data is not visible at the target, until step 2
> completes that covers the same target memory range.
>
> B. RMA reads and writes are supported. It shouldn't be difficult
> to support atomics through the same APIs as well. Message transfers
> are more difficult to specify in step 2, making them harder to support.
>
> C. The proposal only supports FI_COMMIT_COMPLETE. Other levels
> could be added, though that may only make sense if we define something
> like FI_UNDEFINED_COMPLETE.
>
> I'm throwing FI_UNDEFINED_COMPLETE out for discussion. There
> would be issues trying to define it, since data transfers issued at
> step 1 could generate completions locally and remotely prior to step 2
> being invoked. Those completions just wouldn't mean anything until
> step 2 completes. The provider would select the best completion option for step 1.
>
>
> > Libfabric requires modifications to support RMA and atomic
> > operations targeted at remote memory registrations backed by
> > persistent memory devices. These modifications should be made with
> > the intent to drive support for persistent memory usage by
> > applications that rely on communications middleware such as SHMEM in
> > a manner that is consistent with byte-based/stream-based addressable
> > memory formats. Existing proposals (initial proposal) support
> > NVMe/PMoF approaches, whereas this approach should support flat
> > memory, non-block addressed memory structures and devices.
> >
> > Changes may be required in as many as three areas:
> >
> > *Memory registration calls
> >
> > *This allows a memory region to be registered as being capable of
> > persistence. This has already been introduced into the upstream
> > libfabric GitHub, but should be reviewed to ensure it matches use
> > case requirements.
>
> FI_RMA_PMEM is defined as a MR flag. Note that this definition
> intentionally prevents non-RMA transfers from taking advantage of
> persistent memory semantics.
>
> The intent of this flag is to give providers implementation
> flexibility, specifically based on hardware/software differences.
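For reference, registering a persistent-memory region with that existing MR flag looks roughly like the fragment below (error handling elided; `domain`, `pmem_buf`, and `len` are assumed to have been set up earlier):

```c
struct fid_mr *mr;
int ret;

/* FI_RMA_PMEM as an MR flag: the region is advertised as capable of
 * persistence, so remote writes with FI_COMMIT_COMPLETE can target it. */
ret = fi_mr_reg(domain, pmem_buf, len,
                FI_REMOTE_WRITE | FI_REMOTE_READ,
                0 /* offset */, 0 /* requested key */,
                FI_RMA_PMEM, &mr, NULL);
```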
>
>
> > *Completion semantics
> >
> > *These changes allow a completion event or notification to be
> > deferred until the referenced data has reached the persistence domain
> > at the target. This has already been introduced into the upstream
> > libfabric GitHub, but should be reviewed to ensure it matches use
> > case requirements.
>
> Completion semantics may be adjusted on a per transfer basis. The
> FI_COMMIT_COMPLETE semantic applies to both the initiator and target.
> Completion semantics are a minimal guarantee from a provider. The
> provider can do more.
>
> > *Consumer control of persistence
> >
> > *As presently implemented in the upstream libfabric GitHub,
> > persistence is determined on a transaction-by-transaction basis. It
> > was acknowledged at the time that this is a simplistic
> > implementation. We need to reach consensus on the following:
> >
> > *Should persistence be signaled on the basis of the target memory
> > region? For example, one can imagine a scheme where data targeted at
> > a particular memory region is automatically pushed into the
> > persistence domain by the target, obviating the need for any sort of
> > commit operation.
>
> In cases where a commit operation is not needed, it can become a
> no-op, but it may be required functionality for some providers.
>
>
> > *Is an explicit 'commit' operation of some type required, and if so,
> > what is the scope of that commit operation? Is there a persistence
> > fence defined such that every operation prior to the fence is made
> > persistent by a commit operation?
>
> With the current API, persistence can be achieved by issuing a
> 0-length RMA with FI_COMMIT_COMPLETE | FI_FENCE semantics. The fence
> requires that
> *all* prior transfers over that endpoint meet the requested completion
> semantic.
>
> This may not be ideal, but may be the best way to handle message
> transfers to persistent memory.
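Concretely, that flush idiom with the current API might be sketched as the fragment below, assuming `ep`, `dest_addr`, a remote key `rkey`, and a registered remote address `remote_addr` are already established (`flush_ctx` is a hypothetical completion context):

```c
struct fi_rma_iov rma_iov = {
    .addr = remote_addr,   /* any address within the registered region */
    .len  = 0,             /* 0-length: no data moved, fence only */
    .key  = rkey,
};
struct fi_msg_rma msg = {
    .msg_iov       = NULL,
    .desc          = NULL,
    .iov_count     = 0,
    .addr          = dest_addr,
    .rma_iov       = &rma_iov,
    .rma_iov_count = 1,
    .context       = &flush_ctx,
    .data          = 0,
};

/* FI_FENCE orders this behind all prior transfers on the endpoint;
 * FI_COMMIT_COMPLETE delays its completion until the fenced data has
 * reached the persistence domain at the target. */
ret = fi_writemsg(ep, &msg, FI_COMMIT_COMPLETE | FI_FENCE);
```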
>
>
> > Proposal
> >
> > The experimental work in the OFIWG/libfabric branch is sufficient
> > for the needs of SHMEM, with exception to the granularity of event
> > generation. When the current implementation generates events, it
> > would generate commit-level completion events with every operation.
> > That type of operation would make the delivery of completion events
> > take longer than necessary for most operations, so SHMEM would need
> > finer control over commit flushing behavior.
>
> OFI does not require that an event be generated for every transfer.
> It also allows transfers to report completions using 'lower'
> completion semantics, such as FI_TRANSMIT_COMPLETE. Completion events
> at the target of an RMA write require the FI_RMA_EVENT capability and
> are independent of PMEM.
>
> > To satisfy this, the following is being proposed:
> >
> > *A new API: fi_commit (See definitions: fi_commit)
> > The new API would be used to generate a commit instruction to a
> > target peer. The instruction would be defined by a set of memory
> > registration keys, or regions by which the target could issue a
> > commit to persistent memory.
>
> See discussion at the top.
>
>
> > *A single request to fi_commit should generate a control message to
> > target hardware or software emulation environment to flush the
> > contents of memory targets.
>
> This needs to be defined in terms of application level semantics, not
> implementation details. fi_commit could be a no-op based on the
> provider implementation. (It actually would be for the socket and tcp
> providers, which act at the target based on the MR flag.)
>
> > Memory targets are defined by the iov structures and key fields, and
> > the number of memory targets is defined by the count field. The
> > destination address is handled by the dest_addr field. The flags
> > field is held reserved at this time to allow for flexibility in the
> > API design, to future proof against options we might not conceive of
> > until after the prototype is complete, and the context is available
> > for the user and returned with the completion
>
> The proposed definition is limited to RMA (and atomic) writes. There
> is no mechanism for handling RMA reads into persistent memory, for
> example. That should be included. Message transfers may need a
> separate mechanism for this. That can be deferred (left undefined by
> the man pages), but ideally we should have an idea for how to support
> it.
>
> The best existing API definition for an fi_commit call would be the
> fi_readmsg/fi_writemsg() calls. We could even re-use those calls by
> adding a flag.
>
> > *Since this API behaves like a data transfer API, it is expected
> > that this API would generate a completion event to the local
> > completion queue associated with the EP from which the transaction
> > was initiated.
>
> The generation of a *CQ* event makes sense. We need to define if and
> how counters, locally and remotely, are updated. EQ events are not the
> right API match.
>
>
> > *At the target, this should generate an event to the target's event
> > queue – if and only if the provider supports software emulated
> > events. If a provider is capable of hardware level commits to
> > persistent memory, the transaction should be consumed transparently
> > by the hardware, and does not need to generate an event at the
> > target. This will require an additional event definition in libfabric
> > (See definition for fi_eq_commit_entry)
>
> This too needs to be defined based on the application level
> semantics, not implementation. The app should not be aware of
> implementation differences, except where mode bits dictate for
> performance reasons. (And I can say that developers hate dealing with
> those differences, so we need to eliminate them.)
>
> If we limit commit to RMA transfers, it makes sense for it to act as
> an RMA call for most purposes (i.e. fi_readmsg/fi_writemsg). For
> example, the ability to carry CQ data and generate remote events
> (FI_RMA_EVENTS) on the target CQ and counters. We also need to
> consider if there's any impact on counters associated with the MR.
>
>
> > *A new EQ event definition (fi_eq_commit_entry) to support
> > software-emulated persistence for devices that cannot provide
> > hardware support
> >
> > *The iov, and count variables mirror the original iov, and count
> > contents of the originating request.
> > *The flags may be a diminished set of flags from the original
> > transaction under the assumption that only some flags would have
> > meaning at the target and sending originator-only flags to the
> > target would have little value to the target process.
>
> If any events are generated, they need to be CQ related, not EQ.
>
>
> > *Additional flags or capabilities
> >
> > *A provider should be able to indicate whether they support software
> > emulated notifications of fi_commit, or whether they can handle
> > hardware requests for commits to persistent memory
>
> The implementation of hardware vs software should not be exposed.
> Hybrid solutions (e.g. RxM or large transfers over verbs devices) are
> also possible.
>
>
> > *An additional flag should be introduced to the fi_info structure
> > under modes: FI_COMMIT_MANUAL (or something else)
>
> The FI_RMA_PMEM capability should be sufficient to indicate support
> for RMA reads and writes to persistent memory. That should be an
> inclusive flag (along with the API version) indicating that all
> related operations are supported.
>
>
> > *This flag would indicate to the application that events may be
> > generated to the event queue for consumption by the application.
> > Commit events would be generated upon receipt of a commit message
> > from a remote peer, and the application would be responsible for
> > handling the event.
> > *Lack of the FI_COMMIT_MANUAL flag, and the presence of the
> > FI_RMA_PMEM (or FI_PMEM) flag in the info structure should imply
> > that the hardware is capable of handling the commit requests to
> > persistent memory and the application does not need to read the
> > event queue for commit events.
> >
> > *Change of flag definition
> >
> > *The FI_RMA_PMEM flag should be changed to FI_PMEM to indicate that
> > the provider is PMEM aware, and supports RMA/AMO/MSG operations to
> > and from persistent memory.
> > *There may be little value in supporting messaging interfaces, but
> > it is something that could be supported.
>
> Support for messaging requires additional definitions. Part of
> the discussion is figuring out the scope of what should be defined in
> the short term. As mentioned above, FI_FENCE | FI_COMMIT_COMPLETE can
> be used to commit message transfers. I can't think of a better
> alternative here. However, I'm not sure if the proposed IBTA and IETF
> specifications will result in hardware capable of supporting the
> FI_FENCE | FI_COMMIT_COMPLETE semantic. :/
>
>
> > *Addition of an event handler registration for handling event queue
> > entries within the provider context (See Definition:
> > fi_eq_event_handler)
> >
> > *Essentially, this becomes a registered callback for the target
> > application to handle specific event types. We can use this
> > mechanism with the target application to allow the provider to
> > handle events internally using a function provided by the
> > application. The function would contain the logic necessary to
> > handle the event
>
> Callbacks are to be avoided. They present difficult locking scenarios
> with severe restrictions on what the application can do from the
> callback, and present challenging object destruction situations.
> Those restrictions can be difficult for an application to enforce,
> since calls outside the app to other libraries may violate them.
>
>
> > *Specific to PMEM, a function handler would be used by the target
> > application to handle commits to persistent memory as they were
> > delivered, without requiring a fi_eq_read and some form of
> > acknowledgement around the commit action. With the handler, the
> > commit could be handled entirely by the function provided by the
> > application, and the return code from the application-provided
> > call-back would be sufficient for a software emulation in the
> > provider to produce the return message to the sender that the commit
> > transaction is fully complete. The use of a handler allows us to
> > make the commit transaction as light-weight, or heavy-weight, as
> > necessary.
> >
> > Definitions:
> >
> > fi_commit
> >
> > ssize_t fi_commit(struct fid_ep *ep,
> >                   const struct fi_rma_iov *iov,
> >                   size_t count,
> >                   fi_addr_t dest_addr,
> >                   uint64_t flags,
> >                   void *context);
> >
> > fi_eq_commit_entry
> >
> > struct fi_eq_commit_entry {
> >         fid_t fid;                    /* fid associated with request */
> >         const struct fi_rma_iov *iov; /* iovec of memory regions to be
> >                                          committed to persistent memory */
> >         size_t count;                 /* number of iovec/key entries */
> >         uint64_t flags;               /* operation-specific flags */
> > };
> >
> > fi_eq_event_handler
> >
> > typedef ssize_t (*fi_eq_event_handler_t)(struct fid_eq *eq,
> >                                          uint64_t event_type,
> >                                          void *event_data,
> >                                          uint64_t len,
> >                                          void *context);
> >
> > ssize_t fi_eq_register_handler(struct fid_eq *eq,
> >                                uint64_t event_type,
> >                                fi_eq_event_handler_t handler,
> >                                void *context);
> >
> > Use cases supported by this proposal:
> >
> > *As an application writer, I need to commit multiple previously-sent data
> > transfers to the persistence domain
>
> To be clear, the proposal only supports RMA writes, and maybe
> atomics, to the target memory. That is likely sufficient for now, but
> I'd like to ensure that we have a way to extend pmem support beyond
> the limited use cases being discussed.
>
>
> > *Previous functionality allows for a commit for every message, as
> > is the case for FI_COMMIT_COMPLETE, or the use of FI_COMMIT on a
> > per-transaction basis. The need in this use case is
> > performance-oriented: to allow a less strict delivery model to the
> > NIC for most messages, followed up with a 'flush' of the NIC to the
> > persistence domain. This allows most messages targeted to the
> > persistence domain to complete with a less strict delivery model,
> > and provides a mechanism to ensure that those data transfers are
> > eventually persisted.
>
> Unless the app has set FI_COMMIT_COMPLETE as the default completion
> model, it only applies to the operation on which it was set. The main
> gap I'm aware of with proposed specifications is support of a 'flush'
> type semantic.
>
>
> - Sean
>
>
> _______________________________________________
> ofiwg mailing list
> ofiwg at lists.openfabrics.org
> https://lists.openfabrics.org/mailman/listinfo/ofiwg
>