[ofiwg] A question on FI_DELIVERY_COMPLETE

Paul Grun grun at cray.com
Wed Oct 21 11:09:31 PDT 2015

For SHMEM - In an environment where the ordering of placement of data in memory cannot necessarily be guaranteed, the SHMEM method of polling on a memory location seems like a distinctly bad idea. It's true that we've gotten away with it for years, but largely thanks to rational memory controller designers and the restrictions of the PCIe bus.

For MPI, I can see how that would work, but it amounts to doing an RMA operation followed by a SEND.  The mechanism I am proposing would eliminate the need for the SEND operation.

Essentially, the requirement is as follows:  Ensure consistency of data availability at the responder for one-sided operations to ensure that data is actually available when the responder side consumer goes to look for it.

Accomplishing this requires two things:
1. a signaling mechanism from the requester to the responder, and
2. a mechanism at the responder side to synchronize the ordering of the signal w.r.t. data visibility.

We already have REMOTE_CQ_DATA to use as the signal; defining FI_DELIVERY_COMPLETE as suggested could provide the needed synchronization mechanism.
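The two pieces above might fit together roughly as follows. This is a hypothetical sketch against the libfabric API, not a working proposal: the endpoint, address, and key variables are placeholders, and the responder-side semantics of FI_DELIVERY_COMPLETE shown here are exactly the behavior being proposed in this thread, not what the man pages currently guarantee.

```c
/* Sketch: requester writes data and signals the responder via remote CQ
 * data; FI_DELIVERY_COMPLETE (as proposed) would hold the responder-side
 * completion until the data is actually visible at the target. */
#include <rdma/fabric.h>
#include <rdma/fi_rma.h>
#include <rdma/fi_eq.h>

/* Requester side: one-sided write carrying a completion signal. */
ssize_t signal_with_delivery_complete(struct fid_ep *ep, void *buf,
                                      size_t len, void *desc,
                                      fi_addr_t dest_addr,
                                      uint64_t remote_addr, uint64_t key,
                                      uint64_t signal, void *ctx)
{
        struct iovec iov = { .iov_base = buf, .iov_len = len };
        struct fi_rma_iov rma = { .addr = remote_addr, .len = len,
                                  .key = key };
        struct fi_msg_rma msg = {
                .msg_iov = &iov, .desc = &desc, .iov_count = 1,
                .addr = dest_addr, .rma_iov = &rma, .rma_iov_count = 1,
                .context = ctx,
                .data = signal,  /* delivered as REMOTE_CQ_DATA */
        };
        /* FI_REMOTE_CQ_DATA carries 'signal' to the responder's CQ;
         * the proposed FI_DELIVERY_COMPLETE semantics would defer that
         * CQ entry until the written data is globally visible. */
        return fi_writemsg(ep, &msg,
                           FI_REMOTE_CQ_DATA | FI_DELIVERY_COMPLETE);
}

/* Responder side: poll the CQ; once the entry arrives, the data is
 * (under the proposed semantics) guaranteed visible. */
ssize_t wait_for_signal(struct fid_cq *cq, uint64_t *signal_out)
{
        struct fi_cq_data_entry comp;
        ssize_t ret;

        do {
                ret = fi_cq_read(cq, &comp, 1);
        } while (ret == -FI_EAGAIN);

        if (ret > 0 && (comp.flags & FI_REMOTE_CQ_DATA))
                *signal_out = comp.data;  /* the requester's signal */
        return ret;
}
```

Note this collapses the RMA-plus-SEND sequence into a single operation: the signal rides on the write itself rather than on a follow-up message.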

I do think that such a feature could be used to satisfy the proposal under discussion in the MPI Forum, assuming it ever comes to anything.


-----Original Message-----
From: Sur, Sayantan [mailto:sayantan.sur at intel.com] 
Sent: Wednesday, October 21, 2015 10:05 AM
To: Paul Grun; ofiwg at lists.openfabrics.org
Subject: Re: [ofiwg] A question on FI_DELIVERY_COMPLETE

SHMEM: there is a wait_until call on the responder that blocks until the value in a memory location changes. There is also shmem_barrier_all, in which the responder could have participated.
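For reference, the SHMEM pattern described above looks roughly like this sketch (symmetric-variable names and PE numbering are illustrative; this uses the OpenSHMEM put/fence/wait_until calls):

```c
/* Sketch of the SHMEM polling pattern: the requester puts data, uses
 * shmem_fence() to order a subsequent flag update behind it, then sets
 * a flag that the responder is polling with shmem_long_wait_until(). */
#include <shmem.h>

#define N 64

long flag = 0;     /* symmetric flag variable */
long data[N];      /* symmetric data buffer   */

/* requester: PE 'target' is the responder */
void requester(long *src, int target)
{
        shmem_long_put(data, src, N, target); /* one-sided data transfer */
        shmem_fence();                        /* order data before flag  */
        shmem_long_p(&flag, 1, target);       /* deliver the signal      */
}

/* responder */
void responder(void)
{
        shmem_long_wait_until(&flag, SHMEM_CMP_EQ, 1);
        /* data[] is read here -- safe only if the fence ordering at the
         * requester translates into responder-side visibility, which is
         * precisely the concern raised earlier in this thread. */
}
```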

MPI: In passive mode operation - the requestor (origin) needs to unlock a window (or flush), and send a message to the responder (target) that allows the target to inspect the data.
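The passive-target sequence described above might be sketched as follows (ranks, counts, and the window setup are placeholders). This is the RMA-operation-followed-by-a-SEND pattern that the proposed feature would collapse into one operation:

```c
/* Sketch of MPI passive-target RMA plus notification: the put completes
 * at the target on unlock, then a separate send tells the target it may
 * inspect the window memory. */
#include <mpi.h>

void origin(double *src, int n, int target, MPI_Win win)
{
        int token = 1;

        MPI_Win_lock(MPI_LOCK_SHARED, target, 0, win);
        MPI_Put(src, n, MPI_DOUBLE, target, 0, n, MPI_DOUBLE, win);
        MPI_Win_unlock(target, win);  /* completes the put at the target */

        /* separate notification message -- the extra SEND */
        MPI_Send(&token, 1, MPI_INT, target, 0, MPI_COMM_WORLD);
}

void target_side(int origin_rank)
{
        int token;

        MPI_Recv(&token, 1, MPI_INT, origin_rank, 0, MPI_COMM_WORLD,
                 MPI_STATUS_IGNORE);
        /* the window memory may now be inspected */
}
```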

There is some talk in the MPI Forum of introducing a call that does MPI_Put with notify, but it doesn’t exist currently. If it were introduced, maybe it could use the feature you’re suggesting?


On 10/21/15, 9:51 AM, "Paul Grun" <grun at cray.com> wrote:

>What are those mechanisms that MPI and SHMEM use?
>Wouldn't it be useful if the requester could simply use REMOTE_CQ_DATA and be assured that the responder wouldn't get the completion until the data had been placed into cache?
>-----Original Message-----
>From: Sur, Sayantan [mailto:sayantan.sur at intel.com] 
>Sent: Wednesday, October 21, 2015 9:50 AM
>To: Paul Grun; ofiwg at lists.openfabrics.org
>Subject: Re: [ofiwg] A question on FI_DELIVERY_COMPLETE
>Having the notification at the requester is useful for MPI RMA or SHMEM use cases. This allows MPI/SHMEM to wait for a local event that indicates remote completion. The responder side is passive in these use cases.
>Both MPI and SHMEM have different mechanisms to let the responder know when it is able to look at the data.
>From: <ofiwg-bounces at lists.openfabrics.org> on behalf of Paul Grun <grun at cray.com>
>Date: Wednesday, October 21, 2015 at 9:33 AM
>To: "ofiwg at lists.openfabrics.org" <ofiwg at lists.openfabrics.org>
>Subject: [ofiwg] A question on FI_DELIVERY_COMPLETE
>Here’s my understanding of how FI_DELIVERY_COMPLETE works on the *responder* end:  If you are doing an RMA operation, and the requester uses REMOTE_CQ_DATA to signal the end of the transfer to the responder, and the responder has FI_DELIVERY_COMPLETE set, then the responder won’t get a completion event until the data is actually visible to the responder.
>I ask because the man pages imply that FI_DELIVERY_COMPLETE, which is an operation flag, applies only to the requester side.  But it is much less important to notify the requester that data is visible to the responder, than it is to notify the responder itself.
>Cray Inc.
>Office:    (503) 620-8757
>Mobile:  (503) 703-5382
