[ofiwg] completion flags as actually defined by OFI

Weiny, Ira ira.weiny at intel.com
Tue Apr 14 15:48:54 PDT 2015


> 
> My suggestion for your man page (and resulting behavior) is:
> 
> *FI_COMPLETION*
> : Indicates that a completion entry should be generated for data
>   transfer operations.
> 
> *FI_INJECT_COMPLETE*
> : Indicates that a completion should be generated when the
>   source buffer(s) may be reused.  FI_INJECT_COMPLETE guarantees that
>   the buffers will not be read from again and the application may
>   reclaim them.
> 
>   Any of local failure, fabric failure, or peer local failure can
>   prevent the delivery of the peer's completion.
> 
>   [ Mandatory, all must support this ]
> 
> *FI_DELIVERY_COMPLETE*
> : For reliable:
> 
>   Indicates that a completion should be generated when the work
>   request is delivered to the peer. FI_DELIVERY_COMPLETE guarantees
>   that the delivery of the peer's completion is no longer dependent on
>   the fabric or any local resources.
> 
>   A peer local failure can prevent the delivery of the peer's
>   completion.
> 
>   For unreliable:
> 
>   Indicates that a completion should be generated when the work
>   request is delivered to the fabric and is no longer dependent on any
>   local resources. No peer completion is guaranteed.
> 
>   A fabric failure, or peer local failure can prevent the delivery of
>   the peer's completion.
> 
>   [IB does this 99% today, presumably sockets/etc are 100%, iWarp does
>    not support this]

All of the above makes sense to me.  
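To make the failure-domain language above concrete, here is a minimal C sketch that encodes, for each proposed completion level, which failures can still prevent the peer's completion once the local completion has been reported. The enum and function names (`completion_level`, `failure_kind`, `can_prevent_peer_completion`) are hypothetical illustrations of the proposal, not libfabric definitions.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical names modeling the completion levels proposed in this
 * thread; these are not libfabric definitions. */
enum completion_level {
    LEVEL_INJECT_COMPLETE,
    LEVEL_DELIVERY_COMPLETE_RELIABLE,
    LEVEL_DELIVERY_COMPLETE_UNRELIABLE,
};

/* Failure domains named in the proposal. */
enum failure_kind {
    LOCAL_FAILURE,      /* failure of the local node or provider */
    FABRIC_FAILURE,     /* failure in the network between the peers */
    PEER_LOCAL_FAILURE, /* failure of the peer node or provider */
};

/* Returns true if a failure of kind `f`, occurring after a local
 * completion at `level` has been reported, can still prevent the
 * peer's completion from being delivered. */
static bool can_prevent_peer_completion(enum completion_level level,
                                        enum failure_kind f)
{
    switch (level) {
    case LEVEL_INJECT_COMPLETE:
        /* Only the source buffer has been released; any failure counts. */
        return true;
    case LEVEL_DELIVERY_COMPLETE_RELIABLE:
        /* Delivered to the peer; only the peer itself can still fail. */
        return f == PEER_LOCAL_FAILURE;
    case LEVEL_DELIVERY_COMPLETE_UNRELIABLE:
        /* Handed to the fabric; local resources are no longer involved. */
        return f == FABRIC_FAILURE || f == PEER_LOCAL_FAILURE;
    }
    assert(!"unknown completion level");
    return true;
}
```

For example, `can_prevent_peer_completion(LEVEL_DELIVERY_COMPLETE_RELIABLE, FABRIC_FAILURE)` returns false, matching the statement that a reliable delivery completion no longer depends on the fabric.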

> 
> *FI_COMMIT_COMPLETE*
> : Indicates that a completion should not be generated until the
>   completion has been delivered to the peer, consumed by the
>   application and acknowledged to be complete.
>   [this needs more language: what API does the application use to
>    signal that it completed the work?]

I don't think you should say "consumed by the application".  Rather, we should use your language above: "no fabric or peer local failures will prevent delivery."

I.e., the data has been delivered to the specified data buffer, the peer provider has generated a completion to the peer application, and all peer provider resources for that transaction have been freed.  The application need not signal any specific ack and may not even process the peer completion due to its own issues.
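A minimal sketch of the commit-complete condition as described in this reply, assuming hypothetical per-transfer state kept by the peer provider. The struct and function names are illustrative only; the point is that `app_consumed` is tracked but does not gate the completion.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-transfer state as seen by the peer provider. */
struct peer_xfer_state {
    bool data_placed;          /* data written to the target buffer */
    bool completion_generated; /* completion reported to the peer app */
    bool resources_freed;      /* peer provider resources released */
    bool app_consumed;         /* peer app has read its completion */
};

/* Under the reading proposed here, commit completion may be reported
 * once the peer provider has finished its own work; whether the peer
 * application has consumed its completion is deliberately ignored. */
static bool commit_complete(const struct peer_xfer_state *s)
{
    assert(s != NULL);
    return s->data_placed && s->completion_generated && s->resources_freed;
}
```

Leaving `app_consumed` out of the predicate is the design choice argued for above: the local completion depends only on work that the peer provider controls.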

I guess this is where one could define yet another completion type, FI_APP_COMPLETE (just making up a name here), which requires the application's involvement as you describe above.  In this model, something like data being written to "persistent storage" (or some other high-level completion) is signaled by the application's ack back to the peer provider, and from there back to the local provider and application.  However, I want to stress that this type of completion is under application control and libfabric is basically conveying another "message" for the application.  What that completion means is irrelevant to libfabric.

FWIW, I'm not sure if this is a good idea or not.  IMO it muddies the water between libfabric and the application.
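The application-level completion floated above (FI_APP_COMPLETE, a made-up name in this thread) can be sketched as a small state machine, assuming the peer application's ack carries an opaque payload that the providers simply relay. All names here are hypothetical; nothing in this sketch is a libfabric API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stages of an application-level completion; these are
 * illustrative only and not part of libfabric. */
enum app_complete_stage {
    STAGE_SENT,            /* transfer issued by the local application */
    STAGE_PEER_COMPLETED,  /* peer provider reported completion to the peer app */
    STAGE_PEER_APP_ACKED,  /* peer application sent its ack back */
    STAGE_LOCAL_COMPLETED, /* local provider reported completion locally */
};

struct app_complete_xfer {
    enum app_complete_stage stage;
    uint64_t ack_payload; /* opaque to both providers */
};

/* Peer provider places the data and generates the peer's completion. */
static bool peer_provider_deliver(struct app_complete_xfer *x)
{
    assert(x != NULL);
    if (x->stage != STAGE_SENT)
        return false;
    x->stage = STAGE_PEER_COMPLETED;
    return true;
}

/* Peer application acks; what `payload` means (for example, "written to
 * persistent storage") is known only to the two applications. */
static bool peer_app_ack(struct app_complete_xfer *x, uint64_t payload)
{
    assert(x != NULL);
    if (x->stage != STAGE_PEER_COMPLETED)
        return false;
    x->ack_payload = payload;
    x->stage = STAGE_PEER_APP_ACKED;
    return true;
}

/* Local provider relays the ack to the local application unchanged. */
static bool local_provider_complete(struct app_complete_xfer *x,
                                    uint64_t *payload_out)
{
    assert(x != NULL && payload_out != NULL);
    if (x->stage != STAGE_PEER_APP_ACKED)
        return false;
    *payload_out = x->ack_payload;
    x->stage = STAGE_LOCAL_COMPLETED;
    return true;
}
```

The providers never interpret `ack_payload`; that is the sense in which libfabric would just be conveying another message on the application's behalf.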

> 
> I've chosen language that talks specifically about the peer completion - since
> this is what a high level app writer cares about.

I like the use of "local" and "peer" providers in the language.

Ira
