[ofiwg] definition of a completion in OFI

Underwood, Keith D keith.d.underwood at intel.com
Wed Mar 4 23:17:45 PST 2015

> The socket provider doesn't maintain a 1:1 pairing between an OFI endpoint
> and a TCP socket, even in the connected case.  As a result the test actually
> hangs waiting to read a completion for a posted receive.  You are correct, the
> problem is seen at the target.  The cause is that the initiator thinks it's done
> and exits, leaking random bits into the abyss.
>
> This isn't some enterprise worthy application that handles failover or
> abnormal termination.  The app simply sends a message from the client to
> the server and back.  That's it.  And under normal circumstances, with
> normal exit behavior, it doesn't work.  This fails for both the reliable-
> connected and reliable-unconnected pingpong tests.
>
> If the apps need to do something different, then we should at least define
> what that should be.  If what the tests are doing is fine, then the provider
> needs to do something different.  And that behavior should likewise be
> captured somewhere.  Maybe changes are needed in both, as Jason is
> implying.
>
> OFI defines both local and remote completion concepts.  At this point, I think
> everyone is in agreement that this is a problem in the shutdown/close
> semantics and implementation, and not the completion semantics.

It sounds to me like the app is wrong.

The app shouldn't exit with "local completion only" if the peer is blocking for a message and will never terminate if the message doesn't get there.  Local completion is *meant* to let data go into buffers, where it may or may not ever complete remotely.
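
To make the distinction concrete, here is a minimal sketch using plain Python sockets rather than OFI. The one-byte ACK, the function names, and the use of socket.socketpair() are all assumptions made for illustration; none of them come from the pingpong test or the sockets provider. The point is only that the initiator treats the return of its send as local completion and does not exit until the peer has confirmed that it consumed the message.

```python
import socket
import threading

ACK = b"A"  # hypothetical one-byte application-level ack


def target(sock, received):
    # Block until the message arrives, then ack it at the application level.
    data = sock.recv(1024)
    received.append(data)
    sock.sendall(ACK)
    sock.close()


def initiator(sock, payload):
    # sendall() returning is only "local completion": the bytes sit in a
    # buffer somewhere and have not necessarily been consumed by the peer.
    sock.sendall(payload)
    # Do not exit yet; wait until the peer confirms it consumed the message.
    acked = sock.recv(1) == ACK
    sock.close()
    return acked


def run_pingpong(payload=b"ping"):
    # socketpair() stands in for a connected endpoint pair (an assumption).
    a, b = socket.socketpair()
    received = []
    t = threading.Thread(target=target, args=(b, received))
    t.start()
    acked = initiator(a, payload)
    t.join()
    return acked, received
```

If the initiator returned right after sendall() and the process exited, nothing here would guarantee that the target ever saw the payload.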

And, if the target cares about surviving the loss of its peer, then the target half of the app should have a timeout path in the wait/recvfrom/whatever call it is using.
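
A timeout path of that kind might look like the sketch below. It again uses a plain Python socket as a stand-in; recv_or_timeout is a hypothetical helper name, not part of any OFI or provider API.

```python
import socket


def recv_or_timeout(sock, nbytes, timeout_s):
    """Return received bytes, or None if nothing arrived within timeout_s.

    An empty bytes object (b"") still means the peer closed the connection.
    """
    sock.settimeout(timeout_s)
    try:
        return sock.recv(nbytes)
    except socket.timeout:
        # The peer may be gone; let the caller decide whether to retry or exit.
        return None
    finally:
        # Restore blocking mode so later calls behave as before.
        sock.settimeout(None)
```

A target built this way can give up, retry, or report an error instead of hanging forever on a message that was lost when the initiator exited.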

Why would you work hard to fix this problem for the "clean exit" case when it won't be fixed for the "unclean exit" case?  

Ultimately, you can fix it for this relatively narrow case in the shutdown/close, but other cases won't be fixed by that approach.  E.g., what if somebody sent you a message and requested remote completion over the sockets provider?  What if you exit after the data has reached the target kernel buffers, but before you read from the socket (and, therefore, before you ack it at the provider level), and the initiator is blocking waiting for that ack?
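
That last scenario can be reproduced in miniature with ordinary sockets. In the sketch below, the one-byte ack, the function names, and the three outcome strings are illustrative assumptions, and the socket pair merely stands in for the provider's transport. The target closes while the payload is still unread in its buffers, so the only way for the initiator to avoid blocking forever is to treat a closed or reset connection, or a timeout, as "no ack is coming".

```python
import socket


def wait_for_ack(sock, timeout_s):
    """Classify the wait for a 1-byte ack: 'acked', 'peer_gone' or 'timeout'."""
    sock.settimeout(timeout_s)
    try:
        data = sock.recv(1)
    except socket.timeout:
        return "timeout"
    except ConnectionError:
        # For example, the peer closed with our data still unread.
        return "peer_gone"
    # An empty read means the peer closed without ever sending an ack.
    return "acked" if data else "peer_gone"


def demo_peer_exits_before_reading():
    initiator, target = socket.socketpair()
    initiator.sendall(b"payload")  # data now sits in the target's buffers
    target.close()                 # target exits without ever reading it
    result = wait_for_ack(initiator, timeout_s=1.0)
    initiator.close()
    return result
```

Whether the operating system reports the lost peer as an end-of-file or as a connection reset varies by platform, which is why the sketch maps both to the same outcome.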
