[ofiwg] definition of a completion in OFI
Hefty, Sean
sean.hefty at intel.com
Thu Mar 5 10:37:52 PST 2015
> It sounds to me like the app is wrong.
Possibly. That's what I'm trying to figure out. I'm open to someone telling me what the app should do and why, so that it can be documented.
> Ultimately, you can fix it for this relatively narrow case in the
> shutdown/close, but other cases won't be fixed by that approach. E.g.
> what if somebody sent you a message and requested remote completion over
> that sockets provider? What if you exit after the data has reached the
> target kernel buffers, but before you read from the socket (and,
> therefore, before you ack it at the provider level), and the initiator is
> blocking waiting for that ack?
The app works fine over iWarp and IB hardware. My guess is that this is because both do _something_ beyond what is specified by OFI, IB verbs, and iWarp verbs. I'm not even sure that they're doing the same thing to make things work. I don't have the ability to test usnic or Cray implementations.
For the IB hardware that I have, I don't believe requests are completed until an ack has been received. If this is a requirement from the IB spec, I haven't been able to find it. But the end result is that the test prints success at the end, which makes me happy. No complex shutdown protocol, which is defined by the spec, ends up being used.
I don't know if iWarp is doing the same thing, or if iWarp benefits from having an end of stream marker.
The current requirements of the relevant APIs used by the app are:
Completion - local buffer may be reused
Shutdown - for connected EPs only
    Will notify the remote side of a disconnect.
    The notification may be in or out of band, app doesn't know which.
Close - releases all resources
An FI_REMOTE_COMPLETE completion option was defined based on gathered requirements. Support for this is optional. I don't know how this can map to IB or iWarp without knowing details about the hardware implementations. Otherwise, it seems that a final sync message exchanged between the apps using FI_REMOTE_COMPLETE would work. For the connected case, a shutdown message exchange should be usable as well. But that still leaves the unconnected EP.
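To make the final sync idea concrete, here is a rough toy model in Python. It is not libfabric code: ToyEndpoint, connect, run, and the "SYNC"/"ACK" messages are all made up for illustration, and the only assumption carried over from the discussion is that a completion means "local buffer may be reused" while close releases everything, including data the provider has queued but not yet delivered.

```python
from collections import deque


class ToyEndpoint:
    """Hypothetical model of an endpoint on a provider that queues sends."""

    def __init__(self):
        self.pending = deque()  # accepted from the app, not yet delivered
        self.received = []      # messages delivered to this endpoint
        self.peer = None

    def send(self, msg):
        # Completion here only means the local buffer may be reused:
        # the provider has taken a copy, nothing has reached the peer yet.
        self.pending.append(msg)
        return "local_complete"

    def progress(self):
        # Deliver everything that is still queued to the peer.
        while self.pending:
            self.peer.received.append(self.pending.popleft())

    def close(self):
        # Close releases all resources, queued data included.
        dropped = len(self.pending)
        self.pending.clear()
        return dropped


def connect(a, b):
    a.peer, b.peer = b, a


def run(final_sync):
    """Send three messages, then close; return what the receiver saw."""
    sender, receiver = ToyEndpoint(), ToyEndpoint()
    connect(sender, receiver)
    for i in range(3):
        sender.send(f"msg{i}")
    if final_sync:
        # Final sync: the sender does not close until the receiver has
        # acknowledged a last message, so all earlier data was delivered.
        sender.send("SYNC")
        sender.progress()
        receiver.send("ACK")
        receiver.progress()
        assert sender.received == ["ACK"]
    sender.close()
    return [m for m in receiver.received if m != "SYNC"]
```

In this model run(False) loses all three messages even though every send "completed", while run(True) delivers them, because the sender keeps making progress until the ack comes back. A real provider is obviously more involved, and this says nothing about how the unconnected EP case should be handled.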
On a related note, I can get the sockets provider to pass these tests by disabling internal queueing that it does. This doesn't require changes to the tests, but it doesn't mean that it's officially correct.
- Sean