sean.hefty at intel.com
Tue Oct 21 14:06:22 PDT 2014
> The more I think about this whole *_readerr() API, the more it bothers me.
> In general, CQs and EQs simply return the delayed results from issuing
> asynchronous commands or operations. What's the reason for
> differentiating "successful" results from "failed" results in such a
> dramatic way?
> The man page says "EQs are optimized to report operations which have
> completed successfully." This may be true, but I don't see why it
> necessitates a separate call for retrieving an error completion. Jeff
> points out that this semantic of first calling fi_cq_read() and getting
> -FI_EAVAIL, then calling fi_cq_readerr() to get the error, can be cumbersome
> in a multi-threaded environment.
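To make the two-call pattern concrete, here is a minimal, self-contained C sketch. It does not use libfabric. `mock_cq`, `mock_cq_read()`, `mock_cq_readerr()`, `drain()`, and the `MOCK_*` return codes are hypothetical stand-ins modeled on the behavior discussed in this thread: a read returns successful completions up to the first error, then reports -FI_EAVAIL, and each error is retrieved with a separate call. The real libfabric signatures and return codes may differ.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>

/* Hypothetical stand-ins for -FI_EAVAIL / -FI_EAGAIN; values are arbitrary. */
#define MOCK_EAVAIL (-259)
#define MOCK_EAGAIN (-11)

/* Hypothetical completion entry: user context plus status (0 = success). */
struct mock_entry { int context; int status; };

/* Hypothetical CQ: a fixed array of entries consumed from the head. */
struct mock_cq { struct mock_entry *entries; size_t len; size_t head; };

/* Copy up to 'count' successful contexts into 'buf', stopping at the first
 * error entry. If the head entry is itself an error, return MOCK_EAVAIL so
 * the caller knows it must make a second, separate call. */
ssize_t mock_cq_read(struct mock_cq *cq, int *buf, size_t count)
{
    size_t n = 0;
    assert(count > 0);
    while (n < count && cq->head < cq->len && cq->entries[cq->head].status == 0)
        buf[n++] = cq->entries[cq->head++].context;
    if (n > 0)
        return (ssize_t)n;
    return cq->head < cq->len ? MOCK_EAVAIL : MOCK_EAGAIN;
}

/* Pop exactly one error entry into the caller-owned 'err' buffer. */
ssize_t mock_cq_readerr(struct mock_cq *cq, struct mock_entry *err)
{
    if (cq->head < cq->len && cq->entries[cq->head].status != 0) {
        *err = cq->entries[cq->head++];
        return 1;
    }
    return MOCK_EAGAIN;
}

/* Drain the queue with the two-call pattern. Returns the number of
 * successful completions and stores the number of errors in *nerr. */
size_t drain(struct mock_cq *cq, size_t *nerr)
{
    int buf[8];
    struct mock_entry err;
    size_t ok = 0;

    *nerr = 0;
    for (;;) {
        ssize_t ret = mock_cq_read(cq, buf, 8);
        if (ret > 0) {
            ok += (size_t)ret;
        } else if (ret == MOCK_EAVAIL) {
            if (mock_cq_readerr(cq, &err) == 1)
                (*nerr)++;
        } else {
            break; /* MOCK_EAGAIN: queue is empty */
        }
    }
    return ok;
}
```

With entries {ok, ok, error, ok, error}, a read asking for 8 completions returns 2, the next read returns MOCK_EAVAIL, and the error must be fetched on its own before reading resumes.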
> It appears that part of the motivation for this is to keep the size of the
> buffer the user passes to [ec]q_read as small as possible. Suppose that
> we add a pointer to an error structure in the completion struct, so that in
> the error case, the provider malloc()s data for this additional error data
> and returns a pointer to it. This should satisfy the goal of keeping the
> size of the struct the user passes in small, but also allows the provider
> to return rich error information, all through the single [ec]q_read APIs.
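A minimal sketch of the alternative proposed above, assuming the completion entry carries a pointer that stays NULL on success and points to provider-allocated error data on failure. All names (`mock_comp`, `mock_err_data`, `mock_complete()`, `mock_consume()`) are hypothetical and not part of any real API. The sketch makes the trade-offs concrete: the application must inspect every entry in a batch, the error path itself calls malloc(), and the application frees memory the provider allocated.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical rich error data, allocated by the provider only on failure. */
struct mock_err_data {
    int prov_errno;
    char msg[64];
};

/* Hypothetical compact completion entry: 'err' stays NULL on success. */
struct mock_comp {
    int context;
    struct mock_err_data *err;
};

/* Provider side: fill one completion. Returns -1 if the error-path
 * malloc() itself fails, which is one concern with this design. */
int mock_complete(struct mock_comp *out, int context, int prov_errno)
{
    out->context = context;
    out->err = NULL;
    if (prov_errno == 0)
        return 0;
    out->err = malloc(sizeof(*out->err));
    if (out->err == NULL)
        return -1;
    out->err->prov_errno = prov_errno;
    snprintf(out->err->msg, sizeof(out->err->msg),
             "operation %d failed (provider errno %d)", context, prov_errno);
    return 0;
}

/* Application side: scan a batch returned by one read call. Every entry
 * must be checked, and any error data must be freed here; in the proposed
 * design this is a different library from the one that allocated it. */
size_t mock_consume(struct mock_comp *batch, size_t n)
{
    size_t failures = 0;
    assert(batch != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (batch[i].err != NULL) {
            failures++;
            free(batch[i].err);
            batch[i].err = NULL;
        }
    }
    return failures;
}
```

Successes and failures stay in their original order within the batch, which is the benefit claimed for this approach.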
The size of the resulting completion buffer is small, and multiple events can still be reported at once. On the return side, the app only needs to check the function's return code to detect success, rather than checking both the return code and a status on each individual event.
Error events are reported one at a time.
> [ec]q_read would still return -FI_EAVAIL (or equivalent) for an error
> completion, but this means "look in your completion for the error
> information" rather than "make another call"
The app could ask for 8 completions. The first 4 could be successful, and the next 4 could fail. How do you indicate this?
You don't want error reporting to fail because of a malloc failure, and having one library allocate memory that another library must free is ugly. The app should really own this buffer.
> This also allows order to be maintained between "successful" and "failed"
> completions, which is lost with the out-of-band error reporting.
IMO, we need to rethink completion order in general. There seems to be an implicit assumption that operations complete in order, which may not actually be true in all cases and could artificially restrict performance optimizations. This is fairly easy to see with something like the address vector, but it applies to data transfer operations as well.
I don't have a good solution for multi-threaded apps, but then completion order doesn't seem to apply much there anyway. The current definition doesn't seem that bad though. A bunch of threads may go looking for an error event and just not find one.
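A deterministic replay of that race, again as a self-contained sketch with no libfabric calls (`mock_cq_read()`, `mock_cq_readerr()`, `mock_take_error()`, and the `MOCK_*` codes are hypothetical). Two consumers both see the error indication, only one retrieves the error entry, and the other simply finds nothing and should go back to reading. The mock has no locking, so it replays one interleaving sequentially rather than running real threads.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>

#define MOCK_EAVAIL (-259) /* hypothetical stand-in for -FI_EAVAIL */
#define MOCK_EAGAIN (-11)  /* hypothetical stand-in for -FI_EAGAIN */

struct mock_entry { int context; int status; }; /* status 0 = success */
struct mock_cq { struct mock_entry *entries; size_t len; size_t head; };

/* Return successes up to the first error; MOCK_EAVAIL if the head is an error. */
ssize_t mock_cq_read(struct mock_cq *cq, int *buf, size_t count)
{
    size_t n = 0;
    assert(count > 0);
    while (n < count && cq->head < cq->len && cq->entries[cq->head].status == 0)
        buf[n++] = cq->entries[cq->head++].context;
    if (n > 0)
        return (ssize_t)n;
    return cq->head < cq->len ? MOCK_EAVAIL : MOCK_EAGAIN;
}

/* Pop one error entry into the caller-owned buffer, if one is at the head. */
ssize_t mock_cq_readerr(struct mock_cq *cq, struct mock_entry *err)
{
    if (cq->head < cq->len && cq->entries[cq->head].status != 0) {
        *err = cq->entries[cq->head++];
        return 1;
    }
    return MOCK_EAGAIN;
}

/* Called after a read reported MOCK_EAVAIL. Returns 1 if this consumer got
 * the error, 0 if another consumer already took it (benign: read again). */
int mock_take_error(struct mock_cq *cq, struct mock_entry *err)
{
    return mock_cq_readerr(cq, err) == 1;
}
```

In this model, a consumer that loses the race just returns to its read loop, which matches the point that the current definition tolerates threads not finding an error.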