[ofiwg] fi_[ec]q_readerr

Tue Oct 21 14:48:16 PDT 2014

Hmm, thinking about this more, I think the main thing that threw me off was this line from "man fi_eq_read": "When an operation that completes with an unexpected error is inserted into an EQ, it is placed into a temporary error queue."

That may or may not be true for any given provider.

The way of thinking about this that makes my head hurt less is "the next event in line has more state than I can report to you in the space provided, please call *_readerr() to get the information."  Whether this comes from "a temporary error queue" is purely an implementation detail.  It is definitely not the way I would implement it, and in any event it should not be in the man page.

If the next completion in line is an error that needs a bigger struct, I guess it's ok to force the user to use a different call to retrieve the error information.  The relative order between "good" and "bad" completions can still be maintained, because [ec]q_read does not start returning EAVAIL until the error completion percolates to the top.
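
To make that ordering concrete, here is a minimal sketch of the in-order behavior I have in mind.  It is a toy model, not libfabric code: toy_cq, toy_event, toy_cq_read(), toy_cq_readerr(), TOY_EAVAIL and TOY_EAGAIN are all hypothetical names invented for illustration, and the error constants are stand-ins rather than the real FI_* values.  The read call hands back successful completions until an error reaches the head of the queue, and only then reports "error available".

```c
#include <stddef.h>

/* Hypothetical toy model, NOT the libfabric API. */
#define TOY_EAVAIL 1   /* stand-in for FI_EAVAIL; real value not assumed */
#define TOY_EAGAIN 2   /* stand-in for "nothing to read right now" */

struct toy_event {
    int id;    /* which operation completed */
    int err;   /* 0 = success, nonzero = error needing the bigger struct */
};

struct toy_cq {
    const struct toy_event *ev;  /* events in completion order */
    size_t head;                 /* index of the next unread event */
    size_t len;                  /* total number of events */
};

/* Copy up to `count` successful events, stopping in front of the first
 * error.  Returns -TOY_EAVAIL only when the head itself is an error. */
static int toy_cq_read(struct toy_cq *cq, int *ids, size_t count)
{
    size_t n = 0;

    if (cq->head == cq->len)
        return -TOY_EAGAIN;
    if (cq->ev[cq->head].err)
        return -TOY_EAVAIL;
    while (n < count && cq->head < cq->len && !cq->ev[cq->head].err)
        ids[n++] = cq->ev[cq->head++].id;
    return (int)n;
}

/* Pop exactly one error event from the head of the queue. */
static int toy_cq_readerr(struct toy_cq *cq, struct toy_event *out)
{
    if (cq->head == cq->len || !cq->ev[cq->head].err)
        return -TOY_EAGAIN;      /* head is not an error */
    *out = cq->ev[cq->head++];
    return 1;
}
```

With four successes followed by four failures queued, a read asking for 8 returns 4, the next read returns -TOY_EAVAIL, and four readerr calls drain the errors one at a time, so the caller sees everything in completion order.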

I think this statement in the man page is incorrect:
"Attempting to read from an EQ while an item is in the error queue results in an FI_EAVAIL failure"
and should instead be:
"Attempting to read from an EQ when the next item in the event queue is an error results in an FI_EAVAIL failure"

Or at the very least the behavior should be unspecified, if some provider really needs to accelerate error completions to the head of the line.
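
The difference between the two wordings can be stated as two predicates over the list of unread events.  This is only an illustrative sketch under my own assumptions: the function names and the pending[] array are hypothetical, and nothing here is taken from libfabric or from any provider.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model: pending[i] != 0 means the i-th unread event
 * (in completion order) is an error. */

/* Current man page wording: fail while ANY error is queued anywhere. */
static bool eavail_any_error_pending(const int *pending, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (pending[i])
            return true;
    return false;
}

/* Proposed wording: fail only when the NEXT event is an error. */
static bool eavail_next_is_error(const int *pending, size_t n)
{
    return n > 0 && pending[0] != 0;
}
```

For a queue holding success, success, error, success, the first predicate is already true while the second is still false, which is exactly the case where the current wording lets an error jump ahead of two successful completions.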

So, I guess I can live with *_readerr(), but I will open an issue to request some changes to the relevant man pages.
-r

> -----Original Message-----
> From: Hefty, Sean [mailto:sean.hefty at intel.com]
> Sent: Tuesday, October 21, 2014 2:06 PM
> To: Reese Faucette (rfaucett); ofiwg at lists.openfabrics.org
> Subject: RE: fi_[ec]q_readerr
> 
> > The more I think about this whole *_readerr() API, the more it bothers me.
> > In general CQs and EQs simply hold and return the delayed results from
> > issuing asynchronous commands or operations.  What's the reason for
> > differentiating "successful" results from "failed" results in such a
> > dramatic way?
> 
> 
> > The man page says "EQs are optimized to report operations which have
> > completed successfully."  This may be true, but I don't see why it
> > necessitates a separate call for retrieving an error completion.  Jeff
> > points out that this semantic of first calling fi_cq_read() and
> > getting -FI_EAVAIL, then calling fi_cq_readerr() to get the error can
> > be cumbersome in a multi-threaded environment.
> >
> >
> >
> > It appears that part of the motivation for this is to keep the size of
> > the buffer the user passes to [ec]q_read as small as possible.
> > Suppose that we add a pointer to an error structure in the completion
> > struct, so that in the error case, the provider malloc()s data for this
> > additional error data and returns a pointer to it.  This should
> > satisfy the goal of keeping the size of the struct the user passes in
> > small, but also allows the provider to return rich error information, all
> > through the single [ce]q_read APIs.
> 
> The size of the resulting completion buffer is small, and multiple events can
> still be reported at once.  On the return side, the app only needs to check
> the function's return value for success, rather than checking both the
> return code and each individual event.
> 
> Error events are reported one at a time.
> 
> > [ec]q_read would still return -FI_EAVAIL (or equivalent) for an error
> > completion, but this means "look in your completion for the error
> > information" rather than "make another call"
> 
> The app could ask for 8 completions.  The first 4 could be successful, the
> next 4 fail.  How do you indicate this?
> 
> You don't want error reporting to fail because of a malloc failure, and
> passing allocated memory from one library for another library to free is
> ugly.  The app should really own this buffer.
> 
> > This also allows order to be maintained between "successful" and "failed"
> > completions, which is lost with the out-of-band error reporting.
> 
> IMO - we need to re-think completion order in general.  There seems to be
> an implicit assumption that operations complete in order, which may not
> actually be true in all cases, and could artificially restrict performance
> optimizations.  This is fairly easy to see with something like the address
> vector, but applies to data transfer operations as well.
> 
> I don't have a good solution for multi-threaded apps, but then completion
> order doesn't seem to apply much there anyway.  The current definition
> doesn't seem that bad though.  A bunch of threads may go looking for an
> error event and just not find one.