[libfabric-users] problems with FI_MULTI_RECV buffer errors in the sockets provider (and perhaps others?)
sean.hefty at intel.com
Fri May 3 08:22:18 PDT 2019
> 1. It seems like an issue should be opened about the problem that the FI_SOURCE value
> is lost when calling fi_cq_readerr. Do you want me to open it?
Sure - I've made a note to myself to try to address this for either the 1.8 or 1.9 release.
> 2. You've mentioned that the sockets provider is deprecated. Could you update the
> feature matrix web page to say that, so that others don't spend a lot of time debugging
> a provider that won't get fixed?
> 3. You said:
> > By the time FI_MULTI_RECV is set on a completion, no additional completions will be
> > generated for that buffer.
> > At least that is how it should work. If not, this sounds like a bug in the provider.
> > Error completions should be reported in order with non-error completions.
> I agree with you entirely. Neither sockets nor gni works this way: the FI_MULTI_RECV
> bit for a buffer can be seen by the application before all CQEs and/or errors for that
> buffer have been seen. In addition, error completions are always returned before any
> normal CQEs, i.e. out of order. I will open an issue against gni for this. I have not
> checked any of the other providers.
Most of the other providers use the utility code for completions, so I think they're okay.