[libfabric-users] problems with FI_MULTI_RECV buffer errors in the sockets provider (and perhaps others?)

Hefty, Sean sean.hefty at intel.com
Fri May 3 08:22:18 PDT 2019


> 1. It seems like an issue should be opened about the problem that the FI_SOURCE value
> is lost when calling fi_cq_readerr, want me to do it?

Sure - I've made a note to myself to try to address this for either the 1.8 or 1.9 release.

> 2. You've mentioned that the sockets provider is deprecated.   Could you update the
> feature matrix web page to say that, so that others don't spend a lot of time debugging
> a provider that won't get fixed?

Yes

> 3. You said:
> 
> > By the time FI_MULTI_RECV is set on a completion, no additional completions will be
> > generated for that buffer.  At least that is how it should work.  If not, this sounds
> > like a bug in the provider.  Error completions should be reported in order with
> > non-error completions.
> 
> I agree with you entirely.   Neither sockets nor gni works this way: the FI_MULTI_RECV
> bit for a buffer can be seen by the application before all CQEs and/or errors have been
> seen for that buffer.  And error completions are always returned before any CQE
> completions, out of order.   I will open an issue against gni for this.   I have not
> checked any of the other providers.

Most of the other providers use the utility code for completions, so I think they're okay.
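
The ordering rule from point 3 can be sketched as a small application-side check. This is only an illustration under stated assumptions, not libfabric code: struct sketch_event, SKETCH_MULTI_RECV, and first_late_event are hypothetical names standing in for whatever the application records from fi_cq_read and fi_cq_readerr, in the order the provider returned them.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the FI_MULTI_RECV completion flag. */
#define SKETCH_MULTI_RECV (1ULL << 0)

/* Hypothetical flattened record of one CQ event, success or error,
 * stored in the order the application observed it. */
struct sketch_event {
    void    *op_context;  /* identifies the posted multi-recv buffer */
    uint64_t flags;       /* completion flags reported with the event */
    bool     is_error;    /* true if the event came from the error path */
};

/* Return the index of the first event that arrives for a buffer after
 * that buffer was already reported with the multi-recv flag set, or -1
 * if the observed sequence obeys the expected ordering. */
static int first_late_event(const struct sketch_event *ev, size_t n)
{
    assert(ev != NULL || n == 0);

    for (size_t i = 0; i < n; i++) {
        /* Look for an earlier event that released the same buffer. */
        for (size_t j = 0; j < i; j++) {
            if (ev[j].op_context == ev[i].op_context &&
                (ev[j].flags & SKETCH_MULTI_RECV))
                return (int)i;
        }
    }
    return -1;
}
```

A return value of -1 means no completion or error was seen for a buffer after its release was signalled. Any other value is the position of the first late event, which is the behavior described above for sockets and gni. The check assumes each posted buffer uses a distinct op_context for as long as events are being recorded; reposting with the same context would need the earlier records cleared first.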

- Sean

