[ofiwg] RFC on error handling in fi_getinfo call

Jason Gunthorpe jgunthorpe at obsidianresearch.com
Thu Jan 15 15:41:57 PST 2015


On Thu, Jan 15, 2015 at 11:34:42PM +0000, Coulter, Susan K wrote:
> 
> On Jan 15, 2015, at 4:02 PM, "Mccormick, Patrick M" <patrick.m.mccormick at intel.com> wrote:
> 
> > The big con to 2) that Sean brought up in earlier discussions is
> > that:
> > 
> > The sockets provider supports everything, so any error from
> > another provider will result in the application silently choosing
> > the sockets provider and getting poor performance.
> 
> Thank you.  And I guess the operative word here is "silently".  Is
> that the bit that would require an API change?  ( not being silent
> about the sockets choice ? ) If so, this seems like a choice between
> okra and cauliflower.  Blech.  Curious to hear others' opinions ...

The issue is broader: if a provider works for some users and not
others, what should happen? (i.e., maybe verbs needs mlock rights, but
sockets obviously doesn't)

Should libfabric become unavailable to sockets users just because a
machine is capable of verbs?

I'd say no, so a failing provider can't be allowed to fail fi_getinfo.

But, yes, diagnosing that is nasty. export LIB_FABRIC_DEBUG=1 ?
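
Roughly what I have in mind, as a sketch only. The provider table,
probe functions and select_providers() are made up for illustration and
are not libfabric API; LIB_FABRIC_DEBUG is just the name floated above.
A provider that can't work for this user is skipped rather than failing
the whole call, and the skip is only reported when debugging is on:

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical provider model, for illustration only. */
struct provider {
    const char *name;
    int (*probe)(void);   /* 0 if usable by the calling user */
};

/* Pretend verbs lacks mlock rights for this user; sockets always works. */
static int probe_verbs(void)   { return -1; }
static int probe_sockets(void) { return 0; }

/*
 * Collect the usable providers into out[] (room for n entries).
 * A provider whose probe fails is skipped, never treated as fatal.
 */
static size_t select_providers(const struct provider *all, size_t n,
                               const struct provider **out)
{
    const char *dbg = getenv("LIB_FABRIC_DEBUG");
    int debug = dbg != NULL && dbg[0] != '\0' && dbg[0] != '0';
    size_t found = 0;

    assert(all != NULL && out != NULL);

    for (size_t i = 0; i < n; i++) {
        int rc = all[i].probe();
        if (rc != 0) {
            /* Stay silent by default; explain the skip on request. */
            if (debug)
                fprintf(stderr, "provider %s skipped (rc=%d)\n",
                        all[i].name, rc);
            continue;
        }
        out[found++] = &all[i];
    }
    return found;
}
```

With a table of { verbs, sockets } this returns only sockets, and the
user who expected verbs has to set the variable to find out why.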

Jason


