[ofiwg] RFC on error handling in fi_getinfo call

Coulter, Susan K skc at lanl.gov
Thu Jan 15 14:26:00 PST 2015


On Jan 15, 2015, at 2:49 PM, "Hefty, Sean" <sean.hefty at intel.com> wrote:

> OFI has an fi_getinfo call, which is similar to rdma_getaddrinfo and getaddrinfo.  It's used to query which endpoints are supported by the underlying providers.  There's been discussion in GitHub threads about how the call should behave in the presence of errors.  Without changing the API, there are two basic choices.
> 
> 1. If any provider fails unexpectedly (i.e. any error other than ENODATA), the entire call fails.  The error is returned to the application.
> 
> 2. If a provider fails, the failure is skipped.  Any attributes gathered from other providers are returned.
> 
> There are pros/cons to both, and wider community feedback is needed.

I would prefer #2 (admitting that I am not fully aware of all the pros/cons); otherwise, one provider's bug can bring down the whole shebang.
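
To make the difference between the two options concrete, here is a rough Python sketch. It is not libfabric code: the provider functions, the entry names, and the getinfo helper are all made up for illustration, and returning ENODATA when nothing at all was found is an assumption on my part rather than something the proposal specifies.

```python
import errno

# Hypothetical provider hooks: each returns (status, entries), where status is
# 0 on success or a negative errno value on failure. None of these names come
# from libfabric; they only model the behaviour under discussion.
def prov_ok():
    return 0, ["ep_msg", "ep_rdm"]

def prov_no_match():
    return -errno.ENODATA, []      # "nothing to report", not a real failure

def prov_buggy():
    return -errno.EIO, []          # unexpected failure

def getinfo(providers, skip_failures):
    """Collect entries from every provider.

    skip_failures=False models option 1: the first unexpected error aborts
    the whole call. skip_failures=True models option 2: failed providers are
    skipped and whatever the others returned is kept.
    """
    found = []
    for query in providers:
        status, entries = query()
        if status == 0:
            found.extend(entries)
        elif status == -errno.ENODATA or skip_failures:
            continue               # ignore this provider and keep going
        else:
            return status, []      # option 1: propagate the error
    if not found:
        # Assumption: an empty result is reported as ENODATA.
        return -errno.ENODATA, []
    return 0, found

providers = [prov_ok, prov_buggy, prov_no_match]
print(getinfo(providers, skip_failures=False))  # option 1
print(getinfo(providers, skip_failures=True))   # option 2
```

With the one buggy provider in the list, option 1 returns that provider's error and no entries, while option 2 still returns the entries from the working provider. The cost of option 2 in this sketch is that the caller never learns the buggy provider failed.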

> 
> - Sean 
> _______________________________________________
> ofiwg mailing list
> ofiwg at lists.openfabrics.org
> http://lists.openfabrics.org/mailman/listinfo/ofiwg

====================================

Susan Coulter
HPC-3 Network/Infrastructure
505-667-8425
Increase the Peace...
An eye for an eye leaves the whole world blind
====================================



