[ofiwg] fault-tolerance

Bland, Wesley wesley.bland at intel.com
Tue Sep 8 10:40:08 PDT 2015


Probably. I wasn't necessarily saying that there's a deficiency. Just pointing out the functionality that would probably be required for what Jeff mentioned. 
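
For concreteness, here is a rough sketch of the inference described in the quoted text below: mapping a failed send back to its target. In libfabric, fi_cq_read() returns -FI_EAVAIL when an error entry is pending, and fi_cq_readerr() then fills a struct fi_cq_err_entry whose op_context is the context pointer the app passed to the send call. Everything else here is an assumption made for illustration: struct err_entry is a stand-in for the two fields used (so the snippet compiles without libfabric), and struct send_ctx, struct peer_table, MAX_PEERS, and mark_failed_peer() are hypothetical names. Treating every send error as a failed peer is also an assumption; a real app would likely inspect the error code first.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for the fields of libfabric's struct fi_cq_err_entry used here.
 * A real program would include <rdma/fi_eq.h> and use the real struct. */
struct err_entry {
    void *op_context;   /* context pointer the app passed to the failed send */
    int   err;          /* positive error code; 0 means no error */
};

/* Hypothetical per-send bookkeeping. The app would pass a pointer to one of
 * these as the context argument of each send so it comes back on error. */
struct send_ctx {
    uint64_t dest;      /* destination peer (an fi_addr_t in libfabric) */
};

#define MAX_PEERS 64    /* hypothetical fixed-size peer table */

struct peer_table {
    unsigned char failed[MAX_PEERS];   /* 1 if a send to this peer failed */
};

/* Record the peer named by a failed send's context.
 * Returns the peer index that was marked, or -1 if nothing was marked. */
static long mark_failed_peer(struct peer_table *t, const struct err_entry *e)
{
    assert(t != NULL && e != NULL);

    if (e->err == 0 || e->op_context == NULL)
        return -1;                      /* no error, or no context to map */

    const struct send_ctx *ctx = e->op_context;
    if (ctx->dest >= MAX_PEERS)
        return -1;                      /* address outside the table */

    t->failed[ctx->dest] = 1;
    return (long)ctx->dest;
}
```

In a real progress loop this would be called on the entry returned by fi_cq_readerr() after fi_cq_read() reports -FI_EAVAIL. Since every operation here is point-to-point, one context maps to exactly one target, which is what makes the inference straightforward.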

> On Sep 8, 2015, at 12:28 PM, Sur, Sayantan <sayantan.sur at intel.com> wrote:
> 
> 
>> 
>> What would be more helpful would be to have OFI provide a well-specified
>> mechanism for reporting communication failures that it can’t
>> automatically resolve. Some sort of error reporting from OFI calls to say
>> that a specific send failed would be nice. From that error code, we can
>> infer which target failed, since OFI doesn’t have any collectives, which
>> would make this more difficult.
> 
> 
> Errors should be reported through the CQ and read with readerr. That’s what you want, right?
> 
> Thanks,
> Sayantan.
> 
> 
>> 
>> Thanks,
>> Wesley
>> 
>> 
>> 
>> On 9/8/15, 11:57 AM, "ofiwg-bounces at lists.openfabrics.org on behalf of
>> Hefty, Sean" <ofiwg-bounces at lists.openfabrics.org on behalf of
>> sean.hefty at intel.com> wrote:
>> 
>>>> What's the state of fault-tolerance in OFI?  Would it be prudent for
>>>> someone to write OFI code that aspired to survive process failures?  Are
>>>> any implementations known to support this robustly right now?
>>> 
>>> This would be provider specific.  I'm not aware of anything that's coded
>>> to handle failures.
>>> 
>>> Having an example of this over libfabric would be great, though I'm not
>>> sure who's going to volunteer to write this.
>>> 
>>> It's not clear to me how fault tolerance relates to a networking API.
>>> For example, what specific lower-level features does an app need to make
>>> this happen?  Are there restrictions that providers need to report to
>>> apps regarding their level of support?  Is this something that even
>>> belongs to this level of API?
>>> _______________________________________________
>>> ofiwg mailing list
>>> ofiwg at lists.openfabrics.org
>>> http://lists.openfabrics.org/mailman/listinfo/ofiwg
> 