[libfabric-users] feature requests
Hefty, Sean
sean.hefty at intel.com
Thu Jun 8 09:26:20 PDT 2017
> > Plus, I think such a concept would be easy to implement, which is my
> > real objective. :) The app may just need to deal with huge
> > addresses.
>
> I think you're glossing over a bunch of details, though. E.g.:
I meant that only setting up the address list is easy, not that it makes implementing multi-rail easy. :)
> - what happens if one of the underlying endpoints has an error?
I thought about this. With a primary address scheme, the primary address must not fail before the other addresses have been exchanged. If we have the full set up front, we already have fallback addresses available.
With a primary address, either we have an additional all-to-all address exchange at startup to get the full address list, or we obtain the address list the first time we need to communicate with a peer. The latter option seems more scalable, but also increases the chance of an error occurring.
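To make the second option concrete, here is a rough sketch of fetching the full address list the first time a peer is used. Everything in it is hypothetical and for illustration only: `struct peer`, `rail_addr_t`, `peer_rail_addr()`, and the `fetch_fn` callback are not libfabric types or calls, and `demo_fetch()` is a stand-in that fabricates addresses instead of talking to a peer.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_RAILS 4

typedef unsigned long rail_addr_t;   /* placeholder for a real address type */

/* Hypothetical per-peer record: only the primary address is known at startup. */
struct peer {
    rail_addr_t primary;
    rail_addr_t rails[MAX_RAILS];
    size_t num_rails;                /* 0 until the full list has been fetched */
};

/* Hypothetical callback that asks the peer, over its primary address, for its
 * full address list. Returns the number of addresses written, or -1 on error. */
typedef int (*fetch_fn)(rail_addr_t primary, rail_addr_t *out, size_t max);

/* Look up one rail address, fetching the list on first use.
 * Returns 0 on success, -1 if the fetch fails or the rail does not exist. */
int peer_rail_addr(struct peer *p, size_t rail, fetch_fn fetch, rail_addr_t *addr)
{
    assert(p && fetch && addr);
    if (p->num_rails == 0) {
        /* First contact: this is the window in which a primary failure hurts. */
        int n = fetch(p->primary, p->rails, MAX_RAILS);
        if (n <= 0)
            return -1;
        p->num_rails = (size_t)n;
    }
    if (rail >= p->num_rails)
        return -1;
    *addr = p->rails[rail];
    return 0;
}

/* Stand-in fetch: pretends every peer has two rails and counts its calls. */
int demo_fetch_calls = 0;

int demo_fetch(rail_addr_t primary, rail_addr_t *out, size_t max)
{
    size_t n = max < 2 ? max : 2;
    demo_fetch_calls++;
    for (size_t i = 0; i < n; i++)
        out[i] = primary + 1 + i;
    return (int)n;
}
```

The fetch happens once per peer and later lookups hit the cached list, which is why this scales better than an all-to-all exchange. The cost is that the error path moves from startup into the first send.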
> - how do credits work?
> - when you fi_send, how is it decided which of the underlying EPs is
> used to send the message?
I'm guessing this will be controlled through an environment variable, or through a provider-specific or control interface, to specify those attributes. Maybe there's some way to standardize those variables/attributes across providers, at least for some of the more common options.
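As a sketch of the environment-variable approach, something like the following could pick the endpoint for each send. The variable name `EXAMPLE_MRAIL_POLICY`, the policy names, and all of the types and functions here are made up for illustration. None of them exist in libfabric.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical policies: always use rail 0, or rotate across all rails. */
enum rail_policy { RAIL_FIXED, RAIL_ROUND_ROBIN };

struct rail_selector {
    enum rail_policy policy;
    size_t num_rails;
    size_t next;                     /* rail to use for the next send */
};

/* Map a policy string to a policy; unknown or missing values mean "fixed". */
enum rail_policy policy_from_string(const char *s)
{
    if (s && strcmp(s, "round-robin") == 0)
        return RAIL_ROUND_ROBIN;
    return RAIL_FIXED;
}

/* Read the (hypothetical) EXAMPLE_MRAIL_POLICY variable once at setup. */
void rail_selector_init(struct rail_selector *sel, size_t num_rails)
{
    assert(sel && num_rails > 0);
    sel->policy = policy_from_string(getenv("EXAMPLE_MRAIL_POLICY"));
    sel->num_rails = num_rails;
    sel->next = 0;
}

/* Return the index of the underlying endpoint to use for this send. */
size_t rail_select(struct rail_selector *sel)
{
    size_t rail = sel->next;
    if (sel->policy == RAIL_ROUND_ROBIN)
        sel->next = (sel->next + 1) % sel->num_rails;
    return rail;
}
```

Defaulting to the fixed policy keeps single-rail behavior unless the user opts in. Rotating across rails only makes sense when the requested ordering guarantees allow it.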
> - when you fi_recv, to which underlying [hardware] resource is the
> buffer posted?
The existing utility code has this problem today. It's been solved using software queues that sit over the hardware resources. Maybe tag space reservation could be used here.
> - what ordering guarantees can be provided?
We allow the app to request the ordering guarantees that it needs. For multi-rail to be enabled, it needs to check that it can meet those guarantees. My hope is that we can use the ordering guarantees to figure out when to turn on multi-rail.
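A minimal sketch of that check, assuming ordering is expressed as a bit mask the way the `msg_order` attribute and the `FI_ORDER_*` flags do it. The `ORD_*` bits and both functions below are illustrative placeholders, not libfabric's actual values or API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative ordering bits; these are NOT libfabric's FI_ORDER_* values. */
#define ORD_NONE              0u
#define ORD_SEND_AFTER_SEND   (1u << 0)
#define ORD_WRITE_AFTER_WRITE (1u << 1)

/* Multi-rail is allowed only if every ordering bit the app requested is one
 * that can still be honored when traffic is spread across rails. */
int can_enable_multirail(uint64_t requested, uint64_t striped_supported)
{
    return (requested & ~striped_supported) == 0;
}

/* Decide how many rails to use: all of them if the guarantees can be met,
 * otherwise fall back to a single rail. */
size_t rails_to_use(uint64_t requested, uint64_t striped_supported,
                    size_t available)
{
    assert(available > 0);
    return can_enable_multirail(requested, striped_supported) ? available : 1;
}
```

An app that requests no ordering gets every rail. One that requests send-after-send ordering falls back to a single rail unless the multi-rail layer can preserve that ordering itself.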
- Sean