[ofiwg] selecting a fabric for a destination
sean.hefty at intel.com
Wed Oct 8 17:18:15 PDT 2014
> "best" is too vague below. To limit the scope of this, let's specifically
> call it "shortest distance" to make it clear we're not looking at speed,
> congestion, etc. Simply an abstracted "network hops".
> To put it in real terms - you have two interfaces, the remote is on the
> same subnet as one, but could also be reached through a router from the
> other. You want to pick the local one, and you don't want the application
> having to parse kernel routing tables to find out which one to use, or
> even know that kernel routing tables are involved in the answer. How does
> it pick the right one?
I think fi_getinfo is the correct answer here. It's not enough to indicate which fabric or provider is the shortest distance to some destination. That answer builds in assumptions about what protocol the app wants to use, what endpoint capabilities are of interest, etc., and those are the things fi_getinfo is there to handle. fi_getinfo should be called with hints that allow the provider to refine its returned list.
There's an assumption being made here that fi_getinfo will be expensive, and that we need a solution for that, without actually knowing whether that will be the case. If fi_getinfo does prove to be too expensive, the interface can still be used as you describe, with additional flags that limit the operations the provider performs.
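To make the argument concrete, here is a minimal, self-contained sketch of the selection logic being discussed. It is a toy model, not libfabric code: struct candidate, struct hints, the CAP_* bits, the SKIP_DISTANCE flag, and select_fabric() are all hypothetical names made up for illustration. The point it tries to show is that candidates are first filtered by what the app asked for, only then ranked by hop count, and that a flag could let the caller skip the distance work if it turns out to be costly.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical capability bits and flag; these are NOT libfabric constants. */
#define CAP_MSG        (1u << 0)
#define CAP_RMA        (1u << 1)
#define SKIP_DISTANCE  (1u << 0)  /* caller asks to skip the distance ranking */

/* Hypothetical description of one way to reach the destination. */
struct candidate {
    const char *fabric;  /* illustrative fabric name */
    uint32_t caps;       /* capabilities offered on this fabric */
    int hops;            /* abstract "network hops" to the destination */
};

/* Hypothetical hints: what the application requires. */
struct hints {
    uint32_t caps;
};

/*
 * Return the index of the selected candidate, or -1 if none matches.
 * Candidates lacking any hinted capability are dropped first; the
 * remaining ones are ranked by hop count unless SKIP_DISTANCE is set,
 * in which case the first match is returned without comparing hops.
 */
int select_fabric(const struct candidate *c, size_t n,
                  const struct hints *h, uint32_t flags)
{
    int best = -1;

    assert(c != NULL || n == 0);

    for (size_t i = 0; i < n; i++) {
        /* Filter: every capability in the hints must be present. */
        if (h != NULL && (c[i].caps & h->caps) != h->caps)
            continue;

        /* Cheap path: take the first match, no distance comparison. */
        if (flags & SKIP_DISTANCE)
            return (int)i;

        /* Rank: keep the match with the fewest hops. */
        if (best < 0 || c[i].hops < c[best].hops)
            best = (int)i;
    }
    return best;
}
```

With three candidates where the one-hop interface only offers CAP_MSG, an app that hints CAP_RMA ends up on a routed interface instead, which is why "shortest distance" alone does not answer the question.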
I have a similar feature built into the librdmacm rdma_getaddrinfo call, which lets me either skip or perform IB path record queries.