[ofiwg] [RFC] libfabric provider interoperability
sean.hefty at intel.com
Fri Dec 5 22:45:42 PST 2014
> I think a decent solution to the interoperability problem would be as
> follows: providers which claim to implement the same *existing* wire
> protocol must interoperate for all calls that are *natively* supported
> by the wire protocol. In fi_endpoint(3) man page under the description
> of a wire protocol, we should list what the natively supported features
> are considered to be to avoid ambiguity. The two emphasized words add
> meaning that may not have been explicitly mentioned during the last
> OFIWG call.
I like the idea of explicitly listing which features are supported by the various protocols. I would just document that in protocol-specific man pages.
> So, as an example, for InfiniBand RC, this would require that two
> providers implement the three atomic operations that are defined in the
> InfiniBand specification using the wire protocol's native support, but
> for all of the other atomics that libfabric provides, all bets are
> off---they might work within a single provider but not between providers.
It's worth noting that there is not a requirement for a provider to implement all defined operations. Specifically, for atomic ops, a given provider may not implement anything other than the base atomics.
I would also point out that the protocol isn't limited to just the data transfer APIs. The connection protocol and address resolution mechanism must also be compatible.
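
To make the rule concrete, here is a minimal sketch in C of how the guaranteed-interoperable set could be computed under the proposal. Every name in it (provider_desc, the OP_* flags, interoperable_ops) is hypothetical and chosen for illustration only; none of it is libfabric API. It assumes each provider advertises the ops it implements using the wire protocol's native support, and that the connection protocol and address format must match before any data transfer op can interoperate.

```c
#include <stdint.h>

/* Hypothetical operation flags; these are NOT libfabric definitions. */
enum op_flag {
    OP_SEND        = 1u << 0,
    OP_RDMA_WRITE  = 1u << 1,
    OP_ATOMIC_BASE = 1u << 2   /* atomics the wire protocol defines natively */
};

/* Hypothetical description of what one provider implements. */
struct provider_desc {
    uint32_t wire_protocol;  /* id of the data transfer wire protocol */
    uint32_t cm_protocol;    /* id of the connection protocol */
    uint32_t addr_format;    /* id of the address resolution mechanism */
    uint32_t native_ops;     /* ops implemented with native wire support */
};

/*
 * Return the set of ops expected to work between providers a and b.
 * Nothing is guaranteed unless the wire protocol, connection protocol,
 * and address format all match. A provider need not implement every
 * native op, so the result is the intersection of the two native sets.
 */
static uint32_t interoperable_ops(const struct provider_desc *a,
                                  const struct provider_desc *b)
{
    if (a->wire_protocol != b->wire_protocol ||
        a->cm_protocol != b->cm_protocol ||
        a->addr_format != b->addr_format)
        return 0;
    return a->native_ops & b->native_ops;
}
```

Anything outside the returned set may still work within a single provider, but would not be expected to work between providers.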
> If two vendors wished to implement the rest of the atomics in InfiniBand
> in an interoperable way, then they would declare a new extended wire
> protocol as we discussed in the call. The providers would still be
> responsible for interoperability of the calls that are natively
> supported by InfiniBand. This would require that we define a way to
> state that a protocol is merely an extension of an existing wire
> protocol. Perhaps we could have some place in libfabric or on the
> website to document these "agreements between friends" wire protocols.
> And/or these ultimately may be taken to, e.g., the IBTA, if it was
> decided that a real standard solution was needed.
One of the difficulties with extending a base protocol is doing so in such a way that the native operations are not affected. For example, adding support for other atomic operations would require changes to the messaging protocol, which would affect native support.
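
For the "extension of an existing wire protocol" idea, one possible way to state the relationship is sketched below in C. The wire_proto structure and both helper functions are hypothetical, not anything libfabric defines. The sketch assumes an extended protocol keeps the base protocol's native operations unchanged on the wire, which is exactly the property that is hard to guarantee in practice.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical wire protocol descriptor; not a libfabric type. */
struct wire_proto {
    const char *name;
    const struct wire_proto *base;  /* NULL for a base protocol */
};

/* True if p is q, or p extends q through its chain of base pointers. */
static bool proto_extends(const struct wire_proto *p,
                          const struct wire_proto *q)
{
    for (; p != NULL; p = p->base)
        if (p == q)
            return true;
    return false;
}

/*
 * Most specific protocol that both peers can speak: walk a's chain
 * from the most extended level down and return the first level that
 * b also extends. Returns NULL if the peers share no protocol.
 */
static const struct wire_proto *common_proto(const struct wire_proto *a,
                                             const struct wire_proto *b)
{
    for (; a != NULL; a = a->base)
        if (proto_extends(b, a))
            return a;
    return NULL;
}
```

Under that assumption, a peer speaking an extended protocol and a peer speaking only the base would fall back to the base protocol's native operations.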