[ofiwg] detecting FABRIC_DIRECT mismatch
sean.hefty at intel.com
Fri May 13 15:20:32 PDT 2016
> I don't know the details of how direct works, but if you must build a
> special hardware specific version of libfabric.so, then those symbols
> must not overlap with the normal full function library symbols.
Libfabric exports a minimal set of functions, and those are unchanged between the direct and non-direct builds. Typically, a direct build results in a provider replacing the static inline calls with its own implementation and exporting additional library symbols. The provider may also replace certain constants and data structures with hardware-specific versions. This is how the mismatch is showing up.
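As a rough illustration of the mechanism described above, here is a minimal sketch in C. It is not libfabric code: every demo_* name and the DEMO_DIRECT macro are hypothetical stand-ins, and the build-id check is just one assumed way a mismatch could be detected. The point is that the same app-facing inline call either dispatches through an ops table (non-direct) or binds straight to a provider symbol (direct), so the app and the library must be compiled the same way.

```c
#include <stddef.h>

/* Hypothetical endpoint with an ops table, as a non-direct build might use. */
struct demo_ep;

struct demo_ops {
    long (*send)(struct demo_ep *ep, const void *buf, size_t len);
};

struct demo_ep {
    struct demo_ops *ops;
    size_t bytes_sent;
};

/* Hypothetical provider implementation. In a direct build this would be an
 * additional symbol exported by the library. */
long demo_prov_send(struct demo_ep *ep, const void *buf, size_t len)
{
    (void) buf;
    ep->bytes_sent += len;
    return (long) len;
}

#ifdef DEMO_DIRECT
/* Direct build: the inline wrapper binds straight to the provider symbol. */
#define DEMO_BUILD_ID 1
static inline long demo_send(struct demo_ep *ep, const void *buf, size_t len)
{
    return demo_prov_send(ep, buf, len);
}
#else
/* Non-direct build: the inline wrapper dispatches through the ops table. */
#define DEMO_BUILD_ID 0
static inline long demo_send(struct demo_ep *ep, const void *buf, size_t len)
{
    return ep->ops->send(ep, buf, len);
}
#endif

/* Stands in for a function compiled into the library; it reports how the
 * library itself was built. */
int demo_lib_build_id(void)
{
    return DEMO_BUILD_ID;
}

/* Stands in for a check compiled into the app; it compares the app's view of
 * the headers with what the library reports. Returns 1 on a match. */
static inline int demo_build_matches(void)
{
    return demo_lib_build_id() == DEMO_BUILD_ID;
}
```

In a single translation unit like this the check always passes. It can only fail when demo_lib_build_id() lives in a library compiled with one setting of DEMO_DIRECT and the app is compiled with the other, which is the situation being discussed here.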
App developers write to a single set of interfaces, and no source changes are required in order to optimize for different providers. However, the app is expected to be recompiled if any changes are made to the library or provider. Outside of select apps (e.g. benchmarks) running on exascale-class machines, I doubt anyone will use FABRIC_DIRECT.