[ofiwg] detecting FABRIC_DIRECT mismatch

Jeff Squyres (jsquyres) jsquyres at cisco.com
Fri May 13 15:36:02 PDT 2016


How about renaming the output library:

- if DIRECT is enabled: libfabric_direct.*
- if DIRECT is not enabled: libfabric.*

Then there are no symbol collisions, and it's obvious to applications which one they should link against.
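
A rough sketch of the naming rule above, just to make it concrete. The helper name `fabric_lib_name` and the `.so` suffix are assumptions for illustration only; nothing like this exists in libfabric, and the proposal itself only says `libfabric_direct.*` vs. `libfabric.*`.

```c
/* Hypothetical helper, NOT part of libfabric: returns the library file
 * name an application would link against under the proposed naming.
 * The ".so" suffix is an assumption made for this example. */
const char *fabric_lib_name(int direct_enabled)
{
    /* DIRECT build -> libfabric_direct.*, normal build -> libfabric.* */
    return direct_enabled ? "libfabric_direct.so" : "libfabric.so";
}
```

With distinct names, an application built against the direct headers would link with something like `-lfabric_direct`, so picking up the wrong library should fail at link time rather than show up later as a mismatch.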


> On May 13, 2016, at 6:20 PM, Hefty, Sean <sean.hefty at intel.com> wrote:
> 
>> I don't know the details of how direct works, but if you must build a
>> special hardware specific version of libfabric.so, then those symbols
>> must not overlap with the normal full function library symbols.
> 
> Libfabric exports a minimal set of functions.  Those are unchanged between the direct and non-direct builds.  Typically, a direct build results in a provider replacing static inline calls with its own implementation and exporting additional library symbols.  The provider may also replace certain constants and data structures with hardware-specific versions.  This is how the mismatch is showing up.
> 
> For the app developer, they write to a single set of interfaces, and no source changes are required in order to optimize for different providers.  However, the app is expected to recompile if any changes are made to the library or provider.  Outside of select apps (e.g. benchmarks) running on exascale class machines, I doubt anyone will use FABRIC_DIRECT.


-- 
Jeff Squyres
jsquyres at cisco.com
For corporate legal information go to: http://www.cisco.com/web/about/doing_business/legal/cri/



