[ofiwg] presentation from today's WG meeting

Jeff Squyres (jsquyres) jsquyres at cisco.com
Thu Apr 24 03:33:14 PDT 2014


On Apr 22, 2014, at 11:22 AM, Doug Ledford <dledford at redhat.com> wrote:

> I like the idea of following the upstream libibverbs and librdmacm git
> repos for as long as we can, but I suspect there will come a time when
> we might need to fork.  I say this because a tighter integration between
> the v1 and v2 apis such that something like a protection domain can be
> shared between them might require tricks in terms of library linkage
> that result in us being forced to roll all three libraries into one
> monolithic library and having shim libraries installed in place of
> libibverbs and librdmacm that do the job of getting older applications
> linked against the right symbols in libfabric in order to work without a
> recompile or any other changes.  If that happens, I don't think we'll be
> able to follow upstream any more (or at a minimum we would have to
> follow it using rebase=true on the upstream master branch and having a
> possibly large patch set on top of upstream that we would have to
> constantly maintain).

If I understand what Doug is saying properly, +1.

Keep in mind that there are vendors who have little/no interest in either developing or maintaining verbs support: libfabric is the way forward.  My point -- whatever solution is used should:

1. allow vendors who care about verbs to (try to?) share code between their verbs and libfabric providers
2. not require vendors who do not care about verbs to develop/maintain verbs providers

I think there's been a lot of discussion and ideas floated around about #1, but I think #2 is also quite important (for self-serving reasons, obviously).

That is, for those of us who do not have IB hardware, we don't want to be chained to IB-specific abstractions from verbs.  libfabric is the way forward; we'd rather be able to create a 100% libfabric-based solution that is not tethered to IB-centric abstractions from verbs.

> Another thing I didn't see addressed in the current API document is the
> issue of extensions.  

+1

> I think we need to address this from the beginning
> as there will always be custom hardware extensions that applications
> want to make use of.  I don't really like the current libibverbs
> extension method as it requires magic numbers.  I think what I'd rather
> see is a query, response, registration mechanism for extensions where
> the query itself is based upon a specific device, a string name of the
> extension, and an API version.  The response would then query the driver
> for the specific device and either accept the requested extension name
> and binary version and provide the required set of function pointers,
> or return a negative response.  The application would then save off the
> necessary function pointers for use during run time, and then any
> mandatory or optional registration device needs to initialize the
> extension on the specific device would be called by the application
> (which might result in the low level driver making changes to how the
> library handles non-extension calls that are affected by the extension
> being enabled).  At that point the extension would be functional.  This
> avoids the problem with having an enum for extensions, and even makes it
> possible for vendor supplied extension and official upstream extensions
> to be different.  For instance, if Mellanox wanted to have an extension
> MLX_CORE_DIRECT that was different than the final CORE_DIRECT extension
> we put into the upstream libfabric, they could (that users might have to
> code to two different extensions is a problem that the users and the
> vendor get to deal with...users often request, and vendors sometimes
> deliver, features long before they land upstream...given that this
> practice has not abated over the years, I'm tired of fighting against
> it, and this resolves the conflicts that arise when a vendor ships an
> early version of a feature in their own code simply by allowing them to
> use a vendor specific name, and leaves the vendor and their users with
> the onus to deal with the code changes that can happen when they code to
> a feature before it is approved by upstream).

In the MPI feedback slides, we talked about the requirement for vendor-provided extensions as well, for both the use case you mentioned, and others (e.g., vendor has hardware that does things that aren't covered by upstream concepts / functionality at all).

I'm still grokking your specific proposal here, but string-based lookups are generally a fine idea (a la a slightly smarter/more-featureful dlopen).
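
To make the query / response / registration flow concrete, here is a rough sketch of how a lookup keyed on (device, extension name, API version) might look.  Every name in it (struct ext_ops, struct ext_entry, struct device, query_extension, the demo_* functions) is hypothetical and invented for illustration; none of it comes from libibverbs, librdmacm, or the libfabric API document.  It only assumes that a driver exposes a per-device table of named, versioned extensions.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical types: nothing here is an existing verbs/libfabric API. */
struct ext_ops {
    int (*init)(void *dev);             /* registration step for this device */
    int (*do_work)(void *dev, int arg); /* an extension-specific call */
};

struct ext_entry {
    const char *name;          /* string name instead of an enum value */
    unsigned version;          /* API version the driver implements */
    const struct ext_ops *ops; /* function pointers handed to the app */
};

struct device {
    const struct ext_entry *exts; /* table supplied by the driver */
    size_t n_exts;
};

/* Stand-in driver implementation of a vendor-named extension. */
static int demo_init(void *dev) { (void)dev; return 0; }
static int demo_work(void *dev, int arg) { (void)dev; return arg * 2; }
static const struct ext_ops demo_ops = { demo_init, demo_work };

static const struct ext_entry demo_driver_exts[] = {
    { "MLX_CORE_DIRECT", 1, &demo_ops },
};

/* Query: returns 0 and fills *ops when the device offers this exact
 * name and version; returns -1 (the negative response) otherwise. */
static int query_extension(const struct device *dev, const char *name,
                           unsigned version, const struct ext_ops **ops)
{
    for (size_t i = 0; i < dev->n_exts; i++) {
        if (strcmp(dev->exts[i].name, name) == 0 &&
            dev->exts[i].version == version) {
            *ops = dev->exts[i].ops;
            return 0;
        }
    }
    return -1;
}

/* Application side: query, save the pointers, register, then use. */
static int ext_demo(void)
{
    struct device dev = {
        demo_driver_exts,
        sizeof demo_driver_exts / sizeof demo_driver_exts[0],
    };
    const struct ext_ops *ops = NULL;

    /* The upstream name is not offered by this driver. */
    if (query_extension(&dev, "CORE_DIRECT", 1, &ops) == 0)
        return -1;
    /* The vendor-specific name is. */
    if (query_extension(&dev, "MLX_CORE_DIRECT", 1, &ops) != 0)
        return -1;
    assert(ops != NULL);

    if (ops->init(&dev) != 0) /* registration */
        return -1;
    return ops->do_work(&dev, 21);
}
```

Because the key is a string rather than an enum value, a vendor name and an upstream name can coexist without anyone allocating magic numbers; an application that wants either one simply queries both names and keeps whichever set of pointers it gets back.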

-- 
Jeff Squyres
jsquyres at cisco.com
For corporate legal information go to: http://www.cisco.com/web/about/doing_business/legal/cri/



