[ofiwg] presentation from today's WG meeting

Wed Apr 23 17:04:11 PDT 2014

Doug, thanks for the feedback.

> I agree with most of the function calls, except for the ones that result
> in us posting a recv buffer.  Those calls are named similarly to Sockets
> API calls that are synchronous, blocking calls that return an actual
> result, yet our version would be asynchronous and would not actually
> return a result but would just queue up a receive buffer.  I think they
> should use a different name to avoid this semantic conflict with
> functions by the same name structure in the Sockets API.

Any ideas?

> The other thing I want to request is that we plan for making this
> library function either as a single threaded, main loop driven
> application (which the code you've posted seems well suited for), or as
> a threaded, event driven library (which there currently is no definition
> for).  I'm fine if we say that the application must do one or the other
> and not both, or if we have strict rules about how the methods can be
> mixed, but in the event model I specifically imagine that A) the library
> will have its own threads that do certain tasks and B) the library will
> allow the application to register callbacks and instead of the
> application polling the library for received data or send completions,
> we can use those callbacks to notify the application instead (and in
> this case, let's say the application registers a callback for send
> completions, but not for recv completions, then the send path in the
> library would use the callback method while the application would be
> responsible for polling for recv completions).  Exactly which options
> being enabled result in library generated threads would have to be
> explicitly spelled out so that applications that don't want the library
> to generate threads on its own can avoid making it do so.
> 
> So, I'd like to see an additional section added to the API that is this
> event driven section.  While the below is not what I would use in
> libfabric (it's cribbed from a different project that doesn't represent
> the same generic nature that libfabric needs to represent), it kind of
> gets my point across:
> 
> /*
>  * We use these functions to call back into the client/server to notify
>  * them of async activity on specific connections, the server or client
>  * sets up the global cb struct with their functions prior to starting up
>  * the transport.
>  */
> 
> struct callback_ops {
>         void (*read_callback)(struct transport_data *t_data,
>                               struct buffer *buff);
>         void (*write_failed_callback)(struct transport_data *t_data,
>                                       struct buffer *buff);
>         void (*accept_callback)(struct transport_data *t_data);
>         void (*eof_callback)(struct transport_data *t_data);
>         void (*disconnect_callback)(struct transport_data *t_data);
>         void (*closed_callback)(struct transport_data *t_data);
>         void (*buffers_empty_callback)();
>         void (*idle_callback)(struct transport_data *t_data);
> };
> 
> Something along these lines can be done to implement an event driven API
> as part of libfabric.

I did consider a callback completion model, specifically because it may be more efficient for providers that require host based processing, but I didn't add it to the API.  Part of this discussion seems to include how a provider makes progress.

One way to handle this would be to define a new type of event queue/collector that specifically reported events via callbacks.  The app would fill out a set of callback ops, similar to what you described.  In the simplest case, there could be send, receive, connect, etc. callbacks.  A more advanced (i.e. generic) usage model might involve active messages.  Application input here would be useful.  It needs to be clear what an app can and cannot do from the callback, and from what thread context the callback occurs.
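To make the mixed model concrete, here is a rough sketch of an event queue that reports via callbacks where the app registered one and falls back to polling where it did not (your send-callback-but-poll-for-receives case).  Every name in it (cb_eq, cb_event, cb_eq_report, cb_eq_poll, count_cb) is made up for illustration and is not part of the proposed API.  It also ignores threading entirely, so it says nothing about which thread context the callback runs in.

```c
#include <assert.h>
#include <stddef.h>

/* All names below are hypothetical; none come from the proposed API. */
enum cb_event_type { CB_SEND, CB_RECV, CB_CONNECT, CB_TYPE_COUNT };

struct cb_event {
	enum cb_event_type type;
	void *context;	/* app context given when the operation was posted */
	size_t len;	/* bytes transferred */
	int status;	/* 0 on success, negative on error */
};

typedef void (*cb_fn)(const struct cb_event *ev);

#define CB_EQ_DEPTH 8

struct cb_eq {
	cb_fn handlers[CB_TYPE_COUNT];	/* NULL: the app polls for this type */
	struct cb_event pending[CB_EQ_DEPTH];
	size_t head;	/* index of the oldest queued event */
	size_t count;	/* number of queued events */
};

/*
 * Provider side: report one event.  Returns 1 if it was delivered through
 * a callback, 0 if it was queued for polling, -1 if the queue is full.
 */
int cb_eq_report(struct cb_eq *eq, const struct cb_event *ev)
{
	cb_fn fn;

	assert(ev->type < CB_TYPE_COUNT);
	fn = eq->handlers[ev->type];
	if (fn) {
		fn(ev);
		return 1;
	}
	if (eq->count == CB_EQ_DEPTH)
		return -1;
	eq->pending[(eq->head + eq->count) % CB_EQ_DEPTH] = *ev;
	eq->count++;
	return 0;
}

/* App side: returns 1 and fills *out if an event was queued, else 0. */
int cb_eq_poll(struct cb_eq *eq, struct cb_event *out)
{
	if (eq->count == 0)
		return 0;
	*out = eq->pending[eq->head];
	eq->head = (eq->head + 1) % CB_EQ_DEPTH;
	eq->count--;
	return 1;
}

/* Example app callback: bump a counter passed through the context. */
void count_cb(const struct cb_event *ev)
{
	int *counter = ev->context;

	(*counter)++;
}
```

An app that sets handlers[CB_SEND] = count_cb and leaves handlers[CB_RECV] NULL would get send completions through the callback and pick up receive completions from cb_eq_poll().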

> Along these same lines, if the library has options that cause it to go
> threaded, it also needs to think about things like thread priorities and
> real time kernels.  How threads get their priorities set both when we do
> and don't have real time capabilities would need to be spelled out.

Good point.  I made up a very limited interface that an app can use to determine how the underlying provider makes progress on data transfers.  There needs to be discussion around this area, but I did not want to assume that progress was automatically handled by hardware, or that upper software wanted to have its performance impacted by provider threads.  As it stands now, a control interface can be used to change how the provider makes progress, but it does not consider thread priority or real time kernels.  I wonder if that's something that can be handled above the interface.
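As a rough illustration of the kind of control interface I mean (this is not the actual definition; ep_control, ep_progress, the progress_mode values, and the endpoint fields are all invented for this sketch), an app could query the progress model and switch to driving progress itself so that the provider never needs its own thread:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical names; a sketch of the idea, not the proposed interface. */
enum progress_mode {
	PROGRESS_AUTO,		/* hardware or a provider thread makes progress */
	PROGRESS_MANUAL		/* the app must call in to make progress */
};

enum ctl_cmd { CTL_GET_PROGRESS, CTL_SET_PROGRESS };

struct endpoint {
	enum progress_mode progress;
	int manual_supported;	/* can this provider run without a thread? */
	unsigned int pending;	/* transfers waiting on software progress */
};

/* Query or change how progress is made.  Returns 0 on success, -1 on error. */
int ep_control(struct endpoint *ep, enum ctl_cmd cmd, enum progress_mode *arg)
{
	if (!ep || !arg)
		return -1;

	switch (cmd) {
	case CTL_GET_PROGRESS:
		*arg = ep->progress;
		return 0;
	case CTL_SET_PROGRESS:
		if (*arg == PROGRESS_MANUAL && !ep->manual_supported)
			return -1;
		ep->progress = *arg;
		return 0;
	}
	return -1;
}

/*
 * In manual mode the app calls this from its own thread, at whatever
 * priority it chose, to complete pending transfers.  Returns the number
 * of transfers completed; in auto mode there is nothing for the app to do.
 */
int ep_progress(struct endpoint *ep)
{
	int done;

	assert(ep != NULL);
	if (ep->progress != PROGRESS_MANUAL)
		return 0;
	done = (int)ep->pending;
	ep->pending = 0;
	return done;
}
```

With something like this, thread priority stays an application decision in manual mode, which is what I mean by handling it above the interface.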

> > For a merged effort, I would anticipate that in some cases the same
> > set of function pointers could be usable between gen1 and gen2 APIs
> > (e.g. msg_ops, rma_ops), but with differently named wrapper functions
> > (e.g. fi_write versus ibv_write).  An example of this was in last
> > week's presentation.  In other cases, functions may not easily apply
> > (e.g. tagged_ops) or only the concepts may be transferable (e.g.
> > optimized poll CQ call).  The CM functionality and its full
> > integration would be an example of calls that evolve from gen 1 to
> > gen 2.
> 
> I like the idea of following the upstream libibverbs and librdmacm git
> repos for as long as we can, but I suspect there will come a time when
> we might need to fork.  I say this because a tighter integration between
> the v1 and v2 apis such that something like a protection domain can be
> shared between them might require tricks in terms of library linkage
> that result in us being forced to roll all three libraries into one
> monolithic library and having shim libraries installed in place of
> libibverbs and librdmacm that do the job of getting older applications
> linked against the right symbols in libfabric in order to work without a
> recompile or any other changes.  If that happens, I don't think we'll be
> able to follow upstream any more (or at a minimum we would have to
> follow it using rebase=true on the upstream master branch and having a
> possibly large patch set on top of upstream that we would have to
> constantly maintain).
> 
> Another thing I didn't see addressed in the current API document is the
> issue of extensions.  I think we need to address this from the beginning
> as there will always be custom hardware extensions that applications
> want to make use of.  I don't really like the current libibverbs
> extension method as it requires magic numbers.  I think what I'd rather
> see is a query, response, registration mechanism for extensions where
> the query itself is based upon a specific device, a string name of the
> extension, and an API version.  The response would then query the driver
> for the specific device and either accept the requested extension name
> and binary version and provide the required set of function pointers,
> or return a negative response.  The application would then save off the
> necessary function pointers for use during run time, and then any
> mandatory or optional registration the device needs to initialize the
> extension on the specific device would be called by the application
> (which might result in the low level driver making changes to how the
> library handles non-extension calls that are affected by the extension
> being enabled).  At that point the extension would be functional.  This
> avoids the problem with having an enum for extensions, and even makes it
> possible for vendor supplied extensions and official upstream extensions
> to be different.  For instance, if Mellanox wanted to have an extension
> MLX_CORE_DIRECT that was different than the final CORE_DIRECT extension
> we put into the upstream libfabric, they could (that users might have to
> code to two different extensions is a problem that the users and the
> vendor get to deal with...users often request, and vendors sometimes
> deliver, features long before they land upstream...given that this
> practice has not abated over the years, I'm tired of fighting against
> it, and this resolves the conflicts that arise when a vendor ships an
> early version of a feature in their own code simply by allowing them to
> use a vendor specific name, and leaves the vendor and their users with
> the onus to deal with the code changes that can happen when they code to
> a feature before it is approved by upstream).

There is a call associated with the fabric object (if_open) that is intended to be used to open some interface by name.  Maybe this operation needs to be associated with more objects?  An application can use this to open any interface that a provider may export.  The provider would be responsible for shipping any header file that the app might use.

This same call also allows the framework to export calls for providers to use, which could be useful for helper functions such as a memory registration cache.  The details of the API can be worked out.
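For discussion, here is a minimal sketch of opening an interface by name, combined with the version check you asked for.  The signature is not the actual if_open definition; example_if_open, ext_entry, demo_ops, and the "VENDOR_DEMO" name are placeholders I made up, and a real provider would ship demo_ops in its own header.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical vendor interface; would come from the provider's header. */
struct demo_ops {
	int (*add_one)(int x);
};

static int demo_add_one(int x)
{
	return x + 1;
}

static const struct demo_ops demo_v1 = { demo_add_one };

/* Hypothetical table of the interfaces one provider exports. */
struct ext_entry {
	const char *name;	/* string name, so no enum or magic number */
	unsigned int version;	/* binary version of the ops structure */
	const void *ops;	/* function pointers handed to the app */
};

static const struct ext_entry provider_exts[] = {
	{ "VENDOR_DEMO", 1, &demo_v1 },
};

/*
 * Look up an interface by name and version.  Returns 0 and stores the ops
 * pointer on a match, or -1 (the negative response) if this provider does
 * not export that name at that version.
 */
int example_if_open(const char *name, unsigned int version, const void **ops)
{
	size_t i;

	assert(name != NULL && ops != NULL);
	for (i = 0; i < sizeof(provider_exts) / sizeof(provider_exts[0]); i++) {
		if (strcmp(provider_exts[i].name, name) == 0 &&
		    provider_exts[i].version == version) {
			*ops = provider_exts[i].ops;
			return 0;
		}
	}
	return -1;
}
```

A vendor name and an upstream name are then just two different strings in the table, so the two can coexist without a shared enum.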

- Sean