[ofiwg] presentation from today's WG meeting

Doug Ledford dledford at redhat.com
Tue Apr 22 08:22:10 PDT 2014

On 4/15/2014 1:22 PM, Hefty, Sean wrote:
> Attached is the presentation that I went over today.  Next week's
> meeting will solicit responses from anyone regarding the *direction*
> of the proposal.

I won't make today's meeting (conflicting meetings).  So, I'll provide
my feedback here.

> As mentioned in today's meeting, the man page formats for the current
> APIs can be found in the source code or online here:
> https://www.openfabrics.org/downloads/OFIWG/API/

These provide some clarity that the presentation didn't.  Thanks for the

> This is slightly older documentation and describes the static inline
> wrapper functions around the operations discussed in the attached
> slides.  The *details* of the specific function calls is what would
> need discussion within the working group and the application
> developers.

I agree with most of the function calls, except for the ones that result
in us posting a recv buffer.  Those calls are named similarly to Sockets
API calls that are synchronous, blocking calls that return an actual
result, yet our version would be asynchronous and would not actually
return a result but would just queue up a receive buffer.  I think they
should use a different name to avoid this semantic conflict with
similarly named functions in the Sockets API.
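
To make the semantic difference concrete, here is a minimal sketch of a
call that only queues a receive buffer, with the result retrieved later.
The names toy_ep, ep_post_recv, toy_deliver and ep_poll_completion are
hypothetical and are not taken from the proposal; the "endpoint" is an
in-memory stand-in, not a real fabric endpoint.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical toy endpoint: one posted buffer, one pending completion. */
struct toy_ep {
        void   *posted_buf;     /* buffer queued by the application, or NULL */
        size_t  posted_len;
        size_t  completed;      /* bytes delivered into the posted buffer */
        int     has_completion;
};

/*
 * Queue a receive buffer and return immediately.  A return of 0 means
 * "queued", not "data received", unlike a blocking Sockets recv().
 */
static int ep_post_recv(struct toy_ep *ep, void *buf, size_t len)
{
        assert(ep != NULL && buf != NULL);
        if (ep->posted_buf != NULL)
                return -1;      /* toy limit: one outstanding buffer */
        ep->posted_buf = buf;
        ep->posted_len = len;
        return 0;
}

/* Stand-in for the fabric delivering a message into the posted buffer. */
static void toy_deliver(struct toy_ep *ep, const void *data, size_t len)
{
        size_t n;

        if (ep->posted_buf == NULL)
                return;         /* nothing posted: the toy drops the message */
        n = len < ep->posted_len ? len : ep->posted_len;
        memcpy(ep->posted_buf, data, n);
        ep->completed = n;
        ep->has_completion = 1;
        ep->posted_buf = NULL;
}

/* Return the byte count of a finished receive, or -1 if none is ready. */
static long ep_poll_completion(struct toy_ep *ep)
{
        if (!ep->has_completion)
                return -1;
        ep->has_completion = 0;
        return (long)ep->completed;
}
```

The actual result only shows up at ep_poll_completion(), which is why a
name that reads like recv() would be misleading for the posting call.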

The other thing I want to request is that we plan for this library to
work either in a single threaded, main loop driven application (which
the code you've posted seems well suited for), or as a threaded, event
driven library (for which there is currently no definition).  I'm fine
if we say that the application must do one or the other and not both, or
if we have strict rules about how the methods can be mixed.  In the
event model I specifically imagine that A) the library will have its own
threads that do certain tasks and B) the library will allow the
application to register callbacks, so that instead of the application
polling the library for received data or send completions, the library
uses those callbacks to notify the application.  As an example of
mixing, say the application registers a callback for send completions
but not for recv completions: the send path in the library would then
use the callback method, while the application would remain responsible
for polling for recv completions.  Exactly which options cause the
library to create threads would have to be explicitly spelled out so
that applications that don't want the library to create threads on its
own can avoid making it do so.
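
The mixed case described above (callback for send completions, polling
for recv completions) could look roughly like the following sketch.  All
names here (evt_lib, lib_register_cb, lib_dispatch, lib_poll, count_cb)
are hypothetical, and the sketch is single threaded with no locking; a
library that runs its own threads would need to protect the queue.

```c
#include <assert.h>
#include <stddef.h>

enum comp_kind { COMP_SEND = 0, COMP_RECV = 1, COMP_KINDS = 2 };

struct completion {
        enum comp_kind kind;
        int status;
};

typedef void (*comp_cb)(const struct completion *c, void *ctx);

#define POLLQ_CAP 16

/* Hypothetical library state: optional callbacks plus a poll queue. */
struct evt_lib {
        comp_cb cb[COMP_KINDS];         /* NULL means "application polls" */
        void *ctx[COMP_KINDS];
        struct completion pollq[POLLQ_CAP];
        size_t head, count;
};

static void lib_register_cb(struct evt_lib *lib, enum comp_kind kind,
                            comp_cb cb, void *ctx)
{
        assert(lib != NULL && kind < COMP_KINDS);
        lib->cb[kind] = cb;
        lib->ctx[kind] = ctx;
}

/*
 * Called by the library when a completion arrives.  If a callback is
 * registered for this kind it is invoked; otherwise the completion is
 * queued for polling.  Returns 0 on success, -1 if the queue is full.
 */
static int lib_dispatch(struct evt_lib *lib, const struct completion *c)
{
        if (lib->cb[c->kind] != NULL) {
                lib->cb[c->kind](c, lib->ctx[c->kind]);
                return 0;
        }
        if (lib->count == POLLQ_CAP)
                return -1;
        lib->pollq[(lib->head + lib->count) % POLLQ_CAP] = *c;
        lib->count++;
        return 0;
}

/* Application-side poll: returns 1 and fills *out, or 0 if empty. */
static int lib_poll(struct evt_lib *lib, struct completion *out)
{
        if (lib->count == 0)
                return 0;
        *out = lib->pollq[lib->head];
        lib->head = (lib->head + 1) % POLLQ_CAP;
        lib->count--;
        return 1;
}

/* Example application callback: counts completions via ctx. */
static void count_cb(const struct completion *c, void *ctx)
{
        (void)c;
        ++*(int *)ctx;
}
```

An application would call lib_register_cb(&lib, COMP_SEND, count_cb,
&counter) once at startup and then call lib_poll() from its main loop to
pick up recv completions.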

So, I'd like to see an additional section added to the API that is this
event driven section.  While the below is not what I would use in
libfabric (it's cribbed from a different project that doesn't represent
the same generic nature that libfabric needs to represent), it kind of
gets my point across:

/*
 * We use these functions to call back into the client/server to notify
 * them of async activity on specific connections.  The server or client
 * sets up the global cb struct with their functions prior to starting up
 * the transport.
 */
struct transport_data;  /* defined elsewhere in that project */
struct buffer;          /* defined elsewhere in that project */

struct callback_ops {
        void (*read_callback)(struct transport_data *t_data,
                              struct buffer *buff);
        void (*write_failed_callback)(struct transport_data *t_data,
                                      struct buffer *buff);
        void (*accept_callback)(struct transport_data *t_data);
        void (*eof_callback)(struct transport_data *t_data);
        void (*disconnect_callback)(struct transport_data *t_data);
        void (*closed_callback)(struct transport_data *t_data);
        void (*buffers_empty_callback)(void);
        void (*idle_callback)(struct transport_data *t_data);
};

Something along these lines can be done to implement an event driven API
as part of libfabric.

Along these same lines, if the library has options that cause it to go
threaded, it also needs to think about things like thread priorities and
real time kernels.  How threads get their priorities set both when we do
and don't have real time capabilities would need to be spelled out.
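
As one possible shape for such a rule, the sketch below has a library
thread ask for real time scheduling on itself and fall back to the
default scheduler if that is not available.  The helper name
set_worker_priority, the result enum, and the choice of SCHED_FIFO at
the middle of its priority range are assumptions for illustration, not a
proposed policy.  It relies on the Linux behavior that a pid of 0 in
sched_setscheduler() refers to the calling thread.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <sched.h>

enum prio_result { PRIO_DEFAULT = 0, PRIO_REALTIME = 1 };

/*
 * Hypothetical helper that a library worker thread calls on itself at
 * startup.  If real time scheduling was requested and the kernel grants
 * it, the thread runs SCHED_FIFO at a mid-range priority.  In every
 * other case (not requested, no privilege, no real time support) the
 * thread is left on the default scheduler.
 */
static enum prio_result set_worker_priority(int want_realtime)
{
        struct sched_param sp;
        int lo, hi;

        if (!want_realtime)
                return PRIO_DEFAULT;

        lo = sched_get_priority_min(SCHED_FIFO);
        hi = sched_get_priority_max(SCHED_FIFO);
        if (lo == -1 || hi == -1)
                return PRIO_DEFAULT;    /* policy not supported */
        assert(lo <= hi);

        sp.sched_priority = lo + (hi - lo) / 2;
        if (sched_setscheduler(0, SCHED_FIFO, &sp) != 0)
                return PRIO_DEFAULT;    /* e.g. EPERM: fall back */

        return PRIO_REALTIME;
}
```

Returning which mode was actually applied lets the application log or
react to the fallback instead of silently running at the wrong priority.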

> For a merged effort, I would anticipate that in some cases the same
> set of function pointers could be usable between gen1 and gen2 APIs
> (e.g. msg_ops, rma_ops), but with differently named wrapper functions
> (e.g. fi_write versus ibv_write).  An example of this was in last
> week's presentation.  In other cases, functions may not easily apply
> (e.g. tagged_ops) or only the concepts may be transferable (e.g.
> optimized poll CQ call).  The CM functionality and their full
> integration would be an example of calls that evolve from gen 1 to a
> gen 2.

I like the idea of following the upstream libibverbs and librdmacm git
repos for as long as we can, but I suspect there will come a time when
we might need to fork.  I say this because a tighter integration between
the v1 and v2 APIs, such that something like a protection domain can be
shared between them, might require tricks in terms of library linkage.
Those tricks could force us to roll all three libraries into one
monolithic library, with shim libraries installed in place of libibverbs
and librdmacm that do the job of getting older applications linked
against the right symbols in libfabric so that they work without a
recompile or any other changes.  If that happens, I don't think we'll be
able to follow upstream any more (or at a minimum we would have to
follow it using rebase=true on the upstream master branch and carrying a
possibly large patch set on top of upstream that we would have to
constantly maintain).

Another thing I didn't see addressed in the current API document is the
issue of extensions.  I think we need to address this from the beginning
as there will always be custom hardware extensions that applications
want to make use of.  I don't really like the current libibverbs
extension method as it requires magic numbers.  I think what I'd rather
see is a query, response, registration mechanism for extensions where
the query itself is based upon a specific device, a string name of the
extension, and an API version.  Handling the query would involve asking
the driver for the specific device, which either accepts the requested
extension name and binary version and provides the required set of
function pointers, or returns a negative response.  The application
would then save off the necessary function pointers for use during run
time, and then any mandatory or optional registration the device needs
to initialize the extension on the specific device would be called by
the application (which might result in the low level driver making
changes to how the library handles non-extension calls that are affected
by the extension being enabled).  At that point the extension would be
functional.  This avoids the problem with having an enum for extensions,
and even makes it possible for vendor supplied extensions and official
upstream extensions to be different.  For instance, if Mellanox wanted
to have an extension MLX_CORE_DIRECT that was different from the final
CORE_DIRECT extension we put into the upstream libfabric, they could.
That users might have to code to two different extensions is a problem
that the users and the vendor get to deal with.  Users often request,
and vendors sometimes deliver, features long before they land upstream.
Given that this practice has not abated over the years, I'm tired of
fighting against it.  This approach resolves the conflicts that arise
when a vendor ships an early version of a feature in their own code
simply by allowing them to use a vendor specific name, and it leaves the
vendor and their users with the onus to deal with the code changes that
can happen when they code to a feature before it is approved by
upstream.
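
A rough sketch of the query, response, registration flow described
above, keyed on device, extension name string and version rather than an
enum.  Only the extension names CORE_DIRECT and MLX_CORE_DIRECT come
from the text; toy_device, ext_entry, demo_ops, query_extension and the
placeholder operations are hypothetical and exist purely to show the
lookup.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct toy_device;

/* One extension a device's driver is willing to provide. */
struct ext_entry {
        const char *name;       /* string name, no magic numbers */
        unsigned version;       /* API/binary version of the ops table */
        const void *ops;        /* extension-specific function pointers */
        int (*init)(struct toy_device *dev);  /* registration, may be NULL */
};

/* Hypothetical device: just the table its driver exports. */
struct toy_device {
        const struct ext_entry *exts;
        size_t n_exts;
        int ext_enabled;        /* set by a registration hook */
};

/* Placeholder ops table; a real extension defines its own layout. */
struct demo_ops {
        int (*op)(int arg);
};

static int upstream_op(int arg) { return arg + 1; }
static int vendor_op(int arg)   { return arg * 2; }

static const struct demo_ops upstream_ops = { upstream_op };
static const struct demo_ops vendor_ops   = { vendor_op };

static int enable_ext(struct toy_device *dev)
{
        dev->ext_enabled = 1;   /* driver may alter non-extension paths here */
        return 0;
}

/* Vendor and upstream variants coexist under different names. */
static const struct ext_entry demo_exts[] = {
        { "CORE_DIRECT",     1, &upstream_ops, enable_ext },
        { "MLX_CORE_DIRECT", 1, &vendor_ops,   NULL },
};

/*
 * Query: returns the matching entry (positive response) or NULL
 * (negative response) for this device, name and version.
 */
static const struct ext_entry *query_extension(const struct toy_device *dev,
                                               const char *name,
                                               unsigned version)
{
        size_t i;

        assert(dev != NULL && name != NULL);
        for (i = 0; i < dev->n_exts; i++) {
                if (dev->exts[i].version == version &&
                    strcmp(dev->exts[i].name, name) == 0)
                        return &dev->exts[i];
        }
        return NULL;
}
```

An application would call query_extension(), save the ops pointer from a
non-NULL result, and then call the init hook if one is present before
using the extension.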
