[Openframeworkwg] Some concerns about the new Fabric/IBverbs API

Christoph Lameter christoph at lameter.com
Wed Dec 4 16:39:07 PST 2013


On Wed, Dec 4, 2013 at 11:53 AM, Hefty, Sean <sean.hefty at intel.com> wrote:

> > That kind of functionality is already provided by the shared library systems
> > in Linux. Could each of the modules that we talked about not just be
> > mapped to a shared library, with the API defined simply that way? There are
> > mechanisms in place to deal with compatibility issues.
>
> These APIs are usable together, not stand-alone.  The calls used to report
> events are used with the data transfer calls.  And different data transfer
> calls work with the same endpoint (e.g. message send and RDMA write).


The same can be done using libraries if you create the proper
dependencies.



> > What are the specific requirements that prevent us from using the standard
> > library approach? If at all possible we should simply use what is there.
>
> Can you provide more specific details on how you see this working?  I'm
> not following what sort of changes or approach you're thinking about.
>

I am not clear on why we have this complicated API with the extension when
the linker etc. already provide such a framework, with associated tools to
manage it. There is a history, for example, of adding and modifying functions
in glibc as well as in the kernel. Can we really not use the established
means of defining user space APIs?



> This is what we have with libibverbs today:
>
> libibverbs exports a set of function calls, such as
>
> struct ibv_pd *ibv_alloc_pd(struct ibv_context *context);
> struct ibv_mr *ibv_reg_mr(struct ibv_pd *pd, void *addr, size_t length,
>                           int access);
> struct ibv_cq *ibv_create_cq(struct ibv_context *context, int cqe,
>                              void *cq_context,
>                              struct ibv_comp_channel *channel,
>                              int comp_vector);
> struct ibv_qp *ibv_create_qp(struct ibv_pd *pd,
>                              struct ibv_qp_init_attr *qp_init_attr);
> int ibv_modify_qp(struct ibv_qp *qp, struct ibv_qp_attr *attr,
>                   int attr_mask);
> ...
>
> libibverbs itself can't implement these calls, since the implementation is
> dependent on the provider.  So the provider plugs a set of function calls
> into libibverbs.
>

Isn't the loader used to load a provider implementation that can then be
used by ibverbs? Can these not be simple functions exported by a
provider-supplied library instead of a vector of functions? dlopen and
friends can manage that.



> struct ibv_context_ops {
>         int             (*query_device)(struct ibv_context *context,
>                                         struct ibv_device_attr *device_attr);
>         int             (*query_port)(struct ibv_context *context,
>                                       uint8_t port_num,
>                                       struct ibv_port_attr *port_attr);
>         struct ibv_pd * (*alloc_pd)(struct ibv_context *context);
>         int             (*dealloc_pd)(struct ibv_pd *pd);
>         struct ibv_mr * (*reg_mr)(struct ibv_pd *pd, void *addr,
>                                   size_t length, int access);
>         ...
>
> libibverbs can then map calls to the provider, in some cases using static
> inline functions.  Whether a static inline call is used or not, the basic
> operation is the same.
>

The static inline could be an issue. Not sure how to do that.

> static inline int ibv_poll_cq(struct ibv_cq *cq, int num_entries,
>                               struct ibv_wc *wc)
> {
>         return cq->context->ops.poll_cq(cq, num_entries, wc);
> }
>

Well, dlopen can be used to treat libraries as essentially a vector of
functions if you wanted to do that. So providers could write an
implementation providing a prescribed set of functions that would then be
converted to a function vector if we need the above.
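
A minimal sketch of that conversion, assuming a prescribed list of symbol
names. The names demo_ops and demo_load_ops are hypothetical, and strlen and
strcmp merely stand in for provider entry points so that the lookup can be
tried without a real provider library:

```c
#include <dlfcn.h>
#include <stddef.h>

/* Hypothetical function vector; a real one would hold alloc_pd, poll_cq, ... */
struct demo_ops {
    void   *handle;                              /* library the pointers come from */
    size_t (*length)(const char *s);             /* stand-in entry point */
    int    (*compare)(const char *a, const char *b);
};

/* Fill *ops by looking up a prescribed set of symbol names in `path`.
 * path == NULL searches the running program and the libraries it was
 * started with. Returns 0 on success, -1 if the library or a symbol
 * is missing. */
int demo_load_ops(const char *path, struct demo_ops *ops)
{
    ops->handle = dlopen(path, RTLD_NOW);
    if (!ops->handle)
        return -1;

    /* The cast through void ** is the conversion POSIX suggests for dlsym. */
    *(void **)&ops->length  = dlsym(ops->handle, "strlen");
    *(void **)&ops->compare = dlsym(ops->handle, "strcmp");

    if (!ops->length || !ops->compare) {
        dlclose(ops->handle);
        ops->handle = NULL;
        return -1;
    }
    return 0;   /* the handle stays open while the vector is in use */
}
```

A caller would pass the path of the provider library instead of NULL and
then call through ops.length and ops.compare as with any other function
vector.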


> From one viewpoint, I'm suggesting replacing 'struct ibv_context_ops' with
> 'struct ibv_qp_ops' + 'struct ibv_cq_ops' + 'struct ibv_mr_ops' + ..., and
> using static inline functions similar to ibv_poll_cq().
>
> I don't see how we can get away from using function pointers into the
> providers.


If you must have these function pointers for certain use cases then the
existing library mechanism can also provide that. Would plain function
definitions not be better for the rest, because we can then avoid the
indirection? Plus we would have a rich set of tools to manage the APIs.
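
To make the difference concrete, here is a small sketch of the two calling
styles. All demo_* names are made up for illustration and are not part of
libibverbs:

```c
struct demo_cq;

/* Indirect style: every object carries pointers into the provider. */
struct demo_cq_ops {
    int (*poll)(struct demo_cq *cq, int num_entries);
};

struct demo_cq {
    const struct demo_cq_ops *ops;
    int pending;                    /* completions waiting to be reaped */
};

/* Direct style: the provider library exports this symbol itself. */
int demo_provider_poll_cq(struct demo_cq *cq, int num_entries)
{
    int n = cq->pending < num_entries ? cq->pending : num_entries;
    cq->pending -= n;
    return n;                       /* number of completions reaped */
}

static const struct demo_cq_ops demo_cq_default_ops = {
    .poll = demo_provider_poll_cq,
};

/* Wrapper in the style of ibv_poll_cq(): pointer loads, then an indirect call. */
static inline int demo_poll_cq(struct demo_cq *cq, int num_entries)
{
    return cq->ops->poll(cq, num_entries);
}
```

Both paths end up in the same provider code. The direct symbol is resolved
once by the linker or loader, while the wrapper dereferences the ops pointer
on every call.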


> > Then regarding the separation between user space and kernel space: There
> > are alternate approaches possible to the use of IBverbs. The Linux kernel
> > has added a VFIO interface that allows exposure of device driver controls
> > to user space. If we could base libfabrics on that API instead then the
> > RDMA approach would be generically usable with any device that supports
> > VFIO.
> >
> > http://lwn.net/Articles/509153/
>
> Are you suggesting using VFIO as the interface between user space and the
> kernel, rather than dropping through the kernel ib_user_verbs module?
>
> Do you see the use of VFIO impacting the user space APIs or the internal
> architecture of the framework or both?


VFIO could be used as a base API for getting access to the device. We may
then need some additional syscalls to provide functionality that the
existing VFIO interface does not.

Vendors are already able to write a user space driver based just on the
VFIO spec. If the device supports VFIO then RDMA is basically also possible
by acquiring a VFIO instance of the device and managing the RX and TX
queues in user space. The Mellanox driver supports VFIO, for example.
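
As a rough sketch of the first step only, the following probes whether the
VFIO container interface is usable from a process. It uses the ioctls from
<linux/vfio.h> as described in the kernel VFIO documentation; the function
name demo_vfio_probe is made up here. Attaching an IOMMU group, getting a
device file descriptor and mapping the device regions would follow, and the
queue handling after that is vendor specific:

```c
#include <fcntl.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/vfio.h>

/* Returns 1 if the VFIO container API is present and supports the Type1
 * IOMMU model, 0 if VFIO is present but unsuitable, and -1 if
 * /dev/vfio/vfio cannot be opened (no VFIO or no permission). */
int demo_vfio_probe(void)
{
    int container = open("/dev/vfio/vfio", O_RDWR);
    if (container < 0)
        return -1;

    /* VFIO_CHECK_EXTENSION returns a positive value when supported. */
    int ok = ioctl(container, VFIO_GET_API_VERSION) == VFIO_API_VERSION &&
             ioctl(container, VFIO_CHECK_EXTENSION, VFIO_TYPE1_IOMMU) > 0;

    close(container);
    return ok ? 1 : 0;
}
```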

One problem may be that one VFIO instance is needed per device for each
process that uses offload/RDMA. For our needs, however, that means that we
have full control of the device at the register level, which seems
desirable to many of us and would allow full use of all the features of
the hardware. The API is vendor specific, though. What would be needed is a
set of higher level libraries that allow the vendor to provide a userspace
library that then ties into the RDMA framework, to make the use of these
devices easy from userspace.