[Ofvwg] Further thoughts on uAPI

Jason Gunthorpe jgunthorpe at obsidianresearch.com
Thu Apr 21 10:24:28 PDT 2016


On Thu, Apr 21, 2016 at 01:35:53PM +0000, Hefty, Sean wrote:
> > The kernel common code side is pretty straightforward, just a bunch of
> > tables of function pointers, templates and idrs for each object_type.
> 
> This is the part where the intent and implementation are not clear
> to me.  I see value in generic user space handle validation, but I
> prefer they map to driver specific resources.  For example, I don't
> want a user space allocation of a QP to result in this:

When we had our telephone call I did raise this as a discussion topic,
framed a little differently than you:

  Do we need any 'common' calls or should 'everything' be driver
  specific?

This is based on the observation that the current uverbs code does
very little beyond just calling a driver function pointer. So why
marshal the userspace information into common uAPI information then
marshall it again into hardware/driver information when the uAPI side
could just format it directly for hardware consumption?

** And to answer my own question: there is actually a good reason to
   continue the 'common' API - internally the uverbs/drivers use the
   kAPI to do most of the work. An incremental API transformation
   requires us to continue to provide that basic flow or face a
   daunting task of radically upgrading every single driver!

> monstrosity being allocated in the kernel.  The size and layout of
> this sort of structure impacts scalability and performance.  This is
> where I want to make sure there's alignment on the overall
> architecture.

Your observation that uAPI users don't need the kapi struct ib_qp is
along the same lines and very valid. Could the driver build a more
efficient struct?

The flip side is that the entire kAPI related to a verbs qp in the
driver becomes duplicated, one side presents the kAPI verbs and the
other side presents the driver-specific ioctl uAPI for the same
functionality.

Obviously this is a radical direction away from the uapi we have
today. Today they all code share and the uverbs translates the uAPI to
the kAPI and back again so drivers only have to implement the kAPI to
get u/k verbs support.

Within the basic sketch I presented there are two basic ways to
address this:
  1) Allow the driver to override RDMA uAPI object call backs
     directly.
     A driver could replace the entire uAPI QP object with its
     own code. The default implementation would be the existing scheme
     that translates the uAPI to the kAPI.
     The driver code could do whatever it wants and isn't required to
     implement the 'struct ib_qp' to get there.
     In this case the driver must still implement the common uAPI for
     verbs QP objects.
  2) Allow the driver to provide a 'driver specific QP' object which
     does not follow the common uAPI. This is a new object that would
     only be used by the driver specific library, and like the above
     totally bypasses all of the standard kAPI design.

That said - there are clearly some important services that core code
is providing, locking, idrs, hot-removal, auto-deletion, things like
comp channels, etc. These are object agnostic and even if you have a
driver specific object they still have to be cared for.

I'm guessing some hybrid is going to be best, eg simple objects like a
verbs PD could be handled through common code calling the kAPI for all
drivers and complex very driver-specific objects like MRs, AHs or QPs
are better routed directly to the driver without using the kapi.

I'm also very concerned to give driver authors a 'free reign' to
design their driver-specific uAPIs, because historically it has been
proven across the kernel that driver authors do a bad job of that kind
of work :( Thus encouraging drivers to stick with the #1 approach as
much as possible should be encouraged, and all examples of #2 has to
be reviewed by the core maintainer group (eg to ensure someone doesn't
try to access EEPROM over this interface :P )

> More specifically, I would switch the existing uABI commands to
> ioctl's, leaving the code paths mostly unchanged, but marking them
> as legacy.  Then add the new ioctls that you suggested, with future
> drivers hooking directly into the ioctl framework.

In an ideal world, I would like to not retain the existing uABI in the
current format. We may decide that is just too much work, but as a
goal..

We already have to maintain it exactly as is (with write) for a long
time, adding another copy that is ioctl based and has to live forever
doesn't seem great :( Maybe the compromise is to use this scheme to
carry the existing struct as the base attribute?

The reason is a little nuanced - I see #1 as the desired path, so we
still have to define this common uAPI for the industry-standard
objects. If we migrate to it then the kernel side can be optimized at
our leisure and there is a very nice incremental transition. Further,
we have only one set of code to maintain in the kernel, and the kernel
uAPI/kAPI default implementation becomes the necessary reference for
driver authors seeking the optimize it.

I think some coding experiments are going to be needed to qualify the
various options.

As a starting point, I am thinking of something along the lines of:

struct ib_device {
   [ .. ] 
   /* During 'ib_device_register' this array is allocated. The driver
      provided ops and common ops merged into one table for
      performance. */
   struct rdma_uapi_class *class_data;
}

struct rdma_uapi_object_data {
   struct .. idr ..
   .. locking ..
   etc
};

struct rdma_uapi_context {
    struct rdma_uapi_object_data *object_data; // Indexed same as class_data
}

typedef int (*rdma_uapi_cb)(struct rdma_uapi_context *,const rdma_object_hdr *imsg,omsg);
struct rdma_uapi_class {
    rdma_uapi_cb create_object;
    rdma_uapi_cb query_object;
    rdma_uapi_cb modify_object;
    rdma_uapi_cb destroy_object;
    rdma_uapi_cb object_specific[4];
    rdma_uapi_cb driver_specific[4];
    uint32_t class_id;
};

core/uapi_verbs.c:

static const struct rdma_uapi_class verbs_uapi_ops[] {
  {.class_id = RDMA_OBJECT_DEVICE, driver_query},
  {.class_id = RDMA_OBJECT_PORT, port_query},
  {.class_id = RDMA_OBJECT_QP, qp_create,qp_query,qo_modify,qp_destroy},
  {.class_id = RDMA_OBJECT_PD, pd_create,qp_query,qp_modify,qp_destroy},
};

/* And perhaps if necessary have common, ib, iwarp, rocee, opa
   versions of these functions for optimal performance */

core/uapi_XXXX.c:

// Future new multi-vendor uAPI
static const struct rdma_uapi_class XXX_uapi_ops[] {
  {.class_id = RDMA_OBJECT_XXX1, },
  {.class_id = RDMA_OBJECT_XXX2, },
};

hw/driver.c:

static const struct rdma_uapi_class driver_uapi_ops[] {
  // Override what uapi_verbs is providing
  {.class_id = RDMA_OBJECT_QP, driver_create,driver_query,driver_modfiy,driver_destroy},
  {.class_id = RDMA_OBJECT_PD}, // NULL, use standard
};

hw/drivers/hfi1.c

// Replaces the char dev:
static const struct rdma_uapi_class hfi_uapi_ops[] {
  // Driver directly provides its own object
  {.class_id = RDMA_OBJECT_HFI1_CTXT,
   .create_object = assign_ctxt,
   .query_object = get_ctxt_info,
   .destroy_object = hfi1_cmd_ctxt_reset,
   .object_specific = {
       [HFI1_CMD_RECV_CTRL] = ...,
       ..
   },
  },
}

core/uapi_common.c:

static const struct rdma_uapi_class common_uapi_ops[] {
  /* Format the content of class_data so userspace can figure out what
     the kernel supports. */
  {.class_id = RDMA_OBJECT_INTERFACE, query_classes},
}

int fops_ioctl(... )
{
	const struct rdma_object_hdr *hdr = copy_from_user(udata)
	struct rdma_uapi_context *ctx = fops->private_data;

	if (hdr->class_id >= ctx->device->class_data_len ||
	    hdr->op_id >= MAX_OP_ID)
	    return EOPNOTSUPP;

        const struct rdma_uapi_class *class = ctx->device->class_data[hdr->class_id];
        // Direct-index the labeled function pointer scheme as an array for performance
        rdma_uapi_cb target = ((rdma_uapi_cb)class)[hdr->op_id];
	if (target == NULL)
	    return EOPNOTSUPP;

        /* There should probably be a little bit more support here,
	   eg common code to access the object-specific IDR, hold
	   things like the rw sem for hot-removal, basic parse and validate
	   the message. But this is the basic idea */
        return target(ctx,....);
}

Jason



More information about the ofvwg mailing list