[Ofvwg] Further thoughts on uAPI

Jason Gunthorpe jgunthorpe at obsidianresearch.com
Mon Apr 25 11:19:53 PDT 2016


On Sun, Apr 24, 2016 at 08:11:47PM +0000, Hefty, Sean wrote:

> After fully over-analyzing things, these are my current thoughts.
> 
> I'm for merging all the rdma uABI interfaces.  This will allow us to
> share events, and allow closer association between objects.

Yes

> We have 256 ioctl commands available.  A straightforward mapping
> would result in uverbs using 43, ucma 23, and ucm 18.  I'd deprecate
> ucm, but if it needs to be kept, it could probably drop to about 6
> commands.  Uverbs could be re-structured into 10 objects -- each
> with create/query/modify/close routines -- plus about 6 fast path
> commands.

Yes, this seems broadly right to me.

However, I had intended to use the object type carried in the ioctl arg
as the primary mux and the ioctl would just indicate the 'method'. The
method ID table would be split much like you describe:

'core common' object routines
'built-in extra' object routines
'driver-fast-path' object routines

Not sure about an 'experimental' block yet.

~128 unique methods per object seems like enough?

Why do you feel cm/mgmt needs dedicated routines? I was going to model
CM as more objects and use the 'built-in extra' block to make CM
object-specific calls (e.g. bind, etc.).

This still works OK for strace: it has to decode the ioctl number, then
look at the class_id in the uniform first dword of the argument; from
there it knows exactly how to format and parse the ioctl argument.

> A command block has a num_ops, followed by an array of calls.  Each
> device structure has an array of pointers to command blocks.  This
> allows a driver to override any call, without necessarily storing a
> huge function table.

My sketch had the drivers just provide the individual things they
wanted to provide/override by number:

 static const struct rdma_uapi_class hfi_uapi_ops[] = {
  // Driver directly provides its own object
  {.class_id = RDMA_OBJECT_HFI1_CTXT,
   .create_object = assign_ctxt,
   /* ... */
  },
 };

And then rely on a 'compile' phase during registration to build a
micro-optimized dispatch table.

> For the base ioctl command, I would also add these two fields:
> op_ctrl and flags.  I envision that these fields can be used to
> determine the format of the input/output data.

There has been a lot of talk of using a netlink-like structure: a packed
list of binary attributes, each with an optional/mandatory flag. For
the lower-speed stuff that seems reasonable, though it is certainly
over-engineered for some commands.

So, a sketch would look like this:

#include <stdint.h>

struct msg
{
   uint16_t length;
   uint16_t class_id;
   uint32_t object_id; // in/out
   struct qp_base_attr
   {
       uint16_t length;
       uint16_t attribute_id;

       uint32_t qpn;  // in/out; QPNs are 24 bits, so 16 is too narrow
       uint16_t qp_flags;
       uint16_t max_send_wr, max_recv_wr, max_send_sge; // ...
   } base;
   // Option to piggy back what ibv_modify_qp does:
   struct qp_addr_ib
   {
       uint16_t length;
       uint16_t attribute_id;

       uint16_t dlid, slid, sl, pkey; // etc.
   } addr;
};

msg.length = sizeof(msg);
msg.class_id = RDMA_OBJ_QP_UD;
msg.base.length = sizeof(msg.base);
msg.base.attribute_id = RDMA_ATTR_QP_BASE;
msg.base.qp_flags = XX
[..]
ioctl(fd,RDMA_CREATE_OBJECT,&msg);
[..]
ioctl(fd,RDMA_MODIFY_OBJECT,&msg2);

Jason


