[Ofvwg] Further thoughts on uAPI

Hefty, Sean sean.hefty at intel.com
Mon Apr 25 12:16:12 PDT 2016


> However, I had intended to use the object type carried in the ioctl arg
> as the primary mux and the ioctl would just indicate the 'method'. The
> method ID table would be split much like you describe:
> 
> 'core common' object routines
> 'built-in extra' object routines
> 'driver-fast-path' object routines

I did understand the proposal.  My main concern was that it appeared that it would result in a very large function array, potentially with a significant number of NULL functions, associated with each driver.

> Not sure about experimental..

I wasn't sure either.

> ~128 unique methods for every object seems like enough??

Seems more than enough to me.

> Why do you feel cm/mgmt needs dedicated routines? I was going to model
> CM as more objects and use the 'built-in extra' block to make CM
> object specific calls (eg bind/etc)

I separated the cm/mgmt calls because I doubt a driver will ever override them, and some of the calls are system wide, versus being bound to a driver. 

> This still works OK for strace: it has to parse the ioctl # and then
> look into the class_id uniform first dword, then it knows exactly how
> to format and parse the ioctl argument.
> 
> > A command block has a num_ops, followed by an array of calls.  Each
> > device structure has an array of pointers to command blocks.  This
> > allows a driver to override any call, without necessarily storing a
> > huge function table.
> 
> My sketch had the drivers just provide the individual things they
> wanted to provide/override by number:
> 
>  static const struct rdma_uapi_class hfi_uapi_ops[] = {
>   // Driver directly provides its own object
>   {.class_id = RDMA_OBJECT_HFI1_CTXT,
>    .create_object = assign_ctxt,
> 
> And then rely on a 'compile' phase during registration to build a
> micro-optimized dispatch table.
> 
> > For the base ioctl command, I would also add these two fields:
> > op_ctrl and flags.  I'm envisioning that these fields can be used to
> > determine the format of the input/output data.
> 
> There has been a lot of talk of using a structure like netlink with a
> linked list of binary attributes and an optional/mandatory flag. For
> the lower speed stuff that seems reasonable, though it is certainly
> over-engineered for some commands.
> 
> So, a sketch would look like this:
> 
> struct msg
> {
>    uint16_t length;
>    uint16_t class_id;
>    uint32_t object_id; // in/out
>    struct qp_base_attr
>    {
>        uint16_t length;
>        uint16_t attribute_id;
> 
>        uint16_t qpn;  //in/out
>        uint16_t qp_flags;
>        uint16_t max_send_wr, max_recv_wr, max_send_sge;
>    };
>    // Option to piggy back what ibv_modify_qp does:
>    struct qp_addr_ib
>    {
>        uint16_t length;
>        uint16_t attribute_id;
> 
>        uint16_t dlid,slid,sl,pkey,etc;
>    };
> }
> 
> msg.length = sizeof(msg);
> msg.class_id = RDMA_OBJ_QP_UD;
> msg.base.length = sizeof(msg.base);
> msg.base.attribute_id = RDMA_ATTR_QP_BASE;
> msg.base.qp_flags = XX;
> [..]
> ioctl(fd,RDMA_CREATE_OBJECT,&msg);
> [..]
> ioctl(fd,RDMA_MODIFY_OBJECT,&msg2);

I had followed this, but wondered if it wouldn't be simpler to just say: use structure 1 or structure 2.  A lot of the need for this complexity seems driven by treating all QPs as a single object, rather than as separate objects.  Making that change might simplify things..?  Also, I think we should consider reasonable optimizations for connecting QPs.  Doug and I had to debug apps that broke because the connection process was not completing quickly enough.

- Sean
