[Ofvwg] Further thoughts on uAPI

Doug Ledford dledford at redhat.com
Tue Apr 26 07:19:37 PDT 2016


On 4/24/2016 10:15 AM, Liran Liss wrote:

> For generic interfaces (currently includes Verbs, Ethernet QPs, and IB management), the new scheme should map what we have today in a flexible manner.
> This would enable us, for example, to pass only RoCE addressing attributes while modifying a RoCE QP (and optionally optimizing the kernel representation as well).
> These interfaces have a matching kAPI.

This sounds like something I was thinking as well.  Of course, abstract
ideas are sometimes less similar than you think, so putting something
concrete down can help make sure that people are actually thinking about
the same thing.

For certain operations that have lots of optional items (work requests
for one, work completions for another), the old method has been to stick
everything in one struct (which bloats it for most uses), or the extreme
opposite end of the spectrum was the recent timestamping API patches
that totally deconstructed the wc struct and rebuilt it from individual
elements and completely reordered.  Another approach to dealing this
this is tons of different structs (Christoph's work request struct rework).

Maybe we can reach a different arrangement.  I'm thinking of one base
struct that's versioned.  This base struct is the common items we always
need across the board:

struct __work_completion_common_v1 {
	union { // Only first for alignment reasons
		u64		wr_id;
		struct ib_cqe	*wr_cqe;
	}
	int	magic = 1; /* magic starts as the version #
			      in the lower 8 bits, then we
			      add flags for optional struct
			      elements */
	enum ib_wc_status	status;
	enum ib_wc_opcode	opcode;
	u32			len;
};

#define rdma_wc __work_completion_common_v1

Then we create optional, additional structs.  Such as a specific struct
for each address type:

/* The common struct and option struct versions always match */
struct __work_completion_ib_addr_v1 {
	struct ib_qp		*qp;
	u32			src_qp;
	u16			pkey_index;
	u16			slid;
	u8			port_num;
	u8			reserved[3]; // preserve u64 alignment
};

#define rdma_wc_ib_addr __work_completion_ib_addr_v1

struct __work_completion_eth_addr_v1 {
	u8			smac[ETH_ALEN];
	u16			vlan_id;
};

#define rdma_wc_eth_addr __work_completion_eth_addr_v1

....

Addtional optional struct items can then be defined, for things like
errors, immediate/invalidate data, timestamps, etc.

When building a wc, you start with the base struct, the magic is set to
the version, then for each optional element you add, you set a flag
field for that element in the magic item.  Optional element flags occupy
the upper 24 bits.  The length of the total struct is the length of the
base struct plus the length of all optional structs, and the order of
the optional structs matches their bit order from lowest to highest in
the magic element.  It's not quite as free form as the patches for
timestamp support were, but still allows the structs some flexibility in
what is included and what isn't.

When parsing the wc, you verify you have the right version first, then
you process what you need from the common struct, and if you have need
of it, process any additional stuff by walking the set bits in the magic
struct to get to each optional struct item.

Something like that can be applied to wcs, wrs, and where there is
enough variability to warrant it, other items as well.  Of course, if an
item doesn't vary all that much, then a single struct is still preferable.


-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 884 bytes
Desc: OpenPGP digital signature
URL: <http://lists.openfabrics.org/pipermail/ofvwg/attachments/20160426/acdc2226/attachment.sig>


More information about the ofvwg mailing list