[ofiwg] thoughts on initializing libfabric structs

Jason Gunthorpe jgunthorpe at obsidianresearch.com
Mon Jul 28 10:32:08 PDT 2014


On Mon, Jul 28, 2014 at 01:01:21PM -0400, Robert D. Russell wrote:

> In order to statically initialize a structure to some sane default
> values, a static initializer macro (i.e., FI_FOO_INITIALIZER for
> struct fi_foo) MUST be provided that should place the structure into a
> valid state (including version/mask fields).  This macro must set the
> correct structure state (size/version) based on the compile-time version
> of libfabric, and must default initialize all fields (Rationale: this
> allows recompiling the application against a newer version of libfabric
> containing more fields than were available when the application was
> written).  The application or provider will be expected to use this
> macro to initialize each API structure that it allocates on the stack or
> as a global/static variable.

This is actually not how this scheme is supposed to work in
libibverbs.

The entire point of having a special bit mask is to communicate that
the writer has actually initialized the fields properly. This requires
the initialization to be explicit in the non-library code that writes
the structure.

If you initialize the mask in common code and then recompile, you now
have to deal with the case where the structure claims to include
something but the write side has no knowledge of it and didn't
initialize it, which makes no sense.
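
To make the writer-owned mask concrete, here is a minimal sketch. Every
name in it (struct demo_attr, the DEMO_ATTR_* bits, both functions) is
hypothetical and is not taken from libfabric or libibverbs. The writer
sets a bit only for a field it really filled in, and the reader trusts
a field only when its bit is set.

```c
#include <stdint.h>

/* Hypothetical names throughout; not the libfabric or libibverbs API. */
enum {
    DEMO_ATTR_FLAGS    = 1u << 0,  /* writer has filled in 'flags' */
    DEMO_ATTR_PRIORITY = 1u << 1   /* writer has filled in 'priority' */
};

struct demo_attr {
    uint32_t mask;      /* which of the fields below the writer initialized */
    uint32_t flags;
    uint32_t priority;  /* imagine this field was added in a later release */
};

/* Reader side (library): use a field only if its mask bit is set. */
static uint32_t demo_get_priority(const struct demo_attr *attr,
                                  uint32_t fallback)
{
    return (attr->mask & DEMO_ATTR_PRIORITY) ? attr->priority : fallback;
}

/* Writer side: an application written before 'priority' existed. */
static void demo_fill_old_app(struct demo_attr *attr)
{
    attr->mask = DEMO_ATTR_FLAGS;  /* set by the writer, not by a header macro */
    attr->flags = 0x1;
    /* 'priority' is never written, and the mask honestly says so */
}
```

If a header macro set the mask to "all bits known at compile time"
instead, recompiling this old application would set DEMO_ATTR_PRIORITY
without anybody having written priority.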

Having the library header initialize the structure is a legitimate
approach to ABI compatibility, but don't use a bitmask; just use an
ABI version field, and carefully define a null value state for all
fields. That will avoid confusion.
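
A rough sketch of the version-field alternative, assuming hypothetical
names (DEMO_ABI_VERSION, struct demo_attr_v, DEMO_ATTR_V_INITIALIZER)
and an invented rule that 0 is the null value meaning "use the library
default":

```c
#include <stdint.h>

/* Hypothetical: bumped to 2 when 'priority' was appended to the struct. */
#define DEMO_ABI_VERSION 2u

struct demo_attr_v {
    uint32_t abi_version;  /* set by the header-provided initializer */
    uint32_t flags;        /* null value: 0 means no flags */
    uint32_t priority;     /* added in ABI 2; null value: 0 means default */
};

/* Header-provided initializer: stamps the version, nulls every field. */
#define DEMO_ATTR_V_INITIALIZER { DEMO_ABI_VERSION, 0, 0 }

/* Library side: only touch 'priority' if the caller's ABI has it. */
static uint32_t demo_effective_priority(const struct demo_attr_v *attr)
{
    const uint32_t library_default = 10;  /* assumed default for the sketch */

    if (attr->abi_version < 2 || attr->priority == 0)
        return library_default;
    return attr->priority;
}
```

Because every field has a defined null value, an application recompiled
against a newer header gets a larger structure whose new fields all say
"not specified", and no mask is needed to tell the two cases apart.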

> static inline void
> fi_mr_attr_destroy(struct fi_mr_attr *attr)
> {
> 	/* This can be a no-op for simple structures */

Wow, that's a pain. There are many places where simple stack-allocated
structures are sufficient, and adding an API requirement to destroy them
on unwind is a huge burden, especially if the destroy is just a NOP.

Be careful about creating abstraction burdens: verbs succeeded in
gaining adoption where other heavyweight design-by-committee schemes
like DAPL failed.

> 1. a release/version number
>    Pro: A release or version number is simple, easy to document,
>    and easy for both users and implementers to understand and
>    reference.
>    Con: A release or version number must be maintained "by hand"
>    and requires other documentation to indicate what should or
>    should not be present in the structure.  In particular, the
>    allocation size of the structure must be linked to the release
>    number "by hand".  There is no easy way to verify that the
>    code implementing the structure conforms to the release or
>    version number.

I actually don't see this as a con. Being explicit and careful with
the ABI is critically important.

The implementation should include a static_assert scheme to validate
the size of the structures during the library compile.
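
Such a check could look roughly like the following, using C11
static_assert. The structure and its layout are made up for
illustration (this is not the real fi_mr_attr); the point is only that
the build fails if somebody changes the layout without also updating
the recorded ABI size.

```c
#include <assert.h>   /* static_assert (C11) */
#include <stddef.h>   /* offsetof */
#include <stdint.h>

/* Hypothetical structure; the field names are invented for this sketch. */
struct demo_mr_attr {
    uint32_t abi_version;
    uint32_t access;
    uint64_t requested_key;
};

/* Evaluated when the library is compiled; a mismatch is a build error. */
static_assert(sizeof(struct demo_mr_attr) == 16,
              "demo_mr_attr: size is part of the ABI, bump the version");
static_assert(offsetof(struct demo_mr_attr, requested_key) == 8,
              "demo_mr_attr: field offsets are part of the ABI");
```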

Bearing in mind there are other practical problems here, the library
needs to tell the compiling code what the revision is, so that code
can properly #ifdef if it wants to compile against multiple versions.
That is much simpler with a single revision number than with other
schemes.
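
For instance, application code that wants to build against several
revisions might look like this. DEMO_LIB_REVISION stands in for
whatever macro the library header would export, and the tx_depth field
"added in revision 3" is invented for the sketch.

```c
#include <stdint.h>

/* Hypothetical: normally this would come from the library's header. */
#define DEMO_LIB_REVISION 3

struct demo_ep_attr {
    uint32_t type;
#if DEMO_LIB_REVISION >= 3
    uint32_t tx_depth;   /* imagined as added in revision 3 */
#endif
};

/* Application code: one comparison decides whether the field exists. */
static void demo_ep_attr_setup(struct demo_ep_attr *attr)
{
    attr->type = 1;
#if DEMO_LIB_REVISION >= 3
    attr->tx_depth = 64;
#endif
}
```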

> 2. the allocation size of the structure in bytes
>    Pro: Easy to guarantee correct initialization at
>    compile-time using sizeof().
>    Con: Requires other documentation to indicate what should or
>    should not be present in the structure.  There is no easy way
>    to verify that the code implementing the structure conforms to
>    the size.  No obvious correlation with a release or version
>    number.

This only works well if you are very careful about padding. E.g., adding
a single 32 bit entry to a structure with a 64 bit member will still use
8 bytes, and the presence/absence of the last unused 4 bytes is not
detectable. Also be aware that compilers are not required to zero
implicit padding in bracketed initializers.

So size is broadly not a great choice.
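
The padding problem is easy to reproduce. In the sketch below (all
names hypothetical), two successive layouts of a structure come out the
same size on ABIs where uint64_t is 8-byte aligned, which covers the
usual 64-bit platforms, so a size field cannot tell the reader whether
the second 32 bit field exists.

```c
#include <stdint.h>

/* Three hypothetical revisions of the same structure. */
struct demo_v1 { uint64_t addr; };
struct demo_v2 { uint64_t addr; uint32_t a; };              /* 4 bytes tail padding */
struct demo_v3 { uint64_t addr; uint32_t a; uint32_t b; };  /* 'b' fills the padding */

/* Returns 1 when sizeof cannot distinguish v2 from v3 on this ABI. */
static int demo_size_hides_new_field(void)
{
    return sizeof(struct demo_v2) == sizeof(struct demo_v3);
}
```

On an ABI that aligns uint64_t to only 4 bytes, v2 shrinks and the two
sizes differ, so the answer also depends on the platform.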

>    version number.  The allocation size of the structure must be
>    linked to the attributes "by implication" (i.e., given a pointer
>    to a structure at run-time, the presence of an attribute that

There are very few cases where the allocation size would actually
matter (e.g., arrays), and those cases will need an explicit size
passed in anyway to work properly.

Jason


