[ofiwg] thoughts on initializing libfabric structs

Robert D. Russell rdr at iol.unh.edu
Mon Jul 28 11:45:12 PDT 2014


> The entire point of having a special bit mask is specifically to
> communicate that the writer has actually properly initialized the
> fields. This requires that the initialization be explicit in the non-library
> code that is writing the structure.
>
> If you initialize the mask in common code and then recompile, you now
> have to deal with the case where the structure claims to include
> something but the writing side has no knowledge of this and didn't
> initialize it, which makes no sense.
>
> Having the library header initialize the structure is a legitimate
> approach to ABI compatibility, but don't use a bitmask, just use an
> ABI version field, and carefully define a null value state for all
> fields. That will avoid confusion.

I'm missing something here, because the static initializer should 
"carefully define a null value state for all fields".
The whole idea of these initializers is exactly to ensure that all
allocated fields are in fact initialized, whether or not
the code that allocates the structure knows about or uses the fields.

A knowledgeable user can always perform the initialization "by hand",
either without using the initializers provided by the library or
by overriding them, as long as the intent of the initializers is
well explained.
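
To make that concrete, here is a rough sketch of the kind of initializer
I have in mind.  The foo_mr_attr structure, its fields, and the
FOO_MR_ATTR_INITIALIZER name are made up for illustration only:

#include <stdint.h>

struct foo_mr_attr {
        uint64_t mask;           /* which optional fields the caller filled in */
        uint64_t access;
        uint64_t requested_key;
};

/* every field the current header knows about gets a defined null state,
 * whether or not the code allocating the structure knows the field exists */
#define FOO_MR_ATTR_INITIALIZER \
        { .mask = 0, .access = 0, .requested_key = 0 }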

>
>> static inline void
>> fi_mr_attr_destroy(struct fi_mr_attr *attr)
>> {
>> 	/* This can be a no-op for simple structures */
>
> Wow, that's a pain. There are many places where simple stack allocated
> structures are sufficient and adding an API requirement to destroy them on
> unwind is a huge burden, especially if the destroy is just a
> NOP.

For simple stack-allocated structures, FI_FOO_INITIALIZER
would be used, in which case neither
fi_foo_init() nor fi_foo_destroy() is called.

fi_foo_destroy() would be used only if a matching fi_foo_init()
were used.
If in fact a particular destroy is a no-op, then the definition
in the header file could just be
#define fi_mr_attr_destroy(attr)  do { } while (0)

However, won't most compilers just optimize out a call to an empty
static inline function anyway?  gcc does.
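
To make the two usage patterns concrete, continuing the hypothetical
foo_mr_attr sketch above (foo_mr_attr_init() and foo_mr_attr_destroy()
are likewise made-up names, not proposed API):

/* header-provided helpers for callers that cannot use the static
 * initializer, e.g. for dynamically allocated structures */
static inline void foo_mr_attr_init(struct foo_mr_attr *attr)
{
        *attr = (struct foo_mr_attr) FOO_MR_ATTR_INITIALIZER;
}

static inline void foo_mr_attr_destroy(struct foo_mr_attr *attr)
{
        (void) attr;    /* nothing to release for this simple structure */
}

/* pattern 1: simple stack allocation, no init/destroy calls at all */
void use_stack(void)
{
        struct foo_mr_attr attr = FOO_MR_ATTR_INITIALIZER;
        /* ... fill in the fields of interest, call the provider ... */
}

/* pattern 2: init/destroy pair, used only when pattern 1 does not apply */
void use_init_destroy(struct foo_mr_attr *attr)
{
        foo_mr_attr_init(attr);
        /* ... fill in the fields of interest, call the provider ... */
        foo_mr_attr_destroy(attr);
}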


>
> Be careful with creating abstraction burdens; verbs succeeded in
> gaining adoption when other heavyweight design-by-committee schemes
> like DAPL failed.
>
>> 1. a release/version number
>>    Pro: A release or version number is simple, easy to document,
>>    and easy for both users and implementers to understand and
>>    reference.
>>    Con: A release or version number must be maintained "by hand"
>>    and requires other documentation to indicate what should or
>>    should not be present in the structure.  In particular, the
>>    allocation size of the structure must be linked to the release
>>    number "by hand".  There is no easy way to verify that the
>>    code implementing the structure conforms to the release or
>>    version number.
>
> I actually don't see this as a con. Being explicit and careful with
> the ABI is critically important.
>
> The implementation should include a static_assert scheme to validate
> the size of the structures during the library compile.

Could you give an example of this?
How does a provider function that is given a pointer to a structure
as a parameter verify the size of that structure, since it may NOT
match the sizeof() that structure had when the provider function was compiled?
Our assumption is that the provider may be compiled at a different
time (and therefore, with a different version) than the application
calling the provider, yet the provider wants to be able to work
with all application versions less than or equal to the version
for which the provider was compiled.  Isn't the only way
to know the allocation size of those earlier versions
to #define FI_FOO_VERSION_2_SIZE 128  (for example), where
the 128 has to be supplied "by hand"?
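
As a sketch of what I imagine such a scheme would have to look like
(again using the hypothetical foo_mr_attr structure, where version 1 had
only the mask and access fields and version 2 added requested_key; the
size constants are exactly the hand-supplied values I mean):

#include <assert.h>     /* C11 static_assert */
#include <stddef.h>

#define FOO_MR_ATTR_V1_SIZE     16      /* recorded "by hand" when v1 shipped */
#define FOO_MR_ATTR_V2_SIZE     24      /* recorded "by hand" when v2 shipped */

/* catches an accidental layout change at library compile time,
 * but only for the version the library is being compiled against */
static_assert(sizeof(struct foo_mr_attr) == FOO_MR_ATTR_V2_SIZE,
              "struct foo_mr_attr layout changed without a version bump");

/* a provider handed an older-version structure can only recover its
 * allocation size through a hand-maintained table like this */
static inline size_t foo_mr_attr_size(int version)
{
        return (version >= 2) ? FOO_MR_ATTR_V2_SIZE : FOO_MR_ATTR_V1_SIZE;
}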


>
> Bearing in mind there are other practical problems here, the library
> needs to provide some idea to the compiling code what the revision is
> so it can properly #ifdef if it wants to compile against multiple
> versions. That is much simpler with a single revision number vs other
> schemes.
>

Wouldn't the name of the bit in the bit-mask (number 3) be a better
target of the #ifdef, since that would not be defined at compile-time
if that field were not present?  This would require the bit names to be
#define constants (as well as enum values, if that is also desirable).
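
For example (still using the hypothetical foo_mr_attr structure, and
assuming a newer library header that declares requested_key also defines
a FOO_MR_ATTR_REQUESTED_KEY bit for it; the version macros are equally
hypothetical):

/* these would come from the newer library header */
#define FOO_VERSION(major, minor)       (((major) << 16) | (minor))
#define FOO_API_VERSION                 FOO_VERSION(1, 2)
#define FOO_MR_ATTR_REQUESTED_KEY       (1ULL << 2)

/* application code that wants to compile against old and new headers */
static void fill_attr(struct foo_mr_attr *attr)
{
#ifdef FOO_MR_ATTR_REQUESTED_KEY
        /* the bit name exists only in headers that declare the field */
        attr->requested_key = 42;
        attr->mask |= FOO_MR_ATTR_REQUESTED_KEY;
#endif

#if FOO_API_VERSION >= FOO_VERSION(1, 2)
        /* alternative: test a single revision number, as suggested above */
        attr->requested_key = 42;
        attr->mask |= FOO_MR_ATTR_REQUESTED_KEY;
#endif
}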


>> 2. the allocation size of the structure in bytes
>>    Pro: Easy to guarantee correct initialization at
>>    compile-time using sizeof().
>>    Con: Requires other documentation to indicate what should or
>>    should not be present in the structure.  There is no easy way
>>    to verify that the code implementing the structure conforms to
>>    the size.  No obvious correlation with a release or version
>>    number.
>
> This only works well if you are very careful about padding. E.g. adding a
> single 32-bit entry to a structure with a 64-bit member will still use
> 8 bytes, and the presence/absence of the last unused 4 bytes is not
> detectable. Also be aware that compilers are not required to zero implicit
> padding in bracketed initializers.
>
> So size is broadly not a great choice.

Agreed.
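
To spell out that pitfall with a small example (the structures are
hypothetical; the sizes assume a typical LP64 ABI with 8-byte alignment
for uint64_t):

#include <stdint.h>

struct attr_v1 {
        uint64_t flags;
        uint32_t count;
        /* 4 bytes of implicit tail padding */
};                              /* sizeof(struct attr_v1) == 16 */

struct attr_v2 {
        uint64_t flags;
        uint32_t count;
        uint32_t extra;         /* new 32-bit member lands in the old padding */
};                              /* sizeof(struct attr_v2) is still 16 */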

>
>>    version number.  The allocation size of the structure must be
>>    linked to the attributes "by implication" (i.e., given a pointer
>>    to a structure at run-time, the presence of an attribute that
>
> There are very few cases where the allocation size would actually
> matter (e.g. arrays), and those cases will need an explicit size passed
> in anyway to work properly.
>

Agreed.


