[ofiwg] thoughts on initializing libfabric structs
Jason Gunthorpe
jgunthorpe at obsidianresearch.com
Mon Jul 28 14:49:48 PDT 2014
On Mon, Jul 28, 2014 at 09:34:12PM +0000, Hefty, Sean wrote:
> Thanks, Bob, for looking into this.
>
> > In practice, you'd want to preserve each structure exactly as it was
> > at each release:
> >
> > struct fabric_foo_rel1 {...};
> > struct fabric_foo_rel2 {...};
> > struct fabric_foo_rel3 {...};
> > typedef struct fabric_foo_rel3 fabric_foo;
>
> I was starting to reach this same conclusion.
>
> With libibverbs, we were restricted by backwards compatibility. A
> new library doesn't have that restriction, so we should be able to
> define a simple, reasonable model up front.
>
> I was wondering if we even needed version information stored per
> structure. libfabric has what is essentially an initial entry point
> (fi_getinfo). We could add a version parameter (corresponds to
> minor version number) to that call, which the provider could use
> to select the correct data structures known to the app. Providers
> that don't support the specified version would simply fail the call.
That sounds nice and simple, but you need to consider the effects of
shared libraries:

  main program:
     res = fi_getinfo(API_VER = 2)
     some_solib(res)

  some_solib():
     struct fabric_foo_rel1 rel1 = {};
     fi_something(res, &rel1);

Oops, now it blows up, because the main program and some_solib are not
using the same libfabric ABI revision.
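
A minimal sketch of why this blows up, assuming release 2 merely appended
one field to the release 1 layout. The field names and both helper
functions are hypothetical, not part of libfabric. The library only knows
the version negotiated by the main program, so it assumes the larger
layout and would read past the end of the smaller structure that
some_solib actually passed:

```c
#include <stddef.h>

/* Hypothetical layouts: release 2 appended a field to release 1. */
struct fabric_foo_rel1 {
    int flags;
};

struct fabric_foo_rel2 {
    int flags;
    int priority;   /* new in release 2 */
};

/* The structure size the library assumes, given only the version that
 * was negotiated once, process-wide, by the main program. */
static size_t size_for_version(int api_ver)
{
    return api_ver >= 2 ? sizeof(struct fabric_foo_rel2)
                        : sizeof(struct fabric_foo_rel1);
}

/* Returns 1 if the library would read beyond the caller's structure. */
static int would_overread(int negotiated_ver, size_t caller_struct_size)
{
    return size_for_version(negotiated_ver) > caller_struct_size;
}
```

Here would_overread(2, sizeof(struct fabric_foo_rel1)) is true: the
version was chosen by one module, the structure was laid out by another.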
Designing a good ABI is a tricky thing. The best guidelines:
 - Avoid structures
 - Keep structures opaque. The consumer only sees an anonymous
   pointer; allocation is done in the library and never exposed
   outside the library.
 - When the call really requires an on-stack structure or structure
   pointer as an argument, rely on symbol versioning at the call site
   to resolve the caller's ABI. If the call site is a function pointer,
   have the inline wrapper that performs the indirection apply an ABI
   tag.
 - Always have the library allocate structure memory
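
The function-pointer case can be sketched roughly as follows. All names
here (FABRIC_ABI_TAG, struct fabric_ops, fabric_something,
provider_something) and the structure fields are assumptions made up for
illustration. The point is that the inline wrapper is compiled into each
caller, so the tag it passes always matches the header that caller was
built against, regardless of what the main program negotiated:

```c
/* Hypothetical layouts for two releases of the same structure. */
struct fabric_foo_rel1 { int flags; };
struct fabric_foo_rel2 { int flags; int priority; };

/* The header a consumer compiles against fixes both the typedef and
 * the tag. A consumer built against release 1 would see tag 1. */
#define FABRIC_ABI_TAG 2
typedef struct fabric_foo_rel2 fabric_foo;

struct fabric_ops {
    int (*something)(struct fabric_ops *ops, int abi_tag, const void *foo);
};

/* Provider side: choose the layout from the tag passed with this call. */
static int provider_something(struct fabric_ops *ops, int abi_tag,
                              const void *foo)
{
    (void)ops;
    switch (abi_tag) {
    case 1: {
        const struct fabric_foo_rel1 *f = foo;
        return f->flags;                /* rel1 has no priority field */
    }
    case 2: {
        const struct fabric_foo_rel2 *f = foo;
        return f->flags + f->priority;
    }
    default:
        return -1;                      /* unknown ABI revision */
    }
}

/* Inline wrapper: lives in the header, so the tag is baked into each
 * caller at compile time. */
static inline int fabric_something(struct fabric_ops *ops,
                                   const fabric_foo *foo)
{
    return ops->something(ops, FABRIC_ABI_TAG, foo);
}
```

A shared library built against the release 1 header would pass tag 1
through the same function pointer, and the provider would never touch
the priority field for that call.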
Look closely at everything, and decide whether it *really* needs a
structure, whether the structure can be on-stack and symbol versioning
will do the job, or whether the path is low speed and an approach like
pthread_attr is more appropriate.
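
For a low-speed path, an accessor-based object in the spirit of
pthread_attr_setstacksize()/pthread_attr_getstacksize() might look like
the sketch below. The fabric_attr_* names, the depth attribute, and its
default are hypothetical. Unlike pthread_attr_t, whose storage the
caller supplies, this variant has the library allocate the object, so
its size and layout never enter the ABI:

```c
#include <stdlib.h>

/* Public header: consumers see only an incomplete type. */
struct fabric_attr;

struct fabric_attr *fabric_attr_create(void);
void fabric_attr_destroy(struct fabric_attr *attr);
int fabric_attr_set_depth(struct fabric_attr *attr, int depth);
int fabric_attr_get_depth(const struct fabric_attr *attr, int *depth);

/* Library-private definition: free to grow in later releases. */
struct fabric_attr {
    int depth;
};

struct fabric_attr *fabric_attr_create(void)
{
    struct fabric_attr *attr = calloc(1, sizeof(*attr));
    if (attr)
        attr->depth = 16;       /* hypothetical default */
    return attr;
}

void fabric_attr_destroy(struct fabric_attr *attr)
{
    free(attr);
}

int fabric_attr_set_depth(struct fabric_attr *attr, int depth)
{
    if (!attr || depth <= 0)
        return -1;
    attr->depth = depth;
    return 0;
}

int fabric_attr_get_depth(const struct fabric_attr *attr, int *depth)
{
    if (!attr || !depth)
        return -1;
    *depth = attr->depth;
    return 0;
}
```

Adding an attribute in a later release means adding a new setter/getter
pair; existing binaries keep working because they never knew the size of
the structure.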
A big chunk of the problem with verbs is that everything was exposed
as structures visible to, and sometimes allocated by, the wrong things.
Jason