[Openframeworkwg] source code for proof of concept framework available

Jason Gunthorpe jgunthorpe at obsidianresearch.com
Mon Dec 9 17:00:04 PST 2013


On Mon, Dec 09, 2013 at 11:49:05PM +0000, Hefty, Sean wrote:
> > It is a tricky bit, but, eg rkeys are integers they should always
> > be in host order in the API calls, immediate data is accessed as a
> > uint32, not a void * so it should be in host order, etc.
> 
> Declaring an rkey as be32_t, means that it's big Endian everywhere.
> The app simply passes it to the remote side, where it's used.  Host
> order forces the app be big/little endian aware and handle the byte
> swapping appropriately.

Please don't do that :(

By host order I mean the number printed by printf("%x",rkey) should be
the same on a BE and LE machine and all the APIs exported by libfabric
should accept that value as a correct rkey.

Anything else is madness that superficially makes some cases simpler,
but other cases crazy. 

Consider, if an app exchanges the rkey over TCP using textual
http/xml/json. Textual protocols should NEVER have to worry about byte
swapping, that would be hugely surprising.

Consider, every app using binary communication already defines an
endianness and already expects to endian swap every single integral
value it transfers. Not swapping an integral value violates the
principle of least surprise.
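
As a rough sketch of the two cases above, assuming a 32-bit rkey held in host order: the helper names below (wire_put_u32, wire_get_u32, text_put_rkey, text_get_rkey) are made up for illustration and are not part of libfabric. A binary protocol encodes the rkey with the same explicit swap it uses for every other integer, while a textual protocol never swaps at all.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Binary protocol: the application has already picked a wire endianness
 * (big-endian here) and encodes every integer this way, rkey included.
 * Shifts operate on the value, so this is correct on BE and LE hosts. */
void wire_put_u32(uint8_t out[4], uint32_t host_value)
{
    out[0] = (uint8_t)(host_value >> 24);
    out[1] = (uint8_t)(host_value >> 16);
    out[2] = (uint8_t)(host_value >> 8);
    out[3] = (uint8_t)host_value;
}

uint32_t wire_get_u32(const uint8_t in[4])
{
    return ((uint32_t)in[0] << 24) | ((uint32_t)in[1] << 16) |
           ((uint32_t)in[2] << 8) | (uint32_t)in[3];
}

/* Textual protocol: formatting and parsing work on the numeric value,
 * so no byte swapping appears anywhere. */
int text_put_rkey(char *buf, size_t len, uint32_t rkey)
{
    assert(buf != NULL && len > 0);
    return snprintf(buf, len, "%" PRIx32, rkey);
}

int text_get_rkey(const char *buf, uint32_t *rkey)
{
    return sscanf(buf, "%" SCNx32, rkey) == 1 ? 0 : -1;
}
```

With a host-order rkey, both paths treat it exactly like any other integer the application already sends.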

> Having immediate data be host order can force byte swapping in the
> provider on both the sending and receiving side.  This may be a wash
> in terms of performance, unless the app is using immediate data to
> exchange one of several constant values.

I agree it is a choice: if there is to be no swap, then don't use an
integral type to hold immediate data. That provides a consistent
no-surprises API.

> > Byte swaps for that kind of stuff is just more places where people
> > can make an error and never detect it until they test on a BE/LE
> > mix.
> 
> The rkey and immediate data are intended to go across the network.
> Declaring them in host order would seem to increase the likelihood
> of errors, since it forces byte swapping.  This seems especially
> true for the rkey.

Again, if that is the case then declare rkey as a non-integral value.

> > If it needs to present something as network order then
> > it is binary data and it is a void *.
> 
> This would require an extra memory dereference.  For fixed-size
> items, I'd rather see beXX_t.

typedef union {
    uint8_t value[4];
    uint32_t __value32;
} be32_t;

C can pass a union like that by value without overhead. This forces
apps to use memcpy() rather than = to get the bytes in and out,
precludes using it directly in printf, etc.
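
A minimal sketch of how an application might handle such a type, reusing the union above. The helper names (be32_from_wire, be32_to_wire, be32_equal) are hypothetical; the point is only that the value is moved around as opaque bytes and is never treated as a number.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

typedef union {
    uint8_t value[4];
    uint32_t __value32;   /* gives the union integer size and alignment */
} be32_t;

/* The union is exactly as large as a uint32_t, so it is cheap to pass
 * and return by value. */
static_assert(sizeof(be32_t) == sizeof(uint32_t), "be32_t must be 4 bytes");

/* Copy four wire-order bytes into the opaque type without interpreting them. */
be32_t be32_from_wire(const void *wire)
{
    be32_t k;
    memcpy(k.value, wire, sizeof k.value);
    return k;
}

/* Copy the opaque bytes back out, unchanged, for transmission. */
void be32_to_wire(void *wire, be32_t k)
{
    memcpy(wire, k.value, sizeof k.value);
}

/* Comparison is byte-wise; there is no numeric meaning to rely on. */
int be32_equal(be32_t a, be32_t b)
{
    return memcmp(a.value, b.value, sizeof a.value) == 0;
}
```

An integer cannot be assigned to a be32_t with =, and passing one to printf("%x") is not a valid use, so accidental host-order handling is harder to write.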

> I should note that I extended the rkey and immediate data as seen by
> the API to 64 bits.

Good idea from an ABI perspective, but exactly how IB rkey maps into
that 64 bit space must be documented. Existing protocols exchange 32
bits in binary format on the wire; we can't force a wire protocol
change just to convert to libfabric :|

> > A reasonable path might be to have them be opaque and then define and
> > specify as a need is discovered.
> 
> It is currently expected that all error codes _may_ be passed to
> fi_strerror(), and that we will eventually define what errors can be
> returned from every call.

I would ditch the FI_EINVAL/etc constants then. The only FI error
constants should be ones that have already been documented to be
allowed returns. fi_strerror can still process them of course..

Having existing constants like that is just a temptation to use them
in provider or user code rather than actually properly create and
document error cases.

> > function(obj):
> >   if (includes_function(obj->size,function))  // 1 deref and if
> >     struct ops *ops = obj->ops;         // 1 deref
> >     if (ops->function != NULL)          // 1 deref and if
> >         ops->function(obj,..);
> 
> We have asserts in place of if statements.  I don't see a need to
> add checks to every call to see if it exists.  (What's an app really
> going to do with ENOSYS outside of initialization code?)  The app
> just does:

I saw the asserts, I assumed they were some kind of placeholders..

What is your plan for growing ops over time? Require that apps never
call an op that isn't implemented?

> obj->ops->function(obj...);
> 
> ops doesn't _have_ to point to a static struct, but it can.  And
> it's trivial to implement a function that returns ENOSYS, including
> an ops structure where everything returns ENOSYS, if we want to
> state that a provider must implement all functions.

If obj->ops->future_function is not allowed to read unallocated memory
when linking to an old libfabric then it doesn't really matter if it
returns ENOSYS as it can never be called.
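
For concreteness, here is one way the size check from the quoted pseudocode could look in C. This is a sketch under assumptions, not the proof-of-concept code: struct ops, struct obj, the size field, the OPS_HAS macro and example_future_function are all hypothetical names.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

struct obj;

struct ops {
    size_t size;                              /* bytes the provider filled in */
    int (*function)(struct obj *obj);         /* present since the first release */
    int (*future_function)(struct obj *obj);  /* appended in a later release */
};

struct obj {
    const struct ops *ops;
};

/* True when member fn lies fully inside the table the provider supplied. */
#define OPS_HAS(ops, fn) \
    ((ops)->size >= offsetof(struct ops, fn) + sizeof((ops)->fn))

/* Wrapper an application built against the newer header would call.
 * The size test runs first, so an older, shorter table is never read
 * past its end. */
int call_future_function(struct obj *obj)
{
    const struct ops *ops = obj->ops;

    assert(ops != NULL);
    if (!OPS_HAS(ops, future_function) || ops->future_function == NULL)
        return -ENOSYS;
    return ops->future_function(obj);
}

/* Stand-in provider implementation, only here to exercise the wrapper. */
int example_future_function(struct obj *obj)
{
    (void)obj;
    return 0;
}
```

Without a check like this, an ENOSYS stub only helps for ops the old library already knew about; it cannot cover a slot that does not exist in the older table.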

> we're looking at 50-60+ function calls hanging off every endpoint
> object.  That's an additional 500 bytes or so of data per QP, which
> is significant at scale.  The alternative is to squish everything

This is a very good point, I wouldn't want to see all this replicated
for every QP. I guess that is an advantage to how verbs bundled
everything into one giant ops structure.

> fi_create_endpoint(... &ep);
> ep->open_msg_interface(ep, &msg_ep)
> ep->open_rdma_interface(ep, &rdma_ep)
> 
> such that the user would call:
> 
> msg_ep->send(), msg_ep->recv()
> rdma_ep->read(), rdma_ep->write()
> 
> That requires a little more work on the user's part, but would avoid
> the double indirection, not result in a bunch of NULL function
> pointers, and could add more type checking.  The drawback is that we
> still have more overhead.

Interesting concept, what overhead do you see? The msg_ep/rdma_ep
could be globally uniquified - one dynamic structure per provider per
object type.

An API like that presents a different way to handle API expansion:

 fi_create_endpoint(... &ep);
 fi_open_msg_interface(ep, &msg_ep)

Now we can symbol version fi_open_msg_interface such that msg_ep is
guaranteed to be fully populated and usable by the application, or the
call returns an error.

If we change the struct under msg_ep then re-version
fi_open_msg_interface and support both...
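
A rough sketch of the idea, leaving the symbol versioning itself out: every name below carries a demo_ prefix because none of it is actual libfabric API. The open call either hands back a fully populated table or fails, and the table is shared, one per provider per object type, so each endpoint only stores a pointer.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

struct demo_ep;

struct demo_msg_ops {
    int (*send)(struct demo_ep *ep, const void *buf, size_t len);
    int (*recv)(struct demo_ep *ep, void *buf, size_t len);
};

struct demo_ep {
    const struct demo_msg_ops *msg_ops;   /* NULL if messaging is unsupported */
};

/* Stand-in provider functions: send reports len bytes, recv reports none. */
int demo_send(struct demo_ep *ep, const void *buf, size_t len)
{
    (void)ep; (void)buf;
    return (int)len;
}

int demo_recv(struct demo_ep *ep, void *buf, size_t len)
{
    (void)ep; (void)buf; (void)len;
    return 0;
}

/* One table per provider per object type, shared by every endpoint. */
const struct demo_msg_ops demo_provider_msg_ops = { demo_send, demo_recv };

/* Succeeds only if every entry is usable, so callers can skip NULL checks. */
int demo_open_msg_interface(struct demo_ep *ep,
                            const struct demo_msg_ops **msg_ep)
{
    assert(ep != NULL && msg_ep != NULL);
    const struct demo_msg_ops *ops = ep->msg_ops;

    if (ops == NULL || ops->send == NULL || ops->recv == NULL)
        return -ENOSYS;
    *msg_ep = ops;
    return 0;
}
```

After a successful open, the application calls msg_ep->send(ep, buf, len) with a single indirection. A changed struct layout would sit behind a new version of the open symbol, with the old one kept for existing binaries.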

Cheers,
Jason


