[Openframeworkwg] source code for proof of concept framework available

Hefty, Sean sean.hefty at intel.com
Mon Dec 9 18:11:21 PST 2013


> > I should note that I extended the rkey and immediate data as seen by
> > the API to 64 bits.
> 
> Good idea from an ABI perspective, but exactly how IB rkey maps into
> that 64 bit space must be documented. Existing protocols exchange 32
> bits in binary format on the wire, we can't force a wire protocol
> change to convert to libfabric :|

I'm assuming that with IB the upper 32 bits would just be 0.  I believe iWARP will introduce 64-bit immediate data, but I haven't checked the latest RFC to be certain.

> I would ditch the FI_EIVAL/etc constants then. The only FI error
> constants should be ones that have already been documented to be
> allowed returns. fi_strerror can still process them of course..
> 
> Having existing constants like that is just a temptation to use them
> in provider or user code rather than actually properly create and
> document error cases.

I can comment them out until they are used.

> > > function(obj):
> > >   if (includes_function(obj->size,function))  // 1 deref and if
> > >     struct ops *ops = obj->ops;         // 1 deref
> > >     if (ops->function != null)          // 1 deref and if
> > >         ops->function(obj,..);
> >
> > We have asserts in place of if statements.  I don't see a need to
> > add checks to every call to see if it exists.  (What's an app really
> > going to do with ENOSYS outside of initialization code?)  The app
> > just does:
> 
> I saw the asserts, I assumed they were some kind of placeholders..

They're a debugging aid.
 
> What is your plan for growing ops over time? Require that apps never
> call an op that isn't implemented?

Hadn't thought that far ahead.  :)  But, basically require apps not to call unimplemented ops.

> 
> > obj->ops->function(obj...);
> >
> > ops doesn't _have_ to point to a static struct, but it can.  And
> > it's trivial to implement a function that returns ENOSYS, including
> > an ops structure where everything returns ENOSYS, if we want to
> > state that a provider must implement all functions.
> 
> If obj->ops->future_function is not allowed to read unallocated memory
> when linking to an old libfabric then it doesn't really matter if it
> returns ENOSYS as it can never be called.

I was also concerned with providers that did not or could not implement certain functionality, and not just forward/backward compatibility. 
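
A rough sketch of the ops arrangement discussed above, assuming hypothetical names throughout (struct ep, struct ep_ops, ep_send, and so on are not taken from the posted code). A provider fills the slots it supports and points the rest at stubs that return -ENOSYS; the assert in the wrapper is only a debugging aid and compiles out with NDEBUG.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

struct ep;                          /* hypothetical endpoint object */

struct ep_ops {
    int (*send)(struct ep *ep, const void *buf, size_t len);
    int (*read)(struct ep *ep, void *buf, size_t len);
};

struct ep {
    const struct ep_ops *ops;       /* may point at a static, shared table */
};

/* Stubs for operations a provider does not or cannot implement. */
static int no_send(struct ep *ep, const void *buf, size_t len)
{
    (void)ep; (void)buf; (void)len;
    return -ENOSYS;
}

static int no_read(struct ep *ep, void *buf, size_t len)
{
    (void)ep; (void)buf; (void)len;
    return -ENOSYS;
}

/* A table where everything returns -ENOSYS. */
static const struct ep_ops ep_ops_enosys = { .send = no_send, .read = no_read };

/* Hypothetical provider that implements send but not read. */
static int demo_send(struct ep *ep, const void *buf, size_t len)
{
    (void)ep; (void)buf;
    return (int)len;                /* pretend all bytes were queued */
}

static const struct ep_ops demo_ops = { .send = demo_send, .read = no_read };

/* What the app calls: no runtime existence check, just one indirect call. */
static inline int ep_send(struct ep *ep, const void *buf, size_t len)
{
    assert(ep && ep->ops && ep->ops->send);     /* debugging aid only */
    return ep->ops->send(ep, buf, len);
}

static inline int ep_read(struct ep *ep, void *buf, size_t len)
{
    assert(ep && ep->ops && ep->ops->read);     /* debugging aid only */
    return ep->ops->read(ep, buf, len);
}
```

With every slot populated, calling an unsupported op fails cleanly instead of jumping through a NULL pointer. This does not address the separate case of an op added after the table layout was fixed.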

> > we're looking at 50-60+ function calls hanging off every endpoint
> > object.  That's an additional 500 bytes or so of data per QP, which
> > is significant at scale.  The alternative is to squish everything
> 
> This is a very good point, I wouldn't want to see all this replicated
> for every QP. I guess that is an advantage to how verbs bundled
> everything into one giant ops structure.
> 
> > fi_create_endpoint(... &ep);
> > ep->open_msg_interface(ep, &msg_ep)
> > ep->open_rdma_interface(ep, &rdma_ep)
> >
> > such that the user would call:
> >
> > msg_ep->send(), msg_ep->recv()
> > rdma_ep->read(), rdma_ep->write()
> >
> > That requires a little more work on the user's part, but would avoid
> > the double indirection, not result in a bunch of NULL function
> > pointers, and could add more type checking.  The drawback is that we
> > still have more overhead.
> 
> Interesting concept, what overhead do you see? The msg_ep/rdma_ep
> could be globally uniquified - one dynamic structure per provider per
> object type.

The way I was thinking of it, there would have been 1 msg_ep per ep.  But I think I see what you were thinking.  The difference seems to be whether you have:

msg_ep->send(msg_ep, ...)   or   msg_ep->send(ep, ...)

Assuming I'm following your thought correctly, this would provide an even smaller footprint per endpoint than what I have currently.
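
As a sketch of the msg_ep->send(ep, ...) form, assuming hypothetical types and names: the interface is a single table per provider per interface type, and the endpoint is passed explicitly on each call, so no per-endpoint copy of the function pointers is needed.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical endpoint; only per-endpoint state lives here. */
struct ep {
    int    id;
    size_t bytes_sent;
};

/* Hypothetical message interface: function pointers only, no endpoint state. */
struct msg_if {
    int (*send)(struct ep *ep, const void *buf, size_t len);
};

static int demo_msg_send(struct ep *ep, const void *buf, size_t len)
{
    (void)buf;
    ep->bytes_sent += len;          /* state is found through ep, not msg_ep */
    return 0;
}

/* One table per provider per interface type, shared by all endpoints. */
static const struct msg_if demo_msg_if = { .send = demo_msg_send };

static int open_msg_interface(struct ep *ep, const struct msg_if **msg_ep)
{
    assert(ep && msg_ep);
    *msg_ep = &demo_msg_if;         /* every endpoint gets the same pointer */
    return 0;
}
```

In this form the per-endpoint cost is at most one pointer per opened interface, and an application could even cache that pointer once per provider.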
 
> An API like that presents a different way to handle API expansion:
> 
>  fi_create_endpoint(... &ep);
>  fi_open_msg_interface(ep, &msg_ep)
> 
> Now we can symbol version fi_open_msg_interface such that msg_ep is
> guaranteed to be fully populated and usable by the application, or the
> call returns an error.
> 
> If we change the struct under msg_ep then re-version
> fi_open_msg_interface and support both...

I'll give this more thought, but I like the idea.  If you added some flags or attributes to the open_interface calls, you could even open the same interface multiple times, but with different optimizations defined, for a single endpoint.
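
A minimal sketch of the flags idea, with all names hypothetical (MSG_IF_INLINE in particular is an invented flag, not an existing constant). The open call either hands back a fully populated table or returns an error, and different flags select different tables for the same endpoint. Symbol versioning of the open call is not shown here.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

struct ep { int id; };              /* hypothetical endpoint */

struct msg_if {
    int (*send)(struct ep *ep, const void *buf, size_t len);
    int (*recv)(struct ep *ep, void *buf, size_t len);
};

#define MSG_IF_INLINE (1ULL << 0)   /* hypothetical optimization flag */

/* Demo bodies: the two send variants return 0 and 1 only so that a
 * caller can tell which one was selected. */
static int send_default(struct ep *ep, const void *buf, size_t len)
{
    (void)ep; (void)buf; (void)len;
    return 0;
}

static int send_inline(struct ep *ep, const void *buf, size_t len)
{
    (void)ep; (void)buf; (void)len;
    return 1;
}

static int recv_default(struct ep *ep, void *buf, size_t len)
{
    (void)ep; (void)buf; (void)len;
    return 0;
}

/* Fully populated tables, one per supported flag combination. */
static const struct msg_if msg_if_default = { .send = send_default, .recv = recv_default };
static const struct msg_if msg_if_inline  = { .send = send_inline,  .recv = recv_default };

/* Returns 0 and a complete table, or -EINVAL without touching *msg_ep. */
static int open_msg_interface(struct ep *ep, uint64_t flags,
                              const struct msg_if **msg_ep)
{
    assert(ep && msg_ep);
    if (flags & ~MSG_IF_INLINE)
        return -EINVAL;             /* unknown flag: no partial table */
    *msg_ep = (flags & MSG_IF_INLINE) ? &msg_if_inline : &msg_if_default;
    return 0;
}
```

An application could open the interface twice on one endpoint, once with 0 and once with MSG_IF_INLINE, and pick the table per call site.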

- Sean


