[Openframeworkwg] source code for proof of concept framework available
Hefty, Sean
sean.hefty at intel.com
Mon Dec 9 18:11:21 PST 2013
> > I should note that I extended the rkey and immediate data as seen by
> > the API to 64 bits.
>
> Good idea from an ABI perspective, but exactly how IB rkey maps into
> that 64 bit space must be documented. Existing protocols exchange 32
> bits in binary format on the wire, we can't force a wire protocol
> change to convert to libfabric :|
I'm assuming that with IB the upper 32 bits would just be 0. I believe iWARP will introduce 64-bit immediate data, but I haven't checked the latest RFC to be certain.
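As a minimal sketch of that assumption (upper 32 bits zero for IB), the mapping between the 32-bit value exchanged on the wire and the 64-bit value seen by the API could look like the following. The helper names are hypothetical and are not part of the proof-of-concept source.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: widen a 32-bit IB rkey to the 64-bit key seen by
 * the API. Assumes the upper 32 bits are simply zero. */
static inline uint64_t ib_rkey_to_key64(uint32_t rkey)
{
    return (uint64_t)rkey;
}

/* Hypothetical helper: narrow a 64-bit API key back to the 32-bit rkey
 * that existing protocols put on the wire. Returns 0 on success, or -1
 * if the upper 32 bits are set and the key cannot be an IB rkey. */
static inline int key64_to_ib_rkey(uint64_t key, uint32_t *rkey)
{
    if (key >> 32)
        return -1;
    *rkey = (uint32_t)key;
    return 0;
}
```

Existing protocols would keep exchanging only the low 32 bits, so the wire format would not change under this mapping.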
> I would ditch the FI_EIVAL/etc constants then. The only FI error
> constants should be ones that have already been documented to be
> allowed returns. fi_strerror can still process them of course..
>
> Having existing constants like that is just a temptation to use them
> in provider or user code rather than actually properly create and
> document error cases.
I can comment them out until they are used.
> > > function(obj):
> > >   if (includes_function(obj->size, function))  // 1 deref and if
> > >     struct ops *ops = obj->ops;                // 1 deref
> > >     if (ops->function != NULL)                 // 1 deref and if
> > >       ops->function(obj, ...);
> >
> > We have asserts in place of if statements. I don't see a need to
> > add checks to every call to see if it exists. (What's an app really
> > going to do with ENOSYS outside of initialization code?) The app
> > just does:
>
> I saw the asserts, I assumed they were some kind of placeholders..
They're a debugging aid.
> What is your plan for growing ops over time? Require that apps never
> call an op that isn't implemented?
Hadn't thought that far ahead. :) But, basically require apps not to call unimplemented ops.
>
> > obj->ops->function(obj...);
> >
> > ops doesn't _have_ to point to a static struct, but it can. And
> > it's trivial to implement a function that returns ENOSYS, including
> > an ops structure where everything returns ENOSYS, if we want to
> > state that a provider must implement all functions.
>
> If obj->ops->future_function is not allowed to read unallocated memory
> when linking to an old libfabric then it doesn't really matter if it
> returns ENOSYS as it can never be called.
I was also concerned with providers that did not or could not implement certain functionality, and not just forward/backward compatibility.
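To illustrate the ENOSYS idea for a provider that cannot implement everything, here is a rough sketch. The struct layout and op names (send, recv) are placeholders chosen for illustration, not the actual framework definitions.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

struct obj;

/* Hypothetical ops table; the real one would carry many more calls. */
struct obj_ops {
    int (*send)(struct obj *obj, const void *buf, size_t len);
    int (*recv)(struct obj *obj, void *buf, size_t len);
};

struct obj {
    const struct obj_ops *ops;
};

/* Stubs for functionality a provider does not or cannot implement. */
static int no_send(struct obj *obj, const void *buf, size_t len)
{
    (void)obj; (void)buf; (void)len;
    return -ENOSYS;
}

static int no_recv(struct obj *obj, void *buf, size_t len)
{
    (void)obj; (void)buf; (void)len;
    return -ENOSYS;
}

/* An ops structure where everything returns ENOSYS. */
static const struct obj_ops enosys_ops = {
    .send = no_send,
    .recv = no_recv,
};

/* Example provider that implements send only. */
static int my_send(struct obj *obj, const void *buf, size_t len)
{
    (void)obj; (void)buf;
    return (int)len;  /* pretend all bytes were queued */
}

static const struct obj_ops send_only_ops = {
    .send = my_send,
    .recv = no_recv,
};

/* The app-facing call: no runtime checks, only an assert as a debugging aid. */
static inline int obj_send(struct obj *obj, const void *buf, size_t len)
{
    assert(obj->ops && obj->ops->send);
    return obj->ops->send(obj, buf, len);
}
```

Since every slot is populated, the call path stays a plain obj->ops->function(obj, ...) with no NULL checks in release builds.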
> > we're looking at 50-60+ function calls hanging off every endpoint
> > object. That's an additional 500 bytes or so of data per QP, which
> > is significant at scale. The alternative is to squish everything
>
> This is a very good point, I wouldn't want to see all this replicated
> for every QP. I guess that is an advantage to how verbs bundled
> everything into one giant ops structure.
>
> > fi_create_endpoint(... &ep);
> > ep->open_msg_interface(ep, &msg_ep)
> > ep->open_rdma_interface(ep, &rdma_ep)
> >
> > such that the user would call:
> >
> > msg_ep->send(), msg_ep->recv()
> > rdma_ep->read(), rdma_ep->write()
> >
> > That requires a little more work on the user's part, but would avoid
> > the double indirection, not result in a bunch of NULL function
> > pointers, and could add more type checking. The drawback is that we
> > still have more overhead.
>
> Interesting concept, what overhead do you see? The msg_ep/rdma_ep
> could be globally uniquified - one dynamic structure per provider per
> object type.
The way I was thinking of it, there would have been 1 msg_ep per ep. But I think I see what you were thinking. The difference seems to be whether you have:
msg_ep->send(msg_ep, ...) or msg_ep->send(ep, ...)
Assuming I'm following your thought correctly, this would provide an even smaller footprint per endpoint than what I have currently.
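A sketch of the msg_ep->send(ep, ...) variant, as I understand it: the interface structure is shared (one per provider per object type) and the endpoint is passed explicitly, so no function pointers are replicated per endpoint. All names and fields below are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Per-endpoint state only; no function pointers stored here. */
struct ep {
    int id;
    size_t bytes_sent;
};

/* One instance of this table per provider per interface type. */
struct msg_if {
    int (*send)(struct ep *ep, const void *buf, size_t len);
};

static int prov_send(struct ep *ep, const void *buf, size_t len)
{
    (void)buf;
    ep->bytes_sent += len;  /* stand-in for posting a real send */
    return 0;
}

static const struct msg_if prov_msg_if = { .send = prov_send };

/* Every endpoint of this provider gets a pointer to the same table. */
static int open_msg_interface(struct ep *ep, const struct msg_if **msg_ep)
{
    (void)ep;
    *msg_ep = &prov_msg_if;
    return 0;
}
```

With msg_ep->send(msg_ep, ...) instead, each msg_ep would need its own state or a back pointer to its endpoint, which is where the per-endpoint overhead I had in mind comes from.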
> An API like that presents a different way to handle API expansion:
>
> fi_create_endpoint(... &ep);
> fi_open_msg_interface(ep, &msg_ep)
>
> Now we can symbol version fi_open_msg_interface such that msg_ep is
> guaranteed to be fully populated and usable by the application, or the
> call returns an error.
>
> If we change the struct under msg_ep then re-version
> fi_open_msg_interface and support both...
I'll give this more thought, but I like the idea. If you added some flags or attributes to the open_interface calls, you could even open the same interface multiple times, but with different optimizations defined, for a single endpoint.
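A rough sketch of the "fully populated or error" guarantee follows. With ELF symbol versioning, both entry points would be exported under the single name fi_open_msg_interface with different version tags; distinct function names stand in for that here. The v1/v2 structs, the sendv call, and the provider functions are all made up for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

struct ep { int unused; };

/* Original layout of the message interface. */
struct msg_if_v1 {
    int (*send)(struct ep *ep, const void *buf, size_t len);
    int (*recv)(struct ep *ep, void *buf, size_t len);
};

/* Later layout: same prefix, one call added at the end. */
struct msg_if_v2 {
    int (*send)(struct ep *ep, const void *buf, size_t len);
    int (*recv)(struct ep *ep, void *buf, size_t len);
    int (*sendv)(struct ep *ep, const void *const *bufs, size_t count);
};

/* This example provider implements send and recv, but not sendv. */
static int prov_send(struct ep *ep, const void *buf, size_t len)
{
    (void)ep; (void)buf; (void)len;
    return 0;
}

static int prov_recv(struct ep *ep, void *buf, size_t len)
{
    (void)ep; (void)buf; (void)len;
    return 0;
}

static const struct msg_if_v1 prov_v1 = { prov_send, prov_recv };

/* Old version of the call: every v1 slot is populated on success. */
static int open_msg_interface_v1(struct ep *ep, const struct msg_if_v1 **out)
{
    (void)ep;
    *out = &prov_v1;
    return 0;
}

/* New version of the call: the provider cannot fill every v2 slot, so
 * the open fails instead of handing back a partially usable table. */
static int open_msg_interface_v2(struct ep *ep, const struct msg_if_v2 **out)
{
    (void)ep;
    *out = NULL;
    return -ENOSYS;
}
```

An application built against the old struct keeps resolving to the v1 entry point, while a newer application finds out at open time, rather than at call time, that the provider cannot satisfy the v2 layout.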
- Sean