[ofiwg] wait sets vs poll sets vs application

Dardo D Kleiner - CONTRACTOR dkleiner at cmf.nrl.navy.mil
Fri Mar 27 03:49:04 PDT 2015

On 03/26/2015 03:08 PM, Hefty, Sean wrote:
>> However, the subtle differences in the interfaces seem to make it very
>> difficult to achieve a consistent usage approach.  In particular:
>> 1) wait sets don't return context
> There's been some discussion on this.  The main issue is that returning a context imposes additional implementation requirements that may not be easily met.  The wait set becomes more than just a wait object that gets signaled.  It now needs to provide a queue of what object(s) signaled the wait object, as there could have been more than one which caused the wake-up.

What makes epoll(7) so much better than poll(2) is context.  Normally I wouldn't use poll(2) unless I only have one (or maybe just a couple of) thing(s) to monitor (ignoring cross-platform implications).  fi_wait seems like poll(2), though as you mention later, using fi_wait+fi_poll together might be a reasonable alternative (double polling makes me itch a bit, however).
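
To illustrate what "context" means here, a minimal sketch using plain Linux epoll (not libfabric).  The names obj_ctx, watch and wait_one are hypothetical, made up for this example; the fds could be anything pollable.  The point is only that epoll_wait() hands back the pointer stored at registration time, so the caller knows which object fired without a lookup:

```c
#include <assert.h>
#include <string.h>
#include <sys/epoll.h>
#include <unistd.h>

/* Hypothetical per-object context an application might attach. */
struct obj_ctx {
    const char *name;
    int fd;
};

/* Register ctx->fd with the epoll set, storing ctx so that
 * epoll_wait() returns it when the fd becomes readable. */
static int watch(int epfd, struct obj_ctx *ctx)
{
    struct epoll_event ev;

    memset(&ev, 0, sizeof ev);
    ev.events = EPOLLIN;
    ev.data.ptr = ctx;
    return epoll_ctl(epfd, EPOLL_CTL_ADD, ctx->fd, &ev);
}

/* Wait up to timeout_ms (0 = don't block, -1 = block forever) for one
 * ready object.  Returns its context, or NULL on timeout or error. */
static struct obj_ctx *wait_one(int epfd, int timeout_ms)
{
    struct epoll_event ev;
    int n = epoll_wait(epfd, &ev, 1, timeout_ms);

    return n == 1 ? ev.data.ptr : NULL;
}
```

With poll(2) the same caller would get back only revents flags in the pollfd array and would have to map the fd to its object itself.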

>> 2) poll sets don't support timeouts
> Poll sets are non-blocking.  A poll set allows an application to easily drive progress across multiple event/completion queues.  The poll set also checks for events across multiple queues.

The naming choices here are a bit confusing: poll(2) has a timeout but no context.  fi_poll seems like epoll(7) (which is good), but the interface doesn't allow for a timeout.  I'm (usually) writing middleware, and if the user wants the CPU savings of the "wait for some event" concept (at the expense of latency), I like to give them that choice.

>> 3) poll sets don't support access to their underlying "wait" object
> It sounds like you may want to try using the wait sets and poll sets together.  For example, group all objects into a wait set and a poll set.  When wait is signaled, call the poll set to get the objects that have events.  This would give you the contexts that you're wanting.

Hadn't thought of this, though in practice I pretty much have to do something similar in my middleware, since I usually have other things to monitor: i.e., take the wait set's underlying object and put it in a higher-level collection.  I'll probably just end up sticking with the way I've been doing it, with the individual objects of interest.
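
The "wait, then poll" pattern with a higher-level collection can be sketched with plain Linux epoll as an analogy (this is not libfabric code).  It relies on the fact that an epoll fd is itself pollable, so an inner epoll set can stand in for the fd behind a wait set and be nested in the application's outer set.  The names nested_set, add_fd and wait_then_poll are hypothetical:

```c
#include <assert.h>
#include <string.h>
#include <sys/epoll.h>
#include <unistd.h>

/* Hypothetical pairing of the application's set with a nested one. */
struct nested_set {
    int outer;  /* application-level epoll set */
    int inner;  /* stand-in for the fd behind a wait set */
};

/* Add fd to the epoll set epfd, tagged with an opaque context. */
static int add_fd(int epfd, int fd, void *ctx)
{
    struct epoll_event ev;

    memset(&ev, 0, sizeof ev);
    ev.events = EPOLLIN;
    ev.data.ptr = ctx;
    return epoll_ctl(epfd, EPOLL_CTL_ADD, fd, &ev);
}

/* Step 1: block on the outer set for up to timeout_ms.
 * Step 2: if the inner set (registered in outer with ctx == ns)
 * signaled, poll it without blocking to collect ready contexts.
 * Returns the number of contexts written to out (max must be >= 1),
 * 0 on timeout or if only other fds woke us, -1 on error.  Other
 * outer events are ignored here; being level-triggered, they will
 * be reported again on the next call. */
static int wait_then_poll(struct nested_set *ns, void **out, int max,
                          int timeout_ms)
{
    struct epoll_event ev[8];
    int i, n, inner_ready = 0;

    n = epoll_wait(ns->outer, ev, 8, timeout_ms);
    if (n < 0)
        return -1;
    for (i = 0; i < n; i++)
        if (ev[i].data.ptr == ns)
            inner_ready = 1;
    if (!inner_ready)
        return 0;

    if (max > 8)
        max = 8;
    n = epoll_wait(ns->inner, ev, max, 0);  /* the "double poll" */
    if (n < 0)
        return -1;
    for (i = 0; i < n; i++)
        out[i] = ev[i].data.ptr;
    return n;
}
```

The second epoll_wait() is exactly the double polling mentioned earlier: the outer wakeup says only that the inner set has something, and a second call is needed to learn what.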

As I mentioned, my interest was piqued by the idea of 1) possible provider optimizations (which might in reality not buy me a whole lot) and 2) the "event" abstraction possibility (which may also be too much to hope for: libevent is super-complicated and can't even yet abstract the one extra thing I'm really looking for, Windows IOCP).

>> So my question is, is the design of these object interfaces intentional in
>> this regard?  Should apps just pick one approach and work with what's
>> there?
> The design is intentional, but that doesn't mean that the use cases are fully worked out. :)  They are designed to be used together.  Your feedback on this is definitely useful.

This design clarification is useful - I'll explore and feed back.

>> I also went ahead and put epoll-based wait and poll implementations into
>> my fork of the verbs provider and it seems of value (disregarding issues
>> above) - though it doesn't really make sense there as it's not
>> verbs-specific.  Perhaps any provider that uses fds for internal waits can
>> leverage a common epoll implementation?
> I had the same thought as well.  And I agree that the internal implementation of wait sets could be improved by using epoll. (Well, technically, the internal implementation could be improved by being written, but assuming that it was written, then epoll seems like a nice option. :)


Again, I don't think this is verbs-specific, but it seems non-trivial to put provider-reusable functionality up in libfabric core (other than "fi_enosys" ;))

- Dardo
