[ofiwg] wait sets
Hefty, Sean
sean.hefty at intel.com
Thu Jan 28 22:13:32 PST 2016
> > 1. If it is okay to wait
> > 1.1. Wait for one or more events to occur
> > 2. Get list of queues ready for action
> > 3. Process each queue
> >
> > We then define step 1. Ba-da-bing, ba-da-boom, and we're done.
>
> Yes, this seems to help some.
To answer Ben's question from his email, I believe that trywait is needed because the wait objects may be independent of the queues. Trywait doesn't just check whether a wait can proceed; it must also prepare for the wait, for example by clearing out old events.
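As a rough illustration of "check and prepare", here is a minimal sketch in C. Everything in it (mock_queue, mock_cntr, mock_waitobj, mock_trywait) is a hypothetical stand-in invented for this example and is not a libfabric type or call. It assumes the "okay to wait" conditions proposed in this thread: all queues empty and no counter changed since the last wait attempt.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for illustration; none of these are libfabric types. */
struct mock_queue {
    size_t pending;            /* entries the app has not read yet */
};

struct mock_cntr {
    uint64_t value;            /* current counter value */
    uint64_t seen_at_wait;     /* value recorded at the last wait attempt */
};

struct mock_waitobj {
    bool signaled;             /* wakeup state, possibly left over from old events */
    struct mock_queue *queues;
    size_t nqueues;
    struct mock_cntr *cntrs;
    size_t ncntrs;
};

/*
 * Returns 0 when it is safe to block on the wait object, -1 when the
 * caller should go process events instead of waiting.
 */
int mock_trywait(struct mock_waitobj *w)
{
    size_t i;
    int busy = 0;

    /* Prepare: clear old signals first, so that anything arriving after
     * this point signals the wait object again. */
    w->signaled = false;

    /* Check: every associated queue must be empty. */
    for (i = 0; i < w->nqueues; i++) {
        if (w->queues[i].pending > 0)
            busy = 1;
    }

    /* Check: no counter may have changed since the last wait attempt. */
    for (i = 0; i < w->ncntrs; i++) {
        if (w->cntrs[i].value != w->cntrs[i].seen_at_wait)
            busy = 1;
        w->cntrs[i].seen_at_wait = w->cntrs[i].value;
    }

    return busy ? -1 : 0;
}
```

The ordering (clear first, check second) is the design point of the sketch: checking first and clearing afterwards could discard a signal for an event that arrived in between.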
> >> b.) libfabric only calls - for multiple queues
> >>
> >> 1. fi_wait
> >> 2. fi_poll
> >> 3. fi_cq_read/fi_eq_read/fi_cntr_read
>
> I take the use of read (and not sread) in b3 above implies that the wait
> object resetting would be done at the start of fi_wait() after checking
> for the "okay to wait" conditions immediately below?
Yes - based on our GitHub discussion, I agree that calling read here is preferable to sread. I was even considering limiting sread to cases where a wait set is not used.
> > Proposal - It is okay to wait when all associated:
> > CQs are empty +
> > EQs are empty +
> > Counters have not incremented since last wait call
> >
>
> >> c.) OS + libfabric calls - for one or multiple queues
> >> This modifies the above sequence to:
> >>
> >> 1. poll/select
> >> 2. fi_poll
> >> 3. fi_cq_read/fi_eq_read/fi_cntr_read
> >
> > And we're back to the problem child. For each fd that the app
> > references, it should perform a check to see if it is okay to wait on that
> > fd. This suggests introducing new interfaces for that purpose.
> >
> > fi_cq_trywait / fi_eq_trywait / fi_cntr_trywait / fi_trywait
> >
> > Note that the app only needs to call trywait (names are hard) if it is
> > directly using the fd from that object. E.g. if it is polling on the fd
> > from a wait set, and the wait set is associated with 4 CQs, the app only
> > needs to call fi_trywait, not fi_cq_trywait.
> >
> > The trywait calls can be static inline wrappers around existing calls
> > (fi_cq_sread, fi_eq_sread, fi_cntr_wait, fi_wait). Trywait would return 0
> > when it's safe to wait.
>
> If we end up with such an fi_trywait() call, it seems like you would want
> it to take an array of fids to avoid requiring the app to always put its
> own loop around the calls.
Ding! Ding! Ding! I think we have a winner!
> Where is the wait object resetting occurring in this example? In trywait
> (further arguing for trywait taking an array) or in an sread that must
> come after all the reads in c3?
The resetting would happen in trywait, which would need to be introduced into the sequence, for example:
1. if fi_trywait == 0
1.1. poll/select
2. fi_poll
3. fi_cq_read/fi_eq_read/fi_cntr_read
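
The sequence above could be sketched roughly as follows. This is a single-threaded mock, not libfabric code: mock_cq and every mock_* function are hypothetical names made up for the example, and the wait object is assumed to be a plain pipe whose read end the app polls on. Only pipe(), read(), write() and poll() are real (POSIX) calls.

```c
#include <poll.h>
#include <unistd.h>

/* Hypothetical queue whose wait object is a pipe; not a libfabric type. */
struct mock_cq {
    int rfd;        /* read end: the fd the app polls on */
    int wfd;        /* write end: signaled when a completion is added */
    int pending;    /* completions not yet read by the app */
};

int mock_cq_open(struct mock_cq *cq)
{
    int fds[2];

    if (pipe(fds) != 0)
        return -1;
    cq->rfd = fds[0];
    cq->wfd = fds[1];
    cq->pending = 0;
    return 0;
}

/* Provider side: add one completion, then signal the wait object. */
int mock_cq_post(struct mock_cq *cq)
{
    char b = 1;

    cq->pending++;
    return write(cq->wfd, &b, 1) == 1 ? 0 : -1;
}

/* Step 1: clear out old signals, then report 0 only if the queue is empty. */
int mock_cq_trywait(struct mock_cq *cq)
{
    struct pollfd p = { .fd = cq->rfd, .events = POLLIN };
    char buf[64];

    while (poll(&p, 1, 0) > 0 && (p.revents & POLLIN)) {
        if (read(cq->rfd, buf, sizeof buf) <= 0)
            break;
    }
    return cq->pending == 0 ? 0 : -1;
}

/* Step 3: non-blocking read; returns 1 if a completion was consumed. */
int mock_cq_read(struct mock_cq *cq)
{
    if (cq->pending == 0)
        return 0;
    cq->pending--;
    return 1;
}

/* One pass through the sequence; returns the number of completions read. */
int wait_and_process(struct mock_cq *cq, int timeout_ms)
{
    int n = 0;

    if (mock_cq_trywait(cq) == 0) {                 /* 1. okay to wait? */
        struct pollfd p = { .fd = cq->rfd, .events = POLLIN };
        poll(&p, 1, timeout_ms);                    /* 1.1. poll/select */
    }
    /* 2. With several queues, the poll-set query would pick the ready
     *    ones here; this sketch has only one queue. */
    while (mock_cq_read(cq))                        /* 3. read until empty */
        n++;
    return n;
}
```

Because the read in step 3 never blocks, the only place the app sleeps is the poll() call, and it only gets there after trywait has cleared stale signals and confirmed there is nothing left to process.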
- Sean