[ofiwg] wait sets

Mon Feb 1 09:37:56 PST 2016

>> >    1. If it is okay to wait
>> >    1.1.   Wait for one or more events to occur
>> >    2. Get list of queues ready for action
>> >    3. Process each queue
>> >
>> > We then define step 1.  Ba-da-bing, ba-da-boom, and we're done.
>> 
>> Yes, this seems to help some.
>
>To answer Ben's question from his email, I believe that trywait is needed
>as the wait objects may be independent from the queues.  It doesn't just
>check if a wait can proceed, but must also prepare for it.  For example,
>by clearing out old events.

I’m still not sure I understand the concept of ‘readiness’ in this
situation. When we talk about ‘edge triggered’ and ‘level triggered’, are
we using the same definitions as the epoll man page? If so, does this
preparation step apply to both level-triggered and edge-triggered
interfaces? Also, why should wait objects be edge triggered? It seems
counter-intuitive. Is it for performance reasons?
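
For reference, this is the epoll behavior I have in mind when I say level
and edge triggered. It is only a minimal sketch, assuming Linux;
second_wait_count is a hypothetical helper of mine, not anything in
libfabric. It signals a pipe once, never reads it, and reports how many
events a second epoll_wait returns.

```c
#include <stdint.h>
#include <sys/epoll.h>
#include <unistd.h>

/* Hypothetical helper: write one byte to a pipe, never read it, and
 * return the event count reported by the SECOND epoll_wait call.
 * Pass 0 for level triggered or EPOLLET for edge triggered.
 * Returns -1 if setup fails or the first wait does not see the event. */
int second_wait_count(uint32_t extra_flags)
{
    int pipefd[2];
    struct epoll_event ev = { 0 };
    struct epoll_event got;
    int epfd;
    int second = -1;

    if (pipe(pipefd) < 0)
        return -1;

    epfd = epoll_create1(0);
    if (epfd < 0)
        goto close_pipe;

    /* Register the read end before any data arrives. */
    ev.events = EPOLLIN | extra_flags;
    ev.data.fd = pipefd[0];
    if (epoll_ctl(epfd, EPOLL_CTL_ADD, pipefd[0], &ev) < 0)
        goto cleanup;

    /* One signal, which is deliberately left unread. */
    if (write(pipefd[1], "x", 1) != 1)
        goto cleanup;

    /* The first wait (timeout 0) reports the event in both modes. */
    if (epoll_wait(epfd, &got, 1, 0) != 1)
        goto cleanup;

    /* The second wait is where the two modes differ. */
    second = epoll_wait(epfd, &got, 1, 0);

cleanup:
    close(epfd);
close_pipe:
    close(pipefd[0]);
    close(pipefd[1]);
    return second;
}
```

With no extra flags (level triggered) the second epoll_wait still reports
the fd, because the byte is still unread. With EPOLLET it reports nothing
until new data arrives, which is why stale state has to be cleared before
it is safe to block.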

>
>
>> >> b.)  libfabric only calls - for multiple queues
>> >>
>> >>    1. fi_wait
>> >>    2. fi_poll
>> >>    3. fi_cq_read/fi_eq_read/fi_cntr_read
>> 
>> I take it the use of read (and not sread) in b3 above implies that the
>> wait object resetting would be done at the start of fi_wait(), after
>> checking for the "okay to wait" conditions immediately below?
>
>Yes - based on our github discussion, I agree that calling read here is
>preferable to sread.  I was even considering limiting sread to cases
>where a waitset is not used.

Clearly defining when an API call can be used is a good step forward. If
read is preferable and can function the same as sread in this instance,
then I’m all for limiting the sread cases.

>
>
>> > Proposal - It is okay to wait when all associated:
>> >           CQs are empty +
>> >           EQs are empty +
>> >           Counters have not incremented since last wait call
>> >
>> 
>> >> c.)  OS + libfabric calls - for one or multiple queues
>> >> This modifies the above sequence to:
>> >>
>> >>    1. poll/select
>> >>    2. fi_poll
>> >>    3. fi_cq_read/fi_eq_read/fi_cntr_read
>> >
>> > And we're back to the problem child.  For each fd that the app
>> > references, it should perform a check to see if it is okay to wait on
>> > that fd.  This suggests introducing new interfaces for that purpose.
>> >
>> > fi_cq_trywait / fi_eq_trywait / fi_cntr_trywait / fi_trywait
>> >
>> > Note that the app only needs to call trywait (names are hard) if it is
>> > directly using the fd from that object.  E.g. if it is polling on the fd
>> > from a wait set, and the wait set is associated with 4 CQs, the app only
>> > needs to call fi_trywait, not fi_cq_trywait.
>> >
>> > The trywait calls can be static inline wrappers around existing calls
>> > (fi_cq_sread, fi_eq_sread, fi_cntr_wait, fi_wait).  Trywait would
>> > return 0 when it's safe to wait.
>> 
>> If we end up with such an fi_trywait() call, it seems like you would
>> want it to take an array of fids to avoid requiring the app to always
>> put its own loop around the calls.
>
>Ding! Ding! Ding!  I think we have a winner!
> 
>
>> Where is the wait object resetting occurring in this example?  In
>> trywait (further arguing for trywait taking an array) or in an sread
>> that must come after all the reads in c3?
>
>Trywait would need to be introduced, for example:
>
>    1. if fi_trywait == 0
>    1.1.   poll/select
>    2. fi_poll
>    3. fi_cq_read/fi_eq_read/fi_cntr_read

So for API calls the reset happens at the end of fi_wait/_sread, but for
OS calls it happens at the beginning of fi_trywait?
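
To make the question concrete, here is roughly how I read the trywait
sequence for the OS-call case. This is only a sketch under my own
assumptions: mock_cq and the mock_* functions are hypothetical stand-ins,
not the libfabric API, and a pipe plays the role of the wait object's fd.
In this reading, trywait first clears stale signals and then checks
whether the queue is empty.

```c
#include <errno.h>
#include <fcntl.h>
#include <poll.h>
#include <unistd.h>

/* Hypothetical stand-in for a CQ with an fd-based wait object. */
struct mock_cq {
    int pending;   /* completions the app has not read yet */
    int sigfd[2];  /* pipe: the app polls [0], the "provider" writes [1] */
};

int mock_cq_open(struct mock_cq *cq)
{
    cq->pending = 0;
    if (pipe(cq->sigfd) < 0)
        return -errno;
    /* Non-blocking read end so that draining never stalls. */
    if (fcntl(cq->sigfd[0], F_SETFL, O_NONBLOCK) < 0) {
        int err = errno;
        close(cq->sigfd[0]);
        close(cq->sigfd[1]);
        return -err;
    }
    return 0;
}

/* Provider side: queue one completion and signal the wait fd. */
int mock_cq_post(struct mock_cq *cq)
{
    cq->pending++;
    return write(cq->sigfd[1], "x", 1) == 1 ? 0 : -errno;
}

/* Non-blocking read: 0 if a completion was consumed, else -EAGAIN. */
int mock_cq_read(struct mock_cq *cq)
{
    if (cq->pending == 0)
        return -EAGAIN;
    cq->pending--;
    return 0;
}

/* Step 1: reset the wait object, then test the "okay to wait" condition.
 * Draining before checking means an event posted after the drain still
 * leaves a byte in the pipe and wakes the later poll(). */
int mock_trywait(struct mock_cq *cq)
{
    char buf[64];

    while (read(cq->sigfd[0], buf, sizeof(buf)) > 0)
        ;
    return cq->pending ? -EAGAIN : 0;
}

/* Steps 1, 1.1 and 3 for a single queue. */
int mock_wait_and_read(struct mock_cq *cq, int timeout_ms)
{
    if (mock_trywait(cq) == 0) {
        struct pollfd pfd = { .fd = cq->sigfd[0], .events = POLLIN };

        (void) poll(&pfd, 1, timeout_ms);  /* step 1.1: OS wait */
    }
    return mock_cq_read(cq);               /* step 3: plain read */
}
```

If that matches the intent, then in the OS-call case the reset lives
entirely inside trywait, and the reads in step 3 never need to touch the
wait object.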

>
>- Sean

- Ben