[ofiwg] wait sets

Wed Jan 27 13:51:03 PST 2016

> An attempt to describe the desired application behavior is:
> 
>     1. Wait for one or more events to occur
>     2. Get a list of queues that are ready for action
>     3. Process each queue until empty

I propose a couple of changes to the above sequence:

    1. If it is okay to wait
    1.1.   Wait for one or more events to occur
    2. Get list of queues ready for action
    3. Process each queue

We then only need to define what "okay to wait" means in step 1.  Ba-da-bing, ba-da-boom, and we're done.
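To make the revised sequence concrete, here is a minimal single-threaded simulation of it. It is only a sketch. `struct queue`, `okay_to_wait()`, `wait_for_events()`, `get_ready()` and `process_queue()` are hypothetical stand-ins for the CQs/EQs and for fi_wait, fi_poll and fi_cq_read/fi_eq_read. The "okay to wait" rule used here is a placeholder for whatever step 1 ends up being.

```c
#include <assert.h>
#include <stdbool.h>

#define NQUEUES 3

/* Hypothetical stand-in for a CQ or EQ: only tracks how many entries are queued. */
struct queue {
    int pending;
};

/* Step 1 (placeholder rule): okay to wait only if no queue has entries. */
static bool okay_to_wait(const struct queue *q, int n)
{
    for (int i = 0; i < n; i++)
        if (q[i].pending > 0)
            return false;
    return true;
}

/* Step 1.1: stand-in for fi_wait().  It "delivers" one event to queue 0
 * so the simulation makes progress, and counts how often it was called. */
static void wait_for_events(struct queue *q, int *waits)
{
    (*waits)++;
    q[0].pending++;
}

/* Step 2: stand-in for fi_poll(); collects the indices of ready queues. */
static int get_ready(const struct queue *q, int n, int *ready)
{
    int count = 0;
    for (int i = 0; i < n; i++)
        if (q[i].pending > 0)
            ready[count++] = i;
    return count;
}

/* Step 3: stand-in for fi_cq_read()/fi_eq_read(); drains one queue. */
static int process_queue(struct queue *q)
{
    int n = q->pending;
    q->pending = 0;
    return n;
}

/* Runs the loop for a fixed number of iterations and returns the number
 * of entries processed; *waits receives the number of wait calls made. */
static int run(struct queue *q, int n, int iterations, int *waits)
{
    int ready[NQUEUES];
    int processed = 0;

    assert(n <= NQUEUES);   /* ready[] is sized for NQUEUES */
    *waits = 0;
    for (int it = 0; it < iterations; it++) {
        if (okay_to_wait(q, n))              /* 1.   */
            wait_for_events(q, waits);       /* 1.1. */
        int r = get_ready(q, n, ready);      /* 2.   */
        for (int i = 0; i < r; i++)          /* 3.   */
            processed += process_queue(&q[ready[i]]);
    }
    return processed;
}
```

Starting from `struct queue q[NQUEUES] = { {0}, {0}, {2} };`, `run(q, NQUEUES, 2, &waits)` returns 3 with `waits == 1`. The first pass skips the wait because queue 2 already holds entries, which is exactly the case the "if it is okay to wait" guard exists for.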

> b.)  libfabric only calls - for multiple queues
> 
>     1. fi_wait
>     2. fi_poll
>     3. fi_cq_read/fi_eq_read/fi_cntr_read

Proposal: It is okay to wait when all of the following hold:
           - all associated CQs are empty,
           - all associated EQs are empty, and
           - no associated counter has incremented since the last wait call.

I can think of other proposals, but they are more complex, such as introducing thresholds.  Counters present a challenge, because a counter has no notion of being empty; the closest equivalent is whether it has changed since the last wait call.  fi_wait can encapsulate these checks.  (I'm always at a loss whether to capitalize the first word of a sentence when it refers to a name that is spelled all in lowercase.)
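A minimal sketch of the checks that fi_wait could encapsulate under this proposal follows. The `sim_*` types and the `okay_to_wait()`/`record_wait()` helpers are hypothetical; real CQs, EQs and counters are opaque libfabric objects. The counter case is handled by snapshotting each counter's value whenever the app actually waits and comparing against that snapshot later.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins: real CQs, EQs and counters are opaque libfabric objects. */
struct sim_cq   { size_t entries; };
struct sim_eq   { size_t entries; };
struct sim_cntr {
    uint64_t value;      /* current counter value */
    uint64_t last_seen;  /* value snapshotted at the last wait call */
};

/* Everything associated with one wait object. */
struct sim_wait_obj {
    struct sim_cq   *cqs;   size_t ncq;
    struct sim_eq   *eqs;   size_t neq;
    struct sim_cntr *cntrs; size_t ncntr;
};

/* Proposed rule: okay to wait only if every associated CQ and EQ is empty
 * and no associated counter has changed since the last wait call. */
static bool okay_to_wait(const struct sim_wait_obj *w)
{
    assert(w != NULL);
    for (size_t i = 0; i < w->ncq; i++)
        if (w->cqs[i].entries > 0)
            return false;
    for (size_t i = 0; i < w->neq; i++)
        if (w->eqs[i].entries > 0)
            return false;
    for (size_t i = 0; i < w->ncntr; i++)
        if (w->cntrs[i].value != w->cntrs[i].last_seen)
            return false;
    return true;
}

/* Called when the app actually goes to sleep: snapshot each counter so
 * that later increments make okay_to_wait() return false. */
static void record_wait(struct sim_wait_obj *w)
{
    for (size_t i = 0; i < w->ncntr; i++)
        w->cntrs[i].last_seen = w->cntrs[i].value;
}
```

The per-counter `last_seen` snapshot is the state that makes counters harder than CQs and EQs: something has to remember when the last wait happened, which is one argument for keeping the check inside fi_wait rather than in the app.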

> c.)  OS + libfabric calls - for one or multiple queues
> This modifies the above sequence to:
> 
>     1. poll/select
>     2. fi_poll
>     3. fi_cq_read/fi_eq_read/fi_cntr_read

And we're back to the problem child.  For each fd that the app uses directly, it should first check whether it is okay to block on that fd.  This suggests introducing new interfaces for that purpose:

fi_cq_trywait / fi_eq_trywait / fi_cntr_trywait / fi_trywait

Note that the app only needs to call trywait (names are hard) if it is directly using the fd from that object.  For example, if it is polling on the fd from a wait set that is associated with 4 CQs, the app only needs to call fi_trywait, not fi_cq_trywait.

The trywait calls can be static inline wrappers around existing calls (fi_cq_sread, fi_eq_sread, fi_cntr_wait, fi_wait).  Trywait would return 0 when it's safe to wait.
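The app-side pattern for case c.) can be sketched with plain POSIX poll() on a pipe standing in for the provider's fd. Everything named `sim_*` and `app_step()` is hypothetical. `sim_trywait()` follows the proposed convention of returning 0 when it is safe to wait; returning -EAGAIN otherwise is an assumption of this sketch, not a decided API.

```c
#include <assert.h>
#include <errno.h>
#include <poll.h>
#include <unistd.h>

/* Hypothetical object that exposes a wait fd, as a CQ or wait set would.
 * A pipe stands in for the provider's fd. */
struct sim_obj {
    int rfd, wfd;   /* the app polls rfd; the "provider" writes to wfd */
    int pending;    /* entries queued but not yet read by the app */
    int signaled;   /* wakeup bytes written to the pipe, not yet consumed */
};

static int sim_obj_open(struct sim_obj *o)
{
    int fds[2];
    if (pipe(fds) != 0)
        return -errno;
    o->rfd = fds[0];
    o->wfd = fds[1];
    o->pending = 0;
    o->signaled = 0;
    return 0;
}

static void sim_obj_close(struct sim_obj *o)
{
    close(o->rfd);
    close(o->wfd);
}

/* "Provider" side: queue one entry and signal the fd. */
static int sim_signal(struct sim_obj *o)
{
    char b = 1;
    if (write(o->wfd, &b, 1) != 1)
        return -1;
    o->signaled++;
    o->pending++;
    return 0;
}

/* Hypothetical trywait: 0 means it is safe to block on rfd; -EAGAIN
 * (an assumed convention) means entries are already queued. */
static int sim_trywait(const struct sim_obj *o)
{
    return o->pending > 0 ? -EAGAIN : 0;
}

/* Stand-in for reading a CQ until empty; also clears the fd signal. */
static int sim_drain(struct sim_obj *o)
{
    char buf;
    while (o->signaled > 0 && read(o->rfd, &buf, 1) == 1)
        o->signaled--;
    int n = o->pending;
    o->pending = 0;
    return n;
}

/* One pass of the app loop: call poll() only after trywait returns 0.
 * Returns entries processed, or -1 if poll() timed out or failed. */
static int app_step(struct sim_obj *o, int timeout_ms)
{
    assert(o != NULL);
    if (sim_trywait(o) == 0) {
        struct pollfd pfd = { .fd = o->rfd, .events = POLLIN };
        if (poll(&pfd, 1, timeout_ms) <= 0)
            return -1;
    }
    return sim_drain(o);
}
```

Setting `o.pending = 2` without writing to the pipe models entries that are queued while the fd alone would not wake the app. With the trywait check, `app_step(&o, 1000)` returns 2 immediately; without it, poll() would sit for the full timeout (or forever with -1) even though work is waiting.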

- Sean