[openib-general][PATCH][kdapl]: evd upcall policy implementation

Caitlin Bestler caitlin.bestler at gmail.com
Mon Aug 22 17:03:28 PDT 2005


If your EVD typically has hundreds of entries, let alone
thousands, then you should not be notification driven. You
should be time-driven and simply use evd_dequeue().

The whole point of having control of notifications is to
ensure that upcalls are used to wake up the consumer.
If there is no need to be woken, because the EVD is
never drained, then why use notifications at all?
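
As a rough illustration of the time-driven style, here is a minimal
sketch. Everything in it is hypothetical: mock_evd, mock_evd_post,
mock_evd_dequeue and poll_tick are stand-ins invented for this
example, not the kDAPL API, and the per-tick budget is an assumed
tuning knob.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for an EVD: a fixed-size ring of event ids. */
#define MOCK_EVD_CAPACITY 1024

struct mock_evd {
    int events[MOCK_EVD_CAPACITY];
    size_t head;   /* index of the next event to dequeue */
    size_t count;  /* number of events currently queued */
};

/* Queue one event; returns 0 on success, -1 if the ring is full. */
static int mock_evd_post(struct mock_evd *evd, int event)
{
    assert(evd != NULL);
    if (evd->count == MOCK_EVD_CAPACITY)
        return -1;
    evd->events[(evd->head + evd->count) % MOCK_EVD_CAPACITY] = event;
    evd->count++;
    return 0;
}

/* Dequeue one event; returns 0 on success, -1 if the EVD is empty. */
static int mock_evd_dequeue(struct mock_evd *evd, int *event)
{
    assert(evd != NULL && event != NULL);
    if (evd->count == 0)
        return -1;
    *event = evd->events[evd->head];
    evd->head = (evd->head + 1) % MOCK_EVD_CAPACITY;
    evd->count--;
    return 0;
}

/*
 * One timer tick: dequeue at most `budget` events with no upcall
 * involved. `handle` may be NULL. Returns the number of events handled.
 */
static size_t poll_tick(struct mock_evd *evd, size_t budget,
                        void (*handle)(int))
{
    size_t handled = 0;
    int event;

    while (handled < budget && mock_evd_dequeue(evd, &event) == 0) {
        if (handle != NULL)
            handle(event);
        handled++;
    }
    return handled;
}
```

A periodic timer would call poll_tick() on each expiry. The amount of
work per tick is bounded by the budget, not by how full the EVD is.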

But in any event, if the application enables for a SINGLE
upcall then it cannot be "overwhelmed" by upcalls. It
can process as many events from the EVD as it wants.

When it is done it has two choices:

a) If it does not want to continue working on this EVD
    it can disable the EVD and simply cause dequeuing
    to resume at a later time based on a clock (independent
    of the number of events in the EVD).

b) Allow the Provider to decide whether it should continue
   by enabling the EVD and exiting. If there is no reason not
   to reschedule, the Provider will upcall again, or it might wait.
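
The two exits above can be sketched roughly as follows. This is a
single-threaded model under stated assumptions: mock_evd,
upcall_policy, provider_may_upcall and consumer_upcall are
hypothetical names made up for illustration, not kDAPL definitions,
and the in_upcall flag only models the rule that a second upcall
cannot start while one is running.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical policy values, not the DAT enum. */
enum upcall_policy { UPCALL_DISABLED, UPCALL_SINGLE };

/* Hypothetical stand-in for an EVD. */
struct mock_evd {
    size_t pending;             /* events waiting to be dequeued */
    enum upcall_policy policy;  /* current upcall policy */
    bool in_upcall;             /* true while the single upcall runs */
};

/* Provider side: may a new upcall be delivered right now? */
static bool provider_may_upcall(const struct mock_evd *evd)
{
    return evd->policy == UPCALL_SINGLE && !evd->in_upcall &&
           evd->pending > 0;
}

/*
 * Consumer upcall body. Handles at most `budget` events, then takes
 * one of the two exits:
 *   keep_working == false -> choice (a): disable the EVD and leave
 *                            the remainder to a clock-driven dequeue.
 *   keep_working == true  -> choice (b): enable the EVD and exit, so
 *                            the Provider decides when to upcall again.
 * Returns the number of events handled in this upcall.
 */
static size_t consumer_upcall(struct mock_evd *evd, size_t budget,
                              bool keep_working)
{
    size_t handled = 0;

    assert(evd != NULL);
    evd->in_upcall = true;
    while (handled < budget && evd->pending > 0) {
        evd->pending--;   /* stands in for dequeue plus processing */
        handled++;
    }
    evd->policy = keep_working ? UPCALL_SINGLE : UPCALL_DISABLED;
    evd->in_upcall = false;
    return handled;
}
```

Either way the consumer does a bounded amount of work per upcall, so
it cannot be flooded no matter how many events are queued.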


On 8/18/05, Guy German <guyg at voltaire.com> wrote:
> > Yes.
> > dat_evd_modify_upcall has been called, but the current
> > upcall instance has not yet returned. During this period
> > the consumer should check to see if the EVD is drained.
> > If so, the consumer is no longer notified (re this EVD).
> 
> I don't follow you - if the consumer is still in the upcall
> context, why should he be changing the upcall policy at all ?
> (assuming it's a single instance)
> 
> Anyway, I don't think it is recommended to "drain the
> evd" in the upcall's tasklet/interrupt context.
> There can be thousands of events to dequeue, and
> while you drain them, there can be more coming.
> You want to get out of that context as fast as possible.
> 
> Guy
> 
> On 8/18/05, Guy German <guyg at voltaire.com> wrote:
> > Hi Caitlin,
> > Caitlin Bestler <mailto:caitlin.bestler at gmail.com> wrote:
> > > Some clarifications are needed here.
> > >
> > > First the Consumer is responsible for draining the
> > > EVD after re-enabling it, or at least for remembering
> > > that there may be undrained notified events.
> >
> > Can you please explain what you mean by "re-enabling"
> > the EVD ? Do you mean calling dat_evd_modify_upcall
> > and changing the upcall policy from disable, back to
> > enable ?
> >
> > >
> > > That is "you-have-been-notified" is a sticky boolean
> > > attribute that the Consumer is supposed to set to TRUE
> > > when the upcall is made and only clear when the EVD
> > > has been drained *after* re-enabling.
> > >
> > > Second, is that the EVD is first and foremost an event
> > > *serializer*. It is presumed to have a finite number of
> > > resources for making upcalls (at most one for the typical
> > > case where SINGLE is enabled). The next upcall per
> > > resource CANNOT occur until after the current upcall
> > > has completed.
> > >
> > > Whether this should be solved in the DAT Provider is
> > > a question of what the verb-layer provider is allowed
> > > to do. If the verb layer provider can in fact generate
> > > multiple concurrent upcalls for the same CQ then the
> > > EVD itself must guard against re-entrancy.
> > >
> > > A more likely implementation is that upcalls triggered
> > > by post_se, CM events and CQs could theoretically
> > > occur at the same instant -- but that none of these
> > > paths can be re-entrant by themselves.
> > >
> > > Once the potential re-entrancy from the verb layer
> > > is known, then an optimal strategy can be selected.
> > > For example, if the only potential re-entrancy comes
> > > when the upcall interrupts a post_se call then some
> > > simple critical regions can avoid all problems without
> > > general purpose spinlocks or semaphores.
> > >
> > > On 8/16/05, James Lentini <jlentini at netapp.com> wrote:
> > >>
> > >>
> > >> On Tue, 16 Aug 2005, Guy German wrote:
> > >>
> > >>>>>>>>> Also, the pending_event_queue is only used for kDAPL generated
> > >>>>>>>>> software events. This queue can be empty when there are
> > > >>>>>>>>> events on the CQ, so you would need to expand your
> > > >>>>>>>>> check to cover that.
> > >>>>>>>
> > > >>>>>>> Actually, even though I agreed before, I tend to disagree now.
> > > >>>>>>> The consumer will still get the DTO events as soon as the CQ
> > > >>>>>>> upcall is triggered (enabled), so the only problem is with the
> > > >>>>>>> pending events list.
> > >>>>>>
> > >>>>>> Why is it an error for the consumer to modify the upcall policy
> > >>>>>> when there are pending events?
> > >>>>>>
> > >>>>>> dat_evd_modify_upcall should behave just like the IBTA spec's
> > >>>>>> Request Completion Notification verb in this respect. If there
> > >>>>>> were events on the EVD before the upcall is enabled, no upcall
> > >>>>>> needs to be generated. A correct consumer can easily work around
> > >>>>>> this by enabling the upcall and polling the EVD one final time
> > >>>>>> to ensure it is empty.
> > >>>>>
> > >>>>> There can be more than one event, and the consumer would need to
> > >>>>> dequeue many times. While the consumer would do his extra
> > >>>>> dequeue-ing he might also get an upcall, because his policy is
> > >>>>> now enabled. I can't think of a design that can handle such a
> > >>>>> case, and if there is one it is demanding and complicated, from
> > >>>>> the consumers side.
> > >>>>
> > >>>> Isn't it the same position all event code written to the OpenIB
> > >>>> API is in?
> > >>>
> > > >>> I don't quite know what you are referring to, but if you are
> > > >>> referring to the case of cq in IB - It's totally different: you
> > >>> only enable the cq once, so you will only get one upcall, and the
> > >>> rest of the events you will need to dequeue.
> > >>
> > >> The consumer should only receive one upcall at a time if the upcall
> > >> policy is DAT_UPCALL_SINGLE_INSTANCE. If the dequeues are performed
> > >> in an upcall, the logic needed in an OpenIB consumer and kDAPL
> > >> consumer is essentially the same.
> > >>
> > >> The difference is that the OpenIB consumer needs to re-enable the CQ
> > >> upcall and poll to make sure no events were missed.
> > >>
> > >>>> I agree with you that this programming model is difficult to use,
> > >>>> but I don't think it is impossible.
> > >>>
> > >>> I think it is a bad idea to dequeue events and at the same time
> > >>> receive upcalls from the same queue. It is racy, and has bad
> > >>> performance. I don't see *any* reason to do it.
> > >>
> > >> The current kDAPL implementation does create a situation in which an
> > >> upcall and poll occur simultaneously if the upcall is disabled, the
> > >> consumer enables the upcall, and then the consumer does a poll. In
> > >> this scenario an upcall can occur while the consumer is polling. I
> > >> was pointing out that this same race exists in the OpenIB verbs API
> > >> (and the IBTA verbs).
> > >>
> > >> Again, I agree that we can eliminate the additional poll after
> > >> enabling the upcall in kDAPL. We just need to do it in a way that is
> > >> not hardware specific. I believe we can use the same technique we
> > >> did in the DTO upcall.
> > >>
> > >> james
> > >> _______________________________________________
> > >> openib-general mailing list
> > >> openib-general at openib.org
> > >> http://openib.org/mailman/listinfo/openib-general
> > >>
> > >> To unsubscribe, please visit
> > http://openib.org/mailman/listinfo/openib-general
> >
> 
>

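The re-enable-then-poll sequence discussed in the quoted thread,
together with the sticky "you-have-been-notified" flag, can be
sketched roughly as below. This is a single-threaded model only; all
names (mock_evd, consumer, on_upcall, drain, rearm_and_drain) are
hypothetical and not part of kDAPL, and a real consumer would also
have to deal with an upcall arriving concurrently with the drain.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for an EVD. */
struct mock_evd {
    size_t pending;       /* events waiting to be dequeued */
    bool upcall_enabled;  /* whether the upcall is currently enabled */
};

/* Hypothetical consumer state. */
struct consumer {
    bool notified;   /* sticky flag: set by the upcall */
    size_t handled;  /* total events processed so far */
};

/* Upcall entry point: only records that a notification happened. */
static void on_upcall(struct consumer *c)
{
    c->notified = true;
}

/* Dequeue until the EVD is empty; returns how many events were taken. */
static size_t drain(struct mock_evd *evd, struct consumer *c)
{
    size_t n = 0;

    while (evd->pending > 0) {
        evd->pending--;   /* stands in for dequeue plus processing */
        n++;
    }
    c->handled += n;
    return n;
}

/*
 * Re-enable first, then drain. Events queued before the enable may
 * never produce an upcall, so the final drain is what picks them up.
 * The sticky flag is cleared only after that post-enable drain.
 */
static void rearm_and_drain(struct mock_evd *evd, struct consumer *c)
{
    assert(evd != NULL && c != NULL);
    evd->upcall_enabled = true;
    drain(evd, c);
    c->notified = false;
}
```

The ordering is the point of the sketch: draining before enabling
would leave a window in which an event is queued but never signalled.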

