[openib-general] [PATCH] IB/ipoib: NAPI
Rimmer, Todd
trimmer at silverstorm.com
Mon Sep 25 09:27:23 PDT 2006
> From: Michael S. Tsirkin
> Sent: Monday, September 25, 2006 10:54 AM
> To: Roland Dreier
> Cc: openib-general at openib.org
> Subject: Re: [openib-general] [PATCH] IB/ipoib: NAPI
>
> Quoting r. Roland Dreier <rdreier at cisco.com>:
> > Subject: Re: [PATCH] IB/ipoib: NAPI
> >
> > Michael> Actually, the reason it is hard to come up with the name
> > Michael> is that what this enables is the natural poll/request
> > Michael> notification order.
> >
> > Over the weekend I thought about this and came up with an idea I
> > kind of like, inspired by Todd Rimmer's comments about poll-and-notify.
> >
> > We could change ib_req_notify_cq() to have an extra parameter:
> >
> > static inline int ib_req_notify_cq(struct ib_cq *cq,
> >                                    enum ib_cq_notify cq_notify,
> >                                    int *lost_event_possible)
> >
> > and if non-NULL is passed in for lost_event_possible, then
> > req_notify_cq should do the equivalent of a CQ peek after arming the
> > CQ event.
>
> I thought about this too.
>
> But this has a disadvantage over the device-wide flag: when the flag is
> device-wide, we can just have 2 polling routines - with and without
> peek - and select the correct one at device open depending on the
> hardware capabilities.
> Thus we can avoid a conditional branch on the fast path,
> which I think is nice.
>
> So I think if we want to enable mthca-specific optimization,
> the right way is with device flags.
>
> On a separate note - ib_req_notify_cq is also testing the
> lost_event_possible flag - so now we have 2 conditional branches on
> the fast path, and this hurts all ULPs. Ugh.
>
> If we extend the interface, I would rather make a new call
> ib_req_notify_and_peek_cq(struct ib_cq *cq, enum ib_cq_notify cq_notify)
> that returns 0 on empty CQ, 1 on non-empty and negative on error.
>
> --
> MST
>
It's inefficient to peek the CQ if the next operation is likely to then
be a poll. Performing the poll_and_notify in one call is more
efficient.
Then if you use poll_and_notify instead of poll_cq in the polling loops,
you can also be equally efficient for all HCA models without needing a
hardware capability flag and 2 polling algorithms in each ULP. Instead,
the HCA driver naturally provides the most efficient approach and all
callers use the same algorithm.
In the examples below, let's assume 2 CQEs are returned, then the CQ is
rearmed and is still empty afterward.
For example, on Mellanox HCAs the actual sequence would be:
    poll_and_notify
        returns a CQE, tells caller to call it again
    poll_and_notify
        returns a CQE, tells caller to call it again
    poll_and_notify
        finds CQ empty, rearms CQ, tells caller it's done
        [note: no peek needed]
    3 driver calls, 3 CQE accesses, 1 rearm
For other HCAs the actual sequence would be:
    poll_and_notify
        returns a CQE, tells caller to call it again
    poll_and_notify
        returns a CQE, tells caller to call it again
    poll_and_notify
        finds CQ empty, rearms CQ, peeks CQ
        if CQ empty, tells caller it's done [for this example, it's true]
        if CQ not empty, tells caller to loop on poll_cq
    3 driver calls, 4 CQE accesses, 1 rearm
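To make the two traces above concrete, here is a rough sketch in C that replays them against a toy CQ model and counts driver calls, CQE accesses and rearms. Everything in it is hypothetical: the names (sim_cq, sim_poll_and_notify, the PAN_* status codes, ulp_drain) are made up for illustration and are not part of the verbs API, and the choice to go back to poll_and_notify after draining with plain polls is my assumption about how a caller would finish up.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical status codes; not part of the real verbs API. */
enum pan_status {
    PAN_GOT_CQE,    /* a CQE was returned; call again */
    PAN_DONE,       /* CQ empty and rearmed; caller is done */
    PAN_POLL_MORE   /* CQ rearmed, but the peek found a CQE */
};

/* Toy CQ model with cost counters (all fields are illustrative). */
struct sim_cq {
    int  pending;        /* CQEs waiting in the queue */
    int  late_arrivals;  /* CQEs that land just as the CQ is rearmed */
    bool needs_peek;     /* HCA may lose an event across a rearm */
    int  driver_calls;
    int  cqe_accesses;
    int  rearms;
};

/* Consume one CQE if present; counts as one CQE access either way. */
static bool cq_take(struct sim_cq *cq)
{
    cq->cqe_accesses++;
    if (cq->pending == 0)
        return false;
    cq->pending--;
    return true;
}

/* Rearm the CQ; any late arrivals become visible at this point. */
static void cq_rearm(struct sim_cq *cq)
{
    cq->rearms++;
    cq->pending += cq->late_arrivals;
    cq->late_arrivals = 0;
}

/* Plain poll_cq equivalent: one driver call, one CQE access. */
bool sim_poll_cq(struct sim_cq *cq)
{
    cq->driver_calls++;
    return cq_take(cq);
}

/* Combined call: return one CQE, or rearm (and peek if the HCA needs it). */
enum pan_status sim_poll_and_notify(struct sim_cq *cq)
{
    cq->driver_calls++;
    if (cq_take(cq))
        return PAN_GOT_CQE;
    cq_rearm(cq);
    if (cq->needs_peek) {
        cq->cqe_accesses++;          /* the peek after arming */
        if (cq->pending > 0)
            return PAN_POLL_MORE;
    }
    return PAN_DONE;
}

/* One ULP loop for every HCA model; returns the number of CQEs handled. */
int ulp_drain(struct sim_cq *cq)
{
    int handled = 0;

    for (;;) {
        enum pan_status st = sim_poll_and_notify(cq);

        if (st == PAN_DONE)
            break;
        if (st == PAN_GOT_CQE) {
            handled++;
            continue;
        }
        /* PAN_POLL_MORE: drain with plain polls, then loop to rearm again
         * (the second rearm is an assumption of this sketch). */
        while (sim_poll_cq(cq))
            handled++;
    }
    /* On HCAs that need the peek, nothing may be left behind unnoticed. */
    assert(!cq->needs_peek || cq->pending == 0);
    return handled;
}
```

Starting from pending = 2, the counters should come out as in the traces: 3 calls, 3 accesses, 1 rearm with needs_peek false, and 3 calls, 4 accesses, 1 rearm with needs_peek true. The caller's loop is identical in both cases, which is the point of pushing the difference into the driver.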
In comparison, the present code (or with a device capability flag) is:
    poll_cq
        returns a CQE
    poll_cq
        returns a CQE
    poll_cq
        finds CQ empty
    notify_cq
        rearms CQ
    if non-Mellanox HCA
        poll_cq - finds CQ empty
    4-5 driver calls, 3-4 CQE accesses, 1 rearm
With a notify that peeks internally (the lost event flag approach) it is:
    poll_cq
        returns a CQE
    poll_cq
        returns a CQE
    poll_cq
        finds CQ empty
    notify_cq
        rearms CQ; for non-Mellanox HCA, peeks CQ - finds CQ empty
    if lost events indicated [for this example it's false]
        poll_cq until empty
    4-5 driver calls, 3-4 CQE accesses, 1 rearm
Hence, for all HCA models, the poll_and_notify approach has fewer driver
calls (3 in the above example, compared to 4 or 5 for the other approaches).
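For comparison, the two existing approaches can be replayed the same way. This is again only a sketch against a toy CQ with no late arrivals (as in the example); toy_cq, toy_poll_cq, toy_notify_cq, drain_separate and drain_notify_peek are made-up names, not real verbs calls.

```c
#include <assert.h>
#include <stdbool.h>

struct cq_cost {
    int driver_calls;
    int cqe_accesses;
    int rearms;
};

/* Toy CQ: 'pending' CQEs and nothing arriving late, as in the example. */
struct toy_cq {
    int pending;
    struct cq_cost cost;
};

/* poll_cq equivalent: one driver call and one CQE access per invocation. */
static bool toy_poll_cq(struct toy_cq *cq)
{
    cq->cost.driver_calls++;
    cq->cost.cqe_accesses++;
    if (cq->pending == 0)
        return false;
    cq->pending--;
    return true;
}

/* notify_cq equivalent: one driver call and one rearm. With 'peek' set it
 * also looks at the CQ and reports whether an event may have been lost. */
static bool toy_notify_cq(struct toy_cq *cq, bool peek)
{
    cq->cost.driver_calls++;
    cq->cost.rearms++;
    if (!peek)
        return false;
    cq->cost.cqe_accesses++;
    return cq->pending > 0;
}

/* Present code: poll until empty, rearm, and poll once more on HCAs
 * that can lose an event across the rearm. */
struct cq_cost drain_separate(int cqes, bool repoll_after_arm)
{
    struct toy_cq cq = { .pending = cqes };

    assert(cqes >= 0);
    while (toy_poll_cq(&cq))
        ;
    toy_notify_cq(&cq, false);
    if (repoll_after_arm)
        while (toy_poll_cq(&cq))
            ;
    return cq.cost;
}

/* Lost event flag approach: the notify peeks internally when needed,
 * and the caller polls again only if a lost event is indicated. */
struct cq_cost drain_notify_peek(int cqes, bool hca_needs_peek)
{
    struct toy_cq cq = { .pending = cqes };

    assert(cqes >= 0);
    while (toy_poll_cq(&cq))
        ;
    if (toy_notify_cq(&cq, hca_needs_peek))
        while (toy_poll_cq(&cq))
            ;
    return cq.cost;
}
```

With cqes = 2, drain_separate should report 4 calls and 3 accesses without the re-poll and 5 calls and 4 accesses with it; drain_notify_peek should report 4 calls either way and 3 or 4 accesses depending on the peek. Neither gets down to the 3 driver calls of the combined call.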
In general driver calls are going to be the expensive factor in this
comparison. The main difference in all the above examples will be the
spin_lock for the CQ.
Depending on HCA design, the poll_cq and/or notify_cq and/or peek_cq
operations may also incur an expensive PCI bus read or write. However,
with the exception of the notify w/peek approach, those costs are the
same for all the above examples.
In the case (not shown above) where there was 1 additional CQE found
after the rearm [applicable only to non-Mellanox HCAs], the
poll_and_notify approach will also save 1 CQE access as compared to the
notify w/internal peek approach.
Todd Rimmer