[openib-general] [PATCH/RFC 1/2] IB: Return "maybe_missed_event" hint from ib_req_notify_cq()

Shirley Ma xma at us.ibm.com
Wed Nov 15 10:13:15 PST 2006

Roland Dreier <rdreier at cisco.com> wrote on 11/14/2006 03:18:23 PM:

>     Shirley> The rotting packet situation consistently happens for
>     Shirley> ehca driver. The napi could poll forever with your
>     Shirley> original patch. That's the reason I defer the rotting
>     Shirley> packet process in next napi poll.
>
> Hmm, I don't see it.  In my latest patch, the poll routine does:
>
> repoll:
>    done  = 0;
>    empty = 0;
>
>    while (max) {
>       t = min(IPOIB_NUM_WC, max);
>       n = ib_poll_cq(priv->cq, t, priv->ibwc);
>
>       for (i = 0; i < n; ++i) {
>          if (priv->ibwc[i].wr_id & IPOIB_OP_RECV) {
>             ++done;
>             --max;
>             ipoib_ib_handle_rx_wc(dev, priv->ibwc + i);
>          } else
>             ipoib_ib_handle_tx_wc(dev, priv->ibwc + i);
>       }
>
>       if (n != t) {
>          empty = 1;
>          break;
>       }
>    }
>
>    dev->quota -= done;
>    *budget    -= done;
>
>    if (empty) {
>       netif_rx_complete(dev);
>       if (unlikely(ib_req_notify_cq(priv->cq,
>                      IB_CQ_NEXT_COMP |
>                      IB_CQ_REPORT_MISSED_EVENTS)) &&
>           netif_rx_reschedule(dev, 0))
>          goto repoll;
>
>       return 0;
>    }
>
>    return 1;
>
> so every receive completion will count against the limit set by the
> variable max.  The only way I could see the driver staying in the poll
> routine for a long time would be if it was only processing send
> completions, but even that doesn't actually seem bad: the driver is
> making progress handling completions.
What I have found with the ehca driver is that n != t does not mean the CQ
is empty: if I poll again, there are still completions left in the CQ.
IB_CQ_REPORT_MISSED_EVENTS reports 1 most of the time, so exiting the napi
poll ends up relying on netif_rx_reschedule() returning 0. That might be
why the driver stays in the poll routine for a long time. I will rerun my
test treating the CQ as empty only when n == 0 to see whether it makes any
difference here.
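Roughly what I plan to test (just a sketch against your snippet above, not
a tested ehca fix -- only the empty test moves, the rest of the poll
routine stays as you posted it):

   while (max) {
      t = min(IPOIB_NUM_WC, max);
      n = ib_poll_cq(priv->cq, t, priv->ibwc);

      /* sketch: on ehca, n < t does not guarantee the CQ is drained,
       * so only claim empty when ib_poll_cq() returns nothing */
      if (!n) {
         empty = 1;
         break;
      }

      for (i = 0; i < n; ++i) {
         if (priv->ibwc[i].wr_id & IPOIB_OP_RECV) {
            ++done;
            --max;
            ipoib_ib_handle_rx_wc(dev, priv->ibwc + i);
         } else
            ipoib_ib_handle_tx_wc(dev, priv->ibwc + i);
      }
   }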

>
>     Shirley> It does help the performance from 1XXMb/s to 7XXMb/s, but
>     Shirley> not as expected 3XXXMb/s.
>
> Is that 3xxx Mb/sec the performance you see without the NAPI patch?

Without the NAPI patch, ehca gets around 2800 to 3000 Mb/s throughput in my
test environment.

>     Shirley> With the defer rotting packet process patch, I can see
>     Shirley> packets out of order problem in TCP layer.  Is it
>     Shirley> possible there is a race somewhere causing two napi polls
>     Shirley> in the same time? mthca seems to use irq auto affinity,
>     Shirley> but ehca uses round-robin interrupt.
>
> I don't see how two NAPI polls could run at once, and I would expect
> worse effects from them stepping on each other than just out-of-order
> packets.  However, the fact that ehca does round-robin interrupt
> handling might lead to out-of-order packets just because different
> CPUs are all feeding packets into the network stack.
>
>  - R.
Normally only one NAPI poll should be running at a time, and the NAPI poll
delivers packets all the way up to the TCP layer one by one through
netif_receive_skb(). So round-robin interrupt handling shouldn't lead to
out-of-order packets under NAPI. I am still investigating this.

Thanks
Shirley