<html><body>

<p><tt>Roland Dreier &lt;rdreier@cisco.com&gt; wrote on 11/14/2006 03:18:23 PM:<br>

<br>

>     Shirley> The rotting packet situation consistently happens for<br>

>     Shirley> ehca driver. The napi could poll forever with your<br>

>     Shirley> original patch. That's the reason I defer the rotting<br>

>     Shirley> packet process in next napi poll.<br>

> <br>

> Hmm, I don't see it.  In my latest patch, the poll routine does:<br>

> <br>

> repoll:<br>

>    done  = 0;<br>

>    empty = 0;<br>

> <br>

>    while (max) {<br>

>       t = min(IPOIB_NUM_WC, max);<br>

>       n = ib_poll_cq(priv->cq, t, priv->ibwc);<br>

> <br>

>       for (i = 0; i < n; ++i) {<br>

>          if (priv->ibwc[i].wr_id & IPOIB_OP_RECV) {<br>

>             ++done;<br>

>             --max;<br>

>             ipoib_ib_handle_rx_wc(dev, priv->ibwc + i);<br>

>          } else<br>

>             ipoib_ib_handle_tx_wc(dev, priv->ibwc + i);<br>

>       }<br>

> <br>

>       if (n != t) {<br>

>          empty = 1;<br>

>          break;<br>

>       }<br>

>    }<br>

> <br>

>    dev->quota -= done;<br>

>    *budget    -= done;<br>

> <br>

>    if (empty) {<br>

>       netif_rx_complete(dev);<br>

>       if (unlikely(ib_req_notify_cq(priv->cq,<br>

>                      IB_CQ_NEXT_COMP |<br>

>                      IB_CQ_REPORT_MISSED_EVENTS)) &&<br>

>           netif_rx_reschedule(dev, 0))<br>

>          goto repoll;<br>

> <br>

>       return 0;<br>

>    }<br>

> <br>

>    return 1;<br>

> <br>

> so every receive completion will count against the limit set by the<br>

> variable max.  The only way I could see the driver staying in the poll<br>

> routine for a long time would be if it was only processing send<br>

> completions, but even that doesn't actually seem bad: the driver is<br>

> making progress handling completions.<br>

What I have found with the ehca driver is that n != t doesn't mean the CQ is empty. If I poll again, there are still some packets in the CQ. The ib_req_notify_cq() call with IB_CQ_REPORT_MISSED_EVENTS returns 1 most of the time, so the routine relies on netif_rx_reschedule() returning 0 to exit the NAPI poll. That might be the reason it stays in the poll routine for a long time? I will rerun my test using n != 0 to see whether it makes any difference here.</tt><br>
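<tt>To illustrate the point about n != t, here is a minimal userspace sketch, not driver code. It assumes a hypothetical mock CQ (mock_cq, mock_poll_cq) that hands out at most "burst" completions per poll call even when more are queued, which is only my guess at a model for what I see on ehca. NUM_WC stands in for IPOIB_NUM_WC, and the "strict" flag is made up for this sketch: it switches the empty test from n != t to n == 0.</tt><br>

```c
#include <stdio.h>

#define NUM_WC 4   /* stand-in for IPOIB_NUM_WC (assumed value) */

/* Hypothetical mock CQ: `pending` completions are queued, but a single
 * poll call returns at most `burst` of them. */
struct mock_cq {
    int pending;
    int burst;
};

/* Return up to `want` completions, limited by burst and by what is queued. */
static int mock_poll_cq(struct mock_cq *cq, int want)
{
    int n = want;

    if (n > cq->burst)
        n = cq->burst;
    if (n > cq->pending)
        n = cq->pending;
    cq->pending -= n;
    return n;
}

/* One pass of the poll loop. strict == 0 uses the quoted test (n != t);
 * strict == 1 treats the CQ as empty only when a poll returns nothing. */
static int poll_pass(struct mock_cq *cq, int max, int strict, int *empty)
{
    int done = 0;

    *empty = 0;
    while (max > 0) {
        int t = max < NUM_WC ? max : NUM_WC;
        int n = mock_poll_cq(cq, t);

        done += n;
        max  -= n;

        if (strict ? (n == 0) : (n != t)) {
            *empty = 1;
            break;
        }
    }
    return done;
}

/* Count how many passes (re-arm and repoll rounds) it takes to drain the CQ. */
static int passes_to_drain(int pending, int burst, int max, int strict)
{
    struct mock_cq cq = { pending, burst };
    int passes = 0;
    int empty;

    do {
        poll_pass(&cq, max, strict, &empty);
        ++passes;
    } while (cq.pending > 0);

    return passes;
}
```

<tt>In this model, with 10 queued completions and a burst of 2, passes_to_drain(10, 2, 64, 0) needs 5 passes because every pass declares the CQ empty after one short poll, while passes_to_drain(10, 2, 64, 1) drains everything in 1 pass.</tt><br>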

<br>

<tt>> <br>

>     Shirley> It does help the performance from 1XXMb/s to 7XXMb/s, but<br>

>     Shirley> not as expected 3XXXMb/s.<br>

> <br>

> Is that 3xxx Mb/sec the performance you see without the NAPI patch?<br>

</tt><br>

<tt>Without the NAPI patch, ehca reaches around 2800 Mb/s to 3000 Mb/s throughput in my test environment.</tt><br>

<tt><br>

>     Shirley> With the defer rotting packet process patch, I can see<br>

>     Shirley> packets out of order problem in TCP layer.  Is it<br>

>     Shirley> possible there is a race somewhere causing two napi polls<br>

>     Shirley> in the same time? mthca seems to use irq auto affinity,<br>

>     Shirley> but ehca uses round-robin interrupt.<br>

> <br>

> I don't see how two NAPI polls could run at once, and I would expect<br>

> worse effects from them stepping on each other than just out-of-order<br>

> packets.  However, the fact that ehca does round-robin interrupt<br>

> handling might lead to out-of-order packets just because different<br>

> CPUs are all feeding packets into the network stack.<br>

> <br>

>  - R.<br>

Normally there should be only one NAPI poll running at a time, and NAPI processes packets all the way up to the TCP layer one by one (netif_receive_skb()). So round-robin interrupt handling shouldn't lead to out-of-order packets with NAPI. I am still investigating this.</tt><br>

<br>

<tt>Thanks</tt><br>

<tt>Shirley</tt></body></html>