Roland Dreier <rdreier@cisco.com> wrote on 11/14/2006 03:18:23 PM:

> Shirley> The rotting packet situation consistently happens for
> Shirley> ehca driver. The napi could poll forever with your
> Shirley> original patch. That's the reason I defer the rotting
> Shirley> packet process in next napi poll.
> 
> Hmm, I don't see it.  In my latest patch, the poll routine does:
> 
> repoll:
>         done  = 0;
>         empty = 0;
> 
>         while (max) {
>                 t = min(IPOIB_NUM_WC, max);
>                 n = ib_poll_cq(priv->cq, t, priv->ibwc);
> 
>                 for (i = 0; i < n; ++i) {
>                         if (priv->ibwc[i].wr_id & IPOIB_OP_RECV) {
>                                 ++done;
>                                 --max;
>                                 ipoib_ib_handle_rx_wc(dev, priv->ibwc + i);
>                         } else
>                                 ipoib_ib_handle_tx_wc(dev, priv->ibwc + i);
>                 }
> 
>                 if (n != t) {
>                         empty = 1;
>                         break;
>                 }
>         }
> 
>         dev->quota -= done;
>         *budget -= done;
> 
>         if (empty) {
>                 netif_rx_complete(dev);
>                 if (unlikely(ib_req_notify_cq(priv->cq,
>                                               IB_CQ_NEXT_COMP |
>                                               IB_CQ_REPORT_MISSED_EVENTS)) &&
>                     netif_rx_reschedule(dev, 0))
>                         goto repoll;
> 
>                 return 0;
>         }
> 
>         return 1;
> 
> so every receive completion will count against the limit set by the
> variable max.  The only way I could see the driver staying in the poll
> routine for a long time would be if it was only processing send
> completions, but even that doesn't actually seem bad: the driver is
> making progress handling completions.
What I have found with the ehca driver is that n != t doesn't mean the CQ is empty: if you poll again, there are still some completions in the CQ. IB_CQ_REPORT_MISSED_EVENTS reports 1 most of the time, so exiting the napi poll ends up relying on netif_rx_reschedule() returning 0. That might be why it stays in the poll routine for a long time? I will rerun my test using n != 0 as the empty test to see whether it makes any difference.
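
Roughly what I plan to try, sketched against the loop quoted above (untested, same variables and context as the quoted ipoib_poll() fragment; only the empty test changes):

        /* Sketch only, not a tested patch: treat the CQ as drained only
         * when ib_poll_cq() returns 0 entries, not when it returns fewer
         * entries than were requested. */
        while (max) {
                t = min(IPOIB_NUM_WC, max);
                n = ib_poll_cq(priv->cq, t, priv->ibwc);

                if (!n) {
                        /* nothing returned at all: now call the CQ empty */
                        empty = 1;
                        break;
                }

                for (i = 0; i < n; ++i) {
                        if (priv->ibwc[i].wr_id & IPOIB_OP_RECV) {
                                ++done;
                                --max;
                                ipoib_ib_handle_rx_wc(dev, priv->ibwc + i);
                        } else
                                ipoib_ib_handle_tx_wc(dev, priv->ibwc + i);
                }
                /* n < t is not treated as empty here; just poll again */
        }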

> 
> Shirley> It does help the performance from 1XXMb/s to 7XXMb/s, but
> Shirley> not as expected 3XXXMb/s.
> 
> Is that 3xxx Mb/sec the performance you see without the NAPI patch?

Without the NAPI patch, ehca gets around 2800 to 3000 Mb/s throughput in my test environment.

> Shirley> With the defer rotting packet process patch, I can see
> Shirley> packets out of order problem in TCP layer. Is it
> Shirley> possible there is a race somewhere causing two napi polls
> Shirley> in the same time? mthca seems to use irq auto affinity,
> Shirley> but ehca uses round-robin interrupt.
> 
> I don't see how two NAPI polls could run at once, and I would expect
> worse effects from them stepping on each other than just out-of-order
> packets.  However, the fact that ehca does round-robin interrupt
> handling might lead to out-of-order packets just because different
> CPUs are all feeding packets into the network stack.
> 
>  - R.
Normally there should be only one NAPI poll running at a time, and NAPI processes each packet all the way up to the TCP layer one by one (via netif_receive_skb()). So round-robin interrupt handling shouldn't lead to out-of-order packets under NAPI. I am still investigating this.
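
To make the ordering point concrete, this is the kind of hand-off I mean within a single poll (sketch only, not the real ipoib receive path; skb_for_wc() is a made-up lookup helper):

        /* Within one NAPI poll, each receive completion is pushed up the
         * stack synchronously, one at a time, on the single CPU running
         * the poll, so packets handled by the same poll stay in order. */
        for (i = 0; i < n; ++i) {
                struct sk_buff *skb = skb_for_wc(priv, &priv->ibwc[i]); /* hypothetical lookup */

                netif_receive_skb(skb); /* delivered in completion order */
        }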

Thanks
Shirley