[openib-general] [PATCH] IB/ipoib: NAPI
Roland Dreier
rdreier at cisco.com
Thu Sep 21 11:07:30 PDT 2006
Looks pretty good. I took a stab at implementing this myself, and it
seems we came to the same conclusion: for generic HCAs that have a
race between request notify and poll CQ, there is no alternative
except "peek CQ".
However I don't think we want to use peek CQ always -- I think that
extra CQ lock/unlock may kill a lot of the performance gain you see
with NAPI (and I don't think even mthca can do a lockless CQ peek,
since we need to protect against races with resize CQ, etc). So
probably what we need is a feature bit in the struct ib_device to say
whether the peek CQ is needed or whether req notify will generate
events for existing CQEs.
You might want to respin your patch against my for-2.6.19 branch -- I
already split up the handle WC routine into separate send and receive
functions, so the patch will become much smaller.
Also, the handling of how many completions to poll and the logic of
when to call netif_rx_complete() seems very strange to me. First, you
totally ignore the budget parameter, so you may end up doing more work
than the networking upper layers want. Second, you often leave the
poll routine to run one more time, even if you've drained the CQ
without using up your work quota.
My poll routine ended up looking like the following, which I think is
more correct:
+int ipoib_poll(struct net_device *dev, int *budget)
+{
+ struct ipoib_dev_priv *priv = netdev_priv(dev);
+ int max = min(*budget, dev->quota);
+ int done = 0;
+ int t;
+ int empty = 0;
+ int n, i;
+
+ while (max) {
+ t = min(IPOIB_NUM_WC, max);
+ n = ib_poll_cq(priv->cq, t, priv->ibwc);
+
+ for (i = 0; i < n; ++i) {
+ if (priv->ibwc[i].wr_id & IPOIB_OP_RECV) {
+ ++done;
+ --max;
+ ipoib_ib_handle_rx_wc(dev, priv->ibwc + i);
+ } else
+ ipoib_ib_handle_tx_wc(dev, priv->ibwc + i);
+ }
+
+ if (n != t) {
+ empty = 1;
+ break;
+ }
+ }
+
+ dev->quota -= done;
+ *budget -= done;
+
+ if (empty) {
+ netif_rx_complete(dev);
+ ib_req_notify_cq(priv->cq, IB_CQ_NEXT_COMP);
+ /* XXX rotting packet! */
+ return 0;
+ }
+
+ return 1;
+}
More information about the general
mailing list