[openib-general] [PATCH/RFC 1/2] IB: Return "maybe_missed_event" hint from ib_req_notify_cq()

Roland Dreier rdreier at cisco.com
Tue Nov 14 15:18:23 PST 2006


    Shirley> The rotting packet situation happens consistently with
    Shirley> the ehca driver. NAPI could poll forever with your
    Shirley> original patch. That's the reason I defer the rotting
    Shirley> packet processing to the next NAPI poll.

Hmm, I don't see it.  In my latest patch, the poll routine does:

repoll:
	done  = 0;
	empty = 0;

	while (max) {
		t = min(IPOIB_NUM_WC, max);
		n = ib_poll_cq(priv->cq, t, priv->ibwc);

		for (i = 0; i < n; ++i) {
			if (priv->ibwc[i].wr_id & IPOIB_OP_RECV) {
				++done;
				--max;
				ipoib_ib_handle_rx_wc(dev, priv->ibwc + i);
			} else
				ipoib_ib_handle_tx_wc(dev, priv->ibwc + i);
		}

		if (n != t) {
			empty = 1;
			break;
		}
	}

	dev->quota -= done;
	*budget    -= done;

	if (empty) {
		netif_rx_complete(dev);
		if (unlikely(ib_req_notify_cq(priv->cq,
					      IB_CQ_NEXT_COMP |
					      IB_CQ_REPORT_MISSED_EVENTS)) &&
		    netif_rx_reschedule(dev, 0))
			goto repoll;

		return 0;
	}

	return 1;

so every receive completion will count against the limit set by the
variable max.  The only way I could see the driver staying in the poll
routine for a long time would be if it was only processing send
completions, but even that doesn't actually seem bad: the driver is
making progress handling completions.
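
To make the bound concrete, here is a minimal user-space sketch of the
loop above.  It is not the driver code: struct sim_cq, sim_poll_cq(),
sim_poll() and NUM_WC are hypothetical stand-ins for the CQ,
ib_poll_cq(), the poll routine and IPOIB_NUM_WC, and the fake CQ simply
hands back receive completions before send completions.

```c
#include <assert.h>

#define NUM_WC 4	/* hypothetical batch size, stands in for IPOIB_NUM_WC */

/* Hypothetical model of a CQ: just counts of pending completions. */
struct sim_cq {
	int rx_pending;
	int tx_pending;
};

/*
 * Stand-in for ib_poll_cq(): returns up to 'want' completions,
 * receives first, and records in is_rx[] which ones are receives.
 */
static int sim_poll_cq(struct sim_cq *cq, int want, int *is_rx)
{
	int n = 0;

	while (n < want && cq->rx_pending > 0) {
		is_rx[n++] = 1;
		cq->rx_pending--;
	}
	while (n < want && cq->tx_pending > 0) {
		is_rx[n++] = 0;
		cq->tx_pending--;
	}
	return n;
}

/*
 * Same control flow as the poll loop quoted above: only receive
 * completions decrement max.  Returns the number of receives handled
 * and sets *empty when the CQ returned fewer entries than requested.
 */
static int sim_poll(struct sim_cq *cq, int max, int *empty)
{
	int is_rx[NUM_WC];
	int done = 0;

	assert(max >= 0);
	*empty = 0;

	while (max) {
		int t = max < NUM_WC ? max : NUM_WC;
		int n = sim_poll_cq(cq, t, is_rx);

		for (int i = 0; i < n; ++i) {
			if (is_rx[i]) {
				++done;
				--max;
			}
		}

		if (n != t) {
			*empty = 1;
			break;
		}
	}
	return done;
}
```

In this model, a CQ holding 100 receives polled with max = 16 stops
after exactly 16 receives, while a CQ holding only sends keeps looping
without touching max until the CQ is drained and empty is set.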

    Shirley> It does improve performance from 1XX Mb/s to 7XX Mb/s,
    Shirley> but not to the expected 3XXX Mb/s.

Is that 3xxx Mb/sec the performance you see without the NAPI patch?

    Shirley> With the deferred rotting packet processing patch, I can
    Shirley> see an out-of-order packet problem in the TCP layer.  Is
    Shirley> it possible there is a race somewhere causing two NAPI
    Shirley> polls at the same time? mthca seems to use irq auto
    Shirley> affinity, but ehca uses round-robin interrupts.

I don't see how two NAPI polls could run at once, and I would expect
worse effects from them stepping on each other than just out-of-order
packets.  However, the fact that ehca does round-robin interrupt
handling might lead to out-of-order packets just because different
CPUs are all feeding packets into the network stack.
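
As a rough illustration of that last guess (a toy model, not a
description of what ehca or the network stack actually does): assume
packet i arrives at time i, is handed to CPU i % ncpus, and reaches the
stack latency[cpu] ticks later.  The function count_overtaken() and the
per-CPU latency table are made up for this sketch.

```c
#include <assert.h>

/*
 * Toy model, all names hypothetical: packet i arrives at tick i and is
 * dispatched round-robin to CPU (i % ncpus), which delivers it to the
 * stack latency[cpu] ticks later.  Returns how many packets are
 * overtaken by the packet sent immediately after them.
 */
static int count_overtaken(int npkts, const int *latency, int ncpus)
{
	int overtaken = 0;

	assert(ncpus > 0);

	for (int i = 0; i + 1 < npkts; ++i) {
		int this_done = i + latency[i % ncpus];
		int next_done = (i + 1) + latency[(i + 1) % ncpus];

		if (next_done < this_done)
			++overtaken;
	}
	return overtaken;
}
```

With a single CPU (or equal latencies everywhere) nothing is overtaken
in this model; with two CPUs whose latencies differ by more than the
packet spacing, every packet handled by the slower CPU is passed by its
successor.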

 - R.
