[openib-general] IPoIB performance.
Roland Dreier
roland at topspin.com
Mon Dec 27 08:21:13 PST 2004
Ido> 1. We can divide the single CQ into two separate completion
Ido> queues: one for the RQ and the other for the SQ. Then we can
Ido> change the CQ policy affiliated with the SQ to
Ido> IB_CQ_CONSUMER_REARM and, in the common path, not arm the CQ
Ido> at all. In that case poll_cq_tq will be called from the
Ido> send_packet method and will reap completions without any need
Ido> for interrupts/events. Obviously, in cases where we have to
Ido> stop the queue (e.g. no more room available), we need to arm
Ido> the CQ until completions arrive. In general, this change
Ido> reduces the interrupt rate. It may also help when posting and
Ido> polling on the SQ happen on two different processors
Ido> (e.g. spinlock contention).
I guess you would also need a timer that polls the send CQ, so that
every send eventually gets its completion reaped even when one packet
is sent and another one isn't sent for a while.
Also, we would need to try a variety of workloads, because splitting
the send and receive CQs means we will always have to take a CQ lock
twice, once to poll for send completions and once for receives. For
example "NPtcp -2" would be a useful test.
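Just so we're sure we mean the same thing, here's a rough sketch of
how I'd picture the unarmed send CQ working. ib_poll_cq(),
ib_req_notify_cq() and IB_CQ_NEXT_COMP are the gen2 verbs we have
today; the separate send CQ priv->scq, the helpers
ipoib_reap_tx_cq()/ipoib_ib_handle_tx_wc() and the exact ring-full
test are made-up names for illustration only:

#include "ipoib.h"	/* struct ipoib_dev_priv etc. */

/* Sketch only: reap send completions inline from the transmit path.
 * priv->scq (a hypothetical separate send CQ) stays unarmed in the
 * common case, so no interrupt is ever taken for sends. */
static void ipoib_reap_tx_cq(struct ipoib_dev_priv *priv)
{
	struct ib_wc wc;

	while (ib_poll_cq(priv->scq, 1, &wc) > 0)
		ipoib_ib_handle_tx_wc(priv, &wc);	/* frees the skb */
}

static int ipoib_send_packet(struct net_device *dev, struct sk_buff *skb)
{
	struct ipoib_dev_priv *priv = netdev_priv(dev);

	/* Common case: no CQ events at all, just poll on every send. */
	ipoib_reap_tx_cq(priv);

	if (priv->tx_head - priv->tx_tail >= IPOIB_TX_RING_SIZE) {
		netif_stop_queue(dev);
		/* Now we do need an event: arm the CQ, then poll once
		 * more to close the race with completions that arrived
		 * before the arm took effect. */
		ib_req_notify_cq(priv->scq, IB_CQ_NEXT_COMP);
		ipoib_reap_tx_cq(priv);
	}

	/* ... build and post the send WR as before ... */
	return 0;
}

On top of this we'd still want the slack timer I mentioned, to reap
the last few completions once transmits stop arriving.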
Ido> 2. The current IPoIB driver signals every transmitted
Ido> packet. We can improve performance with selective signaling
Ido> (e.g. every 5 to 10 packets). Note that we did notice
Ido> several problems when doing this. This approach can be a
Ido> problem for, for example, ping (ICMP), which does not
Ido> allocate new buffers before the first ones are released. I
Ido> can think of some workarounds for this, such as sending a
Ido> dummy packet every now and then (which won't go out of the
Ido> device). This includes changing the send policy to
Ido> IB_WQ_SIGNAL_SELECTABLE. When a signaled WQE completes, the
Ido> ipoib driver has to internally complete the rest of the WQEs
Ido> that weren't signaled. This change mainly reduces the
Ido> overhead in the HW driver of polling the completion queue.
I've looked at this in the past as well. As you point out, the kernel
needs an outgoing skb's destructor to be called before it can free
space in the socket's send buffer, so you would need to be very clever
here. I'll be curious to see your approach.
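To make the bookkeeping concrete, here's roughly how I'd picture the
completion side. IB_SEND_SIGNALED and ib_post_send() are the verbs we
already have; the batch size, the tx_unsignaled counter, the tx_ring
layout and the helper names are invented for the sketch:

#include "ipoib.h"

#define IPOIB_SIGNAL_BATCH 8	/* signal every 8th WR (your 5-10 range) */

/* Sketch only: request a CQE for only every IPOIB_SIGNAL_BATCH-th
 * send WR.  wr->wr_id is assumed to carry the monotonic tx_head
 * index, as the caller sets it up today. */
static int ipoib_post_send_selective(struct ipoib_dev_priv *priv,
				     struct ib_send_wr *wr)
{
	struct ib_send_wr *bad_wr;

	wr->send_flags = 0;
	if (++priv->tx_unsignaled >= IPOIB_SIGNAL_BATCH) {
		wr->send_flags = IB_SEND_SIGNALED;
		priv->tx_unsignaled = 0;
	}
	return ib_post_send(priv->qp, wr, &bad_wr);
}

/* One CQE now retires a whole batch: every WR up to and including
 * wc->wr_id is implicitly complete, so free all of those skbs here. */
static void ipoib_handle_tx_wc(struct ipoib_dev_priv *priv,
			       struct ib_wc *wc)
{
	unsigned int wr_id = (unsigned int) wc->wr_id;

	while ((int) (wr_id - priv->tx_tail) >= 0) {
		dev_kfree_skb_any(priv->tx_ring[priv->tx_tail %
						IPOIB_TX_RING_SIZE].skb);
		++priv->tx_tail;
	}
}

The skb destructor issue is exactly why the free loop has to run soon
after the batch completes; with a batch of 8, up to 7 skbs can sit in
the ring with their destructors unrun.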
Ido> 3. I think we should call netif_wake_queue only if the queue
Ido> is actually stopped, because, as far as I understand, the
Ido> kernel can schedule another process after calling wake queue.
Thanks for pointing this out (although of course netif_wake_queue and
__netif_schedule don't actually schedule a new process -- since we're
calling netif_wake_queue from interrupt context they couldn't in any
case). The expensive part seems to be the clearing and restoring of
interrupts in __netif_schedule. In any case I committed the change
below, which seems to be about a 3% improvement.
Ido> I have tried these changes and got ~20% improvement in
Ido> performance (2 Tavor machines with dual 3.1 GHz Xeon CPUs,
Ido> AS3.0 U3). If you find it interesting, I can work on a patch
Ido> for gen2.
If you write a patch and do some benchmarking, that would be great.
Thanks,
Roland
Index: ulp/ipoib/ipoib_ib.c
===================================================================
--- ulp/ipoib/ipoib_ib.c (revision 1383)
+++ ulp/ipoib/ipoib_ib.c (working copy)
@@ -249,7 +249,8 @@
 	spin_lock_irqsave(&priv->tx_lock, flags);
 	++priv->tx_tail;
-	if (priv->tx_head - priv->tx_tail <= IPOIB_TX_RING_SIZE / 2)
+	if (netif_queue_stopped(dev) &&
+	    priv->tx_head - priv->tx_tail <= IPOIB_TX_RING_SIZE / 2)
 		netif_wake_queue(dev);
 	spin_unlock_irqrestore(&priv->tx_lock, flags);