[openib-general] [PATCH][3/7]ipoib performance patches -- remove tx_ring
Roland Dreier
rdreier at cisco.com
Fri May 26 15:48:46 PDT 2006
Shirley> This patch reduces one extra ring between dev xmit queue
Shirley> and device send queue and removes tx_lock in completion
Shirley> handler.
I don't think replacing the ring with a linked list is an
improvement. At best it works out to be the same.
But I didn't notice that you removed the tx_lock from the send
completion handler before. I suspect that this is where all the
improvement is coming from. Unfortunately it introduces a couple of
race conditions. For example, if CPU1 is in ipoib_send() and CPU2 is
running the send completion handler, you can have:
	CPU1						CPU2
	err = post_send(priv, wr_id, address->ah, qpn, addr, skb->len);
	// err is non-zero
	if (!err) {
	} else {
	if (!netif_queue_stopped(dev)) {
							// send completion handler
							// runs many times and drains
							// completion queue
							// queue is never stopped so
							// it never gets woken again
							if (netif_queue_stopped(dev))
								netif_wake_queue(dev);
	netif_stop_queue(dev);
							// no more sends are posted so
							// another completion never
							// occurs and the queue stays
							// stopped forever.
There are analogous races that cause spurious wakeups too.
However, I think this is a good idea, and I think we can tolerate the
races by working around them in a way that doesn't hurt performance
(e.g. by having the netdev tx watchdog restart the queue in the scenario
above). I cooked up a quick patch (below) but I didn't see any
performance improvement from this in my quick tests -- NPtcp peaks at
~3400 Mbit/sec between my test systems with 2.6.17-rc5 both with and
without this patch.
Then I tried to apply your patches 1/7 and 3/7 to see if they were any
different on my setup, but the patches didn't apply. 3/7 had no
attachment, and it's hopeless to try and apply the patches you send
inline. So can you resend up-to-date versions of the patches that
give you a 10% improvement? (BTW it would be nice if you could figure
out a way to fix your mail client to post patches inline without
mangling them, or at least attach them with a mime type of text/plain
or something)
Also, if you're interested, you could try the patch below and see how
it does on your tests.
- R.
Here's my tx_lock removal test patch (it would need better comments
about all the races with tx_tail and some more careful review before
we actually apply it):
--- infiniband/ulp/ipoib/ipoib_main.c (revision 7507)
+++ infiniband/ulp/ipoib/ipoib_main.c (working copy)
@@ -634,6 +634,14 @@ static int ipoib_start_xmit(struct sk_bu
return NETDEV_TX_BUSY;
}
+ /*
+ * Because tx_lock is not held when updating tx_tail in the
+ * send completion handler, we may receive a spurious wakeup
+ * that starts our queue when there really isn't space yet.
+ */
+ if (unlikely(priv->tx_head - priv->tx_tail == ipoib_sendq_size))
+ return NETDEV_TX_BUSY;
+
if (skb->dst && skb->dst->neighbour) {
if (unlikely(!*to_ipoib_neigh(skb->dst->neighbour))) {
ipoib_path_lookup(skb, dev);
@@ -703,6 +711,21 @@ static struct net_device_stats *ipoib_ge
static void ipoib_timeout(struct net_device *dev)
{
struct ipoib_dev_priv *priv = netdev_priv(dev);
+ unsigned long flags;
+ int lost_wakeup = 0;
+
+ spin_lock_irqsave(&priv->tx_lock, flags);
+ if (netif_queue_stopped(dev) &&
+ priv->tx_head - priv->tx_tail < ipoib_sendq_size) {
+ ipoib_dbg(priv, "lost wakeup, head %u, tail %u\n",
+ priv->tx_head, priv->tx_tail);
+ lost_wakeup = 1;
+ netif_wake_queue(dev);
+ }
+ spin_unlock_irqrestore(&priv->tx_lock, flags);
+
+ if (lost_wakeup)
+ return;
ipoib_warn(priv, "transmit timeout: latency %d msecs\n",
jiffies_to_msecs(jiffies - dev->trans_start));
--- infiniband/ulp/ipoib/ipoib_ib.c (revision 7511)
+++ infiniband/ulp/ipoib/ipoib_ib.c (working copy)
@@ -244,7 +244,6 @@ static void ipoib_ib_handle_wc(struct ne
} else {
struct ipoib_tx_buf *tx_req;
- unsigned long flags;
if (wr_id >= ipoib_sendq_size) {
ipoib_warn(priv, "completion event with wrid %d (> %d)\n",
@@ -266,12 +265,17 @@ static void ipoib_ib_handle_wc(struct ne
dev_kfree_skb_any(tx_req->skb);
- spin_lock_irqsave(&priv->tx_lock, flags);
++priv->tx_tail;
+
+ /*
+ * Since we don't hold tx_lock here, this may lead to
+ * both lost wakeups (which we deal with in our
+ * watchdog) and spurious wakeups (which we deal with
+ * by handling TX ring overflows in the xmit function).
+ */
if (netif_queue_stopped(dev) &&
priv->tx_head - priv->tx_tail <= ipoib_sendq_size >> 1)
netif_wake_queue(dev);
- spin_unlock_irqrestore(&priv->tx_lock, flags);
if (wc->status != IB_WC_SUCCESS &&
wc->status != IB_WC_WR_FLUSH_ERR)