[ewg] [PATCH] ipoib: Fix lockup of the tx queue

Eli Cohen eli at mellanox.co.il
Wed Mar 3 04:27:52 PST 2010


The ipoib UD QP reports send completions to priv->send_cq, which is normally
unarmed; it is armed only when the number of outstanding send requests (i.e.
those whose completion has not been polled yet) reaches the size of the tx
queue. This arming (done using ib_req_notify_cq()) happens only in the send
path for the UD QP. However, when sending CM packets, the net queue may be
stopped for the same reason, yet nothing is done to recover the UD path from
a lockup.
Consider this scenario: a host sends both CM and UD packets at a high rate.
Suppose also that the tx queue length is N. If at some point the number of
outstanding UD packets is more than N/2 and the total number of outstanding
packets is N-1, and CM now sends a packet, making the number outstanding equal
to N, the tx queue will be stopped. When all the CM packets complete, the
number of outstanding packets will still be higher than N/2, so the tx queue
will not be re-enabled. Since priv->send_cq was not armed on this path, the
UD completions are never polled and the queue stays stopped.
Fix this by calling ib_req_notify_cq() when the queue is stopped in the CM
path.

Signed-off-by: Eli Cohen <eli at mellanox.co.il>
---
 drivers/infiniband/ulp/ipoib/ipoib_cm.c |    2 ++
 1 files changed, 2 insertions(+), 0 deletions(-)
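
The scenario in the description can be reproduced with a small userspace
model. This is only a sketch under simplifying assumptions, not driver code:
struct tx_model and all of its helpers are hypothetical names, N is fixed at
64, the wake threshold is taken to be N/2 as in the description, and UD
completions are assumed to be reaped only when an armed send CQ fires.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the shared tx accounting; not the driver's types. */
struct tx_model {
	unsigned int sendq_size;  /* N, stands in for ipoib_sendq_size */
	unsigned int ud_pending;  /* UD sends whose completion was not polled */
	unsigned int cm_pending;  /* CM sends whose completion was not polled */
	bool queue_stopped;       /* models netif_stop_queue() having run */
	bool send_cq_armed;       /* models ib_req_notify_cq() on the send CQ */
	bool arm_in_cm_path;      /* true models the behavior with this patch */
};

static unsigned int outstanding(const struct tx_model *m)
{
	return m->ud_pending + m->cm_pending;
}

/* Wake the queue once outstanding sends drop to N/2 or below. */
static void maybe_wake(struct tx_model *m)
{
	if (m->queue_stopped && outstanding(m) <= m->sendq_size / 2)
		m->queue_stopped = false;
}

/* Post one send; stop the queue when the outstanding count reaches N. */
static void post_send(struct tx_model *m, bool is_cm)
{
	assert(!m->queue_stopped);
	if (is_cm)
		m->cm_pending++;
	else
		m->ud_pending++;

	if (outstanding(m) == m->sendq_size) {
		/* The UD path always arms; the CM path arms only if patched. */
		if (!is_cm || m->arm_in_cm_path)
			m->send_cq_armed = true;
		m->queue_stopped = true;
	}
}

/* All CM sends complete (assumed to be reaped independently of send_cq). */
static void complete_all_cm(struct tx_model *m)
{
	m->cm_pending = 0;
	maybe_wake(m);
}

/* The send CQ fires only if it was armed; it then reaps all UD sends. */
static void send_cq_event(struct tx_model *m)
{
	if (!m->send_cq_armed)
		return;
	m->send_cq_armed = false;
	m->ud_pending = 0;
	maybe_wake(m);
}

/* Replay the scenario; returns true if the queue ends up stuck. */
static bool ends_locked_up(bool arm_in_cm_path)
{
	struct tx_model m = { .sendq_size = 64, .arm_in_cm_path = arm_in_cm_path };
	unsigned int i;

	for (i = 0; i < 40; i++)	/* 40 UD sends: more than N/2 */
		post_send(&m, false);
	for (i = 0; i < 23; i++)	/* 23 CM sends: N-1 outstanding in total */
		post_send(&m, true);
	post_send(&m, true);		/* this CM send reaches N and stops the queue */

	complete_all_cm(&m);
	send_cq_event(&m);
	return m.queue_stopped;
}
```

In this model ends_locked_up(false) leaves the queue stopped with 40 UD sends
still outstanding, while ends_locked_up(true) lets the armed send CQ reap
them and wake the queue.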

diff --git a/drivers/infiniband/ulp/ipoib/ipoib_cm.c b/drivers/infiniband/ulp/ipoib/ipoib_cm.c
index 30bdf42..f8302c2 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib_cm.c
+++ b/drivers/infiniband/ulp/ipoib/ipoib_cm.c
@@ -752,6 +752,8 @@ void ipoib_cm_send(struct net_device *dev, struct sk_buff *skb, struct ipoib_cm_
 		if (++priv->tx_outstanding == ipoib_sendq_size) {
 			ipoib_dbg(priv, "TX ring 0x%x full, stopping kernel net queue\n",
 				  tx->qp->qp_num);
+			if (ib_req_notify_cq(priv->send_cq, IB_CQ_NEXT_COMP))
+				ipoib_warn(priv, "request notify on send CQ failed\n");
 			netif_stop_queue(dev);
 		}
 	}
-- 
1.7.0
