[ofa-general] [PATCH] IB/mthca: work around kernel QP starvation

Michael S. Tsirkin mst at mellanox.co.il
Thu Apr 12 08:10:25 PDT 2007


With mthca, QPs that share a hardware schedule queue can starve one
another: reliable QPs can starve other reliable QPs, and even UD QPs.
As a result, we have observed userspace MPI traffic starving other
traffic such as IPoIB, with netdev watchdog warnings being printed and
TCP connections getting stuck or failing.

Reduce the chance of this happening by moving reliable (RC) QPs off the
default hardware schedule queue, and by putting userspace and kernel
RC QPs on different hardware schedule queues.

Signed-off-by: Michael S. Tsirkin <mst at dev.mellanox.co.il>

---

Roland, this fixes a problem we see on large clusters that mix Open MPI and
IPoIB traffic. Could this be queued for 2.6.21?
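
For illustration only, the selection rule can be written as a small
standalone C function. This is a sketch, not driver code: the sketch_*
names are hypothetical stand-ins for the kernel's QP type and state
enums, and the is_userspace flag stands in for the ibqp->uobject check.
The returned value is what the patch ORs into the schedule queue field.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for the kernel's QP type and state enums. */
enum sketch_qp_type  { SKETCH_QPT_RC, SKETCH_QPT_UC, SKETCH_QPT_UD };
enum sketch_qp_state { SKETCH_QPS_RESET, SKETCH_QPS_INIT,
		       SKETCH_QPS_RTR, SKETCH_QPS_RTS };

/*
 * Return the bits to OR into the schedule queue field, or 0 when the
 * field is left untouched. Only RC QPs making the INIT -> RTR
 * transition are affected: kernel QPs get 0x1, userspace QPs get 0x2.
 */
static uint8_t sketch_sched_queue_bits(enum sketch_qp_type type,
				       bool is_userspace,
				       enum sketch_qp_state cur,
				       enum sketch_qp_state next)
{
	if (type != SKETCH_QPT_RC)
		return 0;
	if (cur != SKETCH_QPS_INIT || next != SKETCH_QPS_RTR)
		return 0;

	return is_userspace ? 0x2 : 0x1;
}
```

In the driver itself the same value is ORed into either
rlkey_arbel_sched_queue or tavor_sched_queue, depending on
mthca_is_memfree(), as the diff below shows.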

Index: linux-2.6/drivers/infiniband/hw/mthca/mthca_qp.c
===================================================================
--- linux-2.6.orig/drivers/infiniband/hw/mthca/mthca_qp.c
+++ linux-2.6/drivers/infiniband/hw/mthca/mthca_qp.c
@@ -701,6 +701,19 @@ int mthca_modify_qp(struct ib_qp *ibqp, 
 		qp_param->opt_param_mask |= cpu_to_be32(MTHCA_QP_OPTPAR_PRIMARY_ADDR_PATH);
 	}
 
+	if (ibqp->qp_type == IB_QPT_RC &&
+	    cur_state == IB_QPS_INIT && new_state == IB_QPS_RTR) {
+		u8 sched_queue = ibqp->uobject ? 0x2 : 0x1;
+
+		if (mthca_is_memfree(dev))
+			qp_context->rlkey_arbel_sched_queue |= sched_queue;
+		else
+			qp_context->tavor_sched_queue |= cpu_to_be32(sched_queue);
+
+		qp_param->opt_param_mask |=
+			cpu_to_be32(MTHCA_QP_OPTPAR_SCHED_QUEUE);
+	}
+
 	if (attr_mask & IB_QP_TIMEOUT) {
 		qp_context->pri_path.ackto = attr->timeout << 3;
 		qp_param->opt_param_mask |= cpu_to_be32(MTHCA_QP_OPTPAR_ACK_TIMEOUT);

-- 
Michael S. Tsirkin - Staff Engineer, Mellanox Technologies Ltd.
Eternity is a very long time, especially towards the end.


