[ofa-general] mthca max_sge value... ugh.

Roland Dreier rdreier at cisco.com
Tue May 20 14:02:18 PDT 2008


 > Then, we get into the complexity of sanity checking in create_qp (since we should
 > be able to use the value returned by create-qp when calling create-qp, and get
 > the same result). Essentially, we will need to check the requested sge numbers
 > per QP type, whether it is for send or receive, etc. IMHO, this gets nasty very
 > quickly -- creates a problem with support -- users will need a "roadmap" for create-qp.

Actually it seems pretty easy to understand -- the returned max_sge is
the largest value that is guaranteed to work.  If the requested QP
happens to get more capabilities "for free," then the driver will tell
you in the returned structure.  But whatever.
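
For what it's worth, here is a rough userspace verbs sketch of that
usage pattern (purely illustrative -- the RC QP type, the PD/CQ
arguments and the work request counts are made up, not taken from this
thread): ask for the max_sge that query_device advertises, then read
back whatever the driver actually granted from the cap fields that
create_qp fills in.

/*
 * Illustrative sketch only: request the guaranteed max_sge and see
 * what the driver actually grants in init_attr.cap.
 */
#include <stdio.h>
#include <string.h>
#include <infiniband/verbs.h>

static struct ibv_qp *create_rc_qp(struct ibv_context *ctx,
				   struct ibv_pd *pd, struct ibv_cq *cq)
{
	struct ibv_device_attr dev_attr;
	struct ibv_qp_init_attr init_attr;
	struct ibv_qp *qp;

	if (ibv_query_device(ctx, &dev_attr))
		return NULL;

	memset(&init_attr, 0, sizeof init_attr);
	init_attr.send_cq          = cq;
	init_attr.recv_cq          = cq;
	init_attr.qp_type          = IBV_QPT_RC;
	init_attr.cap.max_send_wr  = 64;               /* made-up size */
	init_attr.cap.max_recv_wr  = 64;               /* made-up size */
	init_attr.cap.max_send_sge = dev_attr.max_sge; /* guaranteed value */
	init_attr.cap.max_recv_sge = dev_attr.max_sge;

	qp = ibv_create_qp(pd, &init_attr);
	if (qp)
		/* The driver may have rounded these up "for free". */
		printf("granted max_send_sge %u, max_recv_sge %u\n",
		       init_attr.cap.max_send_sge,
		       init_attr.cap.max_recv_sge);

	return qp;
}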

 > I much prefer to treat the query_hca returned values as absolute maxima, and enforce
 > these limits (although this is at the expense of additional s/g entries for some
 > qp types and send/receive).

OK, I added the patch below to fix this mlx4 bug without reporting any
s/g entries beyond the max_sge that query_device returns.

commit cd155c1c7c9e64df6afb5504d292fef7cb783a4f
Author: Roland Dreier <rolandd at cisco.com>
Date:   Tue May 20 14:00:02 2008 -0700

    IB/mlx4: Fix creation of kernel QP with max number of send s/g entries
    
    When creating a kernel QP where the consumer asked for a send queue
    with lots of scatter/gather entries, set_kernel_sq_size() incorrectly
    returned an error if the send queue stride is larger than the
    hardware's maximum send work request descriptor size.  This is not a
    problem; the only real requirement is that the actual descriptors
    used do not overflow the maximum descriptor size, so check that instead.
    
    Clamp the returned max_send_sge value so that it is no bigger than the
    max_sge that query_device returns, to avoid confusing hapless users,
    even if the hardware is capable of handling a few more s/g entries.
    
    This bug caused NFS/RDMA mounts to fail when the server adapter used
    the mlx4 driver.
    
    Signed-off-by: Roland Dreier <rolandd at cisco.com>

diff --git a/drivers/infiniband/hw/mlx4/qp.c b/drivers/infiniband/hw/mlx4/qp.c
index cec030e..a80df22 100644
--- a/drivers/infiniband/hw/mlx4/qp.c
+++ b/drivers/infiniband/hw/mlx4/qp.c
@@ -333,6 +333,9 @@ static int set_kernel_sq_size(struct mlx4_ib_dev *dev, struct ib_qp_cap *cap,
 		cap->max_inline_data + sizeof (struct mlx4_wqe_inline_seg)) +
 		send_wqe_overhead(type, qp->flags);
 
+	if (s > dev->dev->caps.max_sq_desc_sz)
+		return -EINVAL;
+
 	/*
 	 * Hermon supports shrinking WQEs, such that a single work
 	 * request can include multiple units of 1 << wqe_shift.  This
@@ -372,9 +375,6 @@ static int set_kernel_sq_size(struct mlx4_ib_dev *dev, struct ib_qp_cap *cap,
 		qp->sq.wqe_shift = ilog2(roundup_pow_of_two(s));
 
 	for (;;) {
-		if (1 << qp->sq.wqe_shift > dev->dev->caps.max_sq_desc_sz)
-			return -EINVAL;
-
 		qp->sq_max_wqes_per_wr = DIV_ROUND_UP(s, 1U << qp->sq.wqe_shift);
 
 		/*
@@ -395,7 +395,8 @@ static int set_kernel_sq_size(struct mlx4_ib_dev *dev, struct ib_qp_cap *cap,
 		++qp->sq.wqe_shift;
 	}
 
-	qp->sq.max_gs = ((qp->sq_max_wqes_per_wr << qp->sq.wqe_shift) -
+	qp->sq.max_gs = (min(dev->dev->caps.max_sq_desc_sz,
+			     (qp->sq_max_wqes_per_wr << qp->sq.wqe_shift)) -
 			 send_wqe_overhead(type, qp->flags)) /
 		sizeof (struct mlx4_wqe_data_seg);
 
@@ -411,7 +412,9 @@ static int set_kernel_sq_size(struct mlx4_ib_dev *dev, struct ib_qp_cap *cap,
 
 	cap->max_send_wr  = qp->sq.max_post =
 		(qp->sq.wqe_cnt - qp->sq_spare_wqes) / qp->sq_max_wqes_per_wr;
-	cap->max_send_sge = qp->sq.max_gs;
+	cap->max_send_sge = min(qp->sq.max_gs,
+				min(dev->dev->caps.max_sq_sg,
+				    dev->dev->caps.max_rq_sg));
 	/* We don't support inline sends for kernel QPs (yet) */
 	cap->max_inline_data = 0;
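
To put some numbers on the clamping (purely illustrative values, not
taken from any real HCA): with a hypothetical max_sq_desc_sz of 512
bytes, a send WQE overhead of 64 bytes for the QP type, and 16-byte
mlx4_wqe_data_seg entries, a large max_send_sge request is no longer
rejected outright; qp->sq.max_gs works out to at most
(512 - 64) / 16 = 28, and cap->max_send_sge is then further clamped to
min(max_sq_sg, max_rq_sg) so the value handed back never exceeds what
query_device advertises.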
 


