[ofa-general] mthca max_sge value... ugh.

Fri May 16 16:12:51 PDT 2008

> OK, I see a problem with mlx4 -- it may spuriously return failure when
> you try to create a QP with max_send_sge == 32, but only for kernel
> QPs.  Which is why my userspace test didn't catch it.

The problem is this code in set_kernel_sq_size:

	if (dev->dev->caps.fw_ver >= MLX4_FW_VER_WQE_CTRL_NEC &&
	    qp->sq_signal_bits && BITS_PER_LONG == 64 &&
	    type != IB_QPT_SMI && type != IB_QPT_GSI)
		qp->sq.wqe_shift = ilog2(64);
	else
		qp->sq.wqe_shift = ilog2(roundup_pow_of_two(s));

	for (;;) {
		if (1 << qp->sq.wqe_shift > dev->dev->caps.max_sq_desc_sz)
			return -EINVAL;

If we can't use the "WQE shrinking" feature (because of selective
signaling in the NFS/RDMA case), and we want to use 32 sge entries, then
the WQE size 's' will end up a little more than 512 bytes, and the
wqe_shift will end up as 10.  But max_sq_desc_sz is 1008, which is less
than 1 << 10 == 1024, so we return -EINVAL, when it is really fine to
have a wqe_shift of 10 as long as we don't use more than 1008 bytes per
descriptor (I think).
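
To make the arithmetic concrete, here is a rough userspace sketch of the
failing check.  ilog2_u() and roundup_pow_of_two_u() are hypothetical
stand-ins for the kernel helpers, and wqe_shift_for() and
stride_check_rejects() are names made up for illustration, not driver
functions.  The 16-byte data segment and 48-byte overhead used in the note
below are assumptions, not values read from the hardware.

```c
#include <assert.h>

/* Hypothetical userspace stand-in for the kernel's ilog2(). */
static unsigned int ilog2_u(unsigned int n)
{
	unsigned int log = 0;

	assert(n > 0);
	while (n >>= 1)
		++log;
	return log;
}

/* Hypothetical userspace stand-in for the kernel's roundup_pow_of_two(). */
static unsigned int roundup_pow_of_two_u(unsigned int n)
{
	unsigned int p = 1;

	assert(n > 0);
	while (p < n)
		p <<= 1;
	return p;
}

/* WQE shift picked by the else branch, i.e. without WQE shrinking. */
static unsigned int wqe_shift_for(unsigned int s)
{
	return ilog2_u(roundup_pow_of_two_u(s));
}

/* The check as it stands today: nonzero means -EINVAL would be returned. */
static int stride_check_rejects(unsigned int wqe_shift,
				unsigned int max_sq_desc_sz)
{
	return (1U << wqe_shift) > max_sq_desc_sz;
}
```

Assuming s = 32 * 16 + 48 = 560, wqe_shift_for(560) gives 10, and
stride_check_rejects(10, 1008) is nonzero because the check compares the
full 1024-byte stride, not the bytes actually used, against 1008.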

So something like this is probably the fix (it suffices to make NFS/RDMA
mount work with ConnectX on both sides):

diff --git a/drivers/infiniband/hw/mlx4/qp.c b/drivers/infiniband/hw/mlx4/qp.c
index cec030e..b6612a0 100644
--- a/drivers/infiniband/hw/mlx4/qp.c
+++ b/drivers/infiniband/hw/mlx4/qp.c
@@ -372,7 +372,8 @@ static int set_kernel_sq_size(struct mlx4_ib_dev *dev, struct ib_qp_cap *cap,
 		qp->sq.wqe_shift = ilog2(roundup_pow_of_two(s));
 
 	for (;;) {
-		if (1 << qp->sq.wqe_shift > dev->dev->caps.max_sq_desc_sz)
+		if (qp->sq.wqe_shift >
+		    ilog2(roundup_pow_of_two(dev->dev->caps.max_sq_desc_sz)))
 			return -EINVAL;
 
 		qp->sq_max_wqes_per_wr = DIV_ROUND_UP(s, 1U << qp->sq.wqe_shift);
@@ -395,7 +396,8 @@ static int set_kernel_sq_size(struct mlx4_ib_dev *dev, struct ib_qp_cap *cap,
 		++qp->sq.wqe_shift;
 	}
 
-	qp->sq.max_gs = ((qp->sq_max_wqes_per_wr << qp->sq.wqe_shift) -
+	qp->sq.max_gs = (min(dev->dev->caps.max_sq_desc_sz,
+			     (qp->sq_max_wqes_per_wr << qp->sq.wqe_shift)) -
 			 send_wqe_overhead(type, qp->flags)) /
 		sizeof (struct mlx4_wqe_data_seg);
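
For what it's worth, the effect of the two hunks can be sketched in
userspace as below.  This is only an illustration under assumptions:
ilog2_u() and roundup_pow_of_two_u() are hypothetical stand-ins for the
kernel helpers, patched_check_rejects() and patched_max_gs() are made-up
names, and the overhead and segment size are passed in as parameters
rather than taken from send_wqe_overhead() or struct mlx4_wqe_data_seg.

```c
#include <assert.h>

/* Hypothetical userspace stand-in for the kernel's ilog2(). */
static unsigned int ilog2_u(unsigned int n)
{
	unsigned int log = 0;

	assert(n > 0);
	while (n >>= 1)
		++log;
	return log;
}

/* Hypothetical userspace stand-in for the kernel's roundup_pow_of_two(). */
static unsigned int roundup_pow_of_two_u(unsigned int n)
{
	unsigned int p = 1;

	assert(n > 0);
	while (p < n)
		p <<= 1;
	return p;
}

/* First hunk: compare shifts, so a stride of 1024 passes when the limit is 1008. */
static int patched_check_rejects(unsigned int wqe_shift,
				 unsigned int max_sq_desc_sz)
{
	return wqe_shift > ilog2_u(roundup_pow_of_two_u(max_sq_desc_sz));
}

/* Second hunk: cap the usable bytes at max_sq_desc_sz before counting segments. */
static unsigned int patched_max_gs(unsigned int wqes_per_wr,
				   unsigned int wqe_shift,
				   unsigned int max_sq_desc_sz,
				   unsigned int overhead,
				   unsigned int seg_size)
{
	unsigned int bytes = wqes_per_wr << wqe_shift;

	assert(seg_size > 0);
	if (bytes > max_sq_desc_sz)
		bytes = max_sq_desc_sz;
	assert(bytes >= overhead);
	return (bytes - overhead) / seg_size;
}
```

With an assumed overhead of 48 bytes and 16-byte data segments,
patched_max_gs(1, 10, 1008, 48, 16) comes out as 60, so a request for 32
sge entries fits, and patched_check_rejects(10, 1008) no longer fails.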