[openib-general] [PATCH] mthca: memfree completion with error workaround

Michael S. Tsirkin mst at mellanox.co.il
Mon Jun 12 05:16:35 PDT 2006


Roland, please consider the following for 2.6.17.

---

In rare cases, memfree firmware reports WQE index == -1 in a receive
completion with error, instead of (rq size - 1). The driver then uses this
value to index the wrid array, which crashes the kernel.
Here is a patch to avoid the crash and report the correct WR id in this case.

Since reporting a wrong WR id has severe consequences for ULPs,
make the test as restrictive as possible, and report an error
if we see an unexpected value.

Signed-off-by: Michael S. Tsirkin <mst at mellanox.co.il>

Index: openib/drivers/infiniband/hw/mthca/mthca_cq.c
===================================================================
--- openib/drivers/infiniband/hw/mthca/mthca_cq.c	(revision 7837)
+++ openib/drivers/infiniband/hw/mthca/mthca_cq.c	(working copy)
@@ -542,6 +542,22 @@
 	} else {
 		wq = &(*cur_qp)->rq;
 		wqe_index = be32_to_cpu(cqe->wqe) >> wq->wqe_shift;
+		/* WQE index == -1 might be reported by
+		   Sinai FW 1.0.800, Arbel FW 5.1.400 and should be fixed
+		   in later revisions. */
+		if (unlikely(wqe_index >= (*cur_qp)->rq.max)) {
+			if (unlikely(is_error) &&
+			    unlikely(wqe_index == 0xffffffff >> wq->wqe_shift) &&
+			    mthca_is_memfree(dev))
+				wqe_index = wq->max - 1;
+			else {
+				mthca_err(dev, "Corrupted RQ CQE. "
+					  "CQ 0x%x QP 0x%x idx 0x%x >= 0x%x\n",
+					  cq->cqn, entry->qp_num, wqe_index,
+					  wq->max);
+				return -EINVAL;
+			}
+		}
 		entry->wr_id = (*cur_qp)->wrid[wqe_index];
 	}
 

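For anyone who wants to check the index arithmetic outside the driver, here
is a minimal user-space sketch of the same decision. It is not driver code:
struct rq_model and resolve_rq_index() are hypothetical names invented for
this illustration, the is_error and is_memfree flags stand in for the
driver's is_error variable and mthca_is_memfree(dev), and a return value of
-1 stands in for the -EINVAL path.

```c
#include <stdint.h>

/* Hypothetical stand-in for the receive work queue fields used by the check. */
struct rq_model {
	uint32_t max;        /* number of WQEs in the receive queue */
	unsigned wqe_shift;  /* log2 of the WQE stride in bytes */
};

/*
 * Map the raw WQE field of a receive CQE to an index into the wrid array.
 * Returns the index to use, or -1 if the CQE has to be treated as corrupted.
 */
static int resolve_rq_index(const struct rq_model *rq, uint32_t cqe_wqe,
			    int is_error, int is_memfree)
{
	uint32_t idx = cqe_wqe >> rq->wqe_shift;

	/* Normal case: the index is inside the queue. */
	if (idx < rq->max)
		return (int)idx;

	/*
	 * Workaround case: only an error completion on memfree hardware
	 * whose index is exactly "-1 after the shift" is accepted, and it
	 * is mapped to the last WQE of the queue.
	 */
	if (is_error && is_memfree &&
	    idx == (0xffffffffu >> rq->wqe_shift))
		return (int)(rq->max - 1);

	/* Anything else out of range is reported as an error. */
	return -1;
}
```

With a 128-entry queue and wqe_shift of 6, a raw value of 0xffffffff shifts
down to 0x03ffffff. That is accepted only for an error completion on memfree
hardware, where it maps to index 127; every other out-of-range value is
rejected rather than guessed at.
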
-- 
MST



