[openib-general] [PATCH repost] IB/srp: re-create QP and CQ on reconnect

Michael S. Tsirkin mst at mellanox.co.il
Wed Oct 4 05:03:43 PDT 2006


From: Ishai Rabinovitz <ishai at mellanox.co.il>

Make srp destroy/re-create QP and CQ on each reconnect.
This makes SRP more robust in presence of hardware errors
and is closer to behaviour suggested by IB spec,
reducing chance of stale packets.

Signed-off-by: Ishai Rabinovitz <ishai at mellanox.co.il>
Signed-off-by: Michael S. Tsirkin <mst at mellanox.co.il>

---

Roland, this has been posted a while ago, and still applies to for-2.6.19 with
a small offset. Looks like a good idea - could this go into 2.6.19?
A description from the original mail below:

For some reason (could be a firmware problem) I got a CQ overrun in SRP.
Because of that there was a QP FATAL. Since in srp_reconnect_target we are not
destroying the QP, the QP FATAL persists after the reconnect.
In order to be able to recover from such situation I suggest we
destroy the CQ and the QP in every reconnect.

This also corrects a minor spec in-compliance - when srp_reconnect_target
is called, srp destroys the CM ID and resets the QP, the new connection
will be retried with the same QPN which could theoretically lead to
stale packets (for strict spec compliance I think QPN should not be reused
till all stale packets are flushed out of the network).

Index: last_stable/drivers/infiniband/ulp/srp/ib_srp.c
===================================================================
--- last_stable.orig/drivers/infiniband/ulp/srp/ib_srp.c	2006-08-31 12:23:52.000000000 +0300
+++ last_stable/drivers/infiniband/ulp/srp/ib_srp.c	2006-08-31 12:30:48.000000000 +0300
@@ -495,10 +495,10 @@
 static int srp_reconnect_target(struct srp_target_port *target)
 {
 	struct ib_cm_id *new_cm_id;
-	struct ib_qp_attr qp_attr;
 	struct srp_request *req, *tmp;
-	struct ib_wc wc;
 	int ret;
+	struct ib_cq *old_cq;
+	struct ib_qp *old_qp;
 
 	spin_lock_irq(target->scsi_host->host_lock);
 	if (target->state != SRP_TARGET_LIVE) {
@@ -522,17 +522,17 @@
 	ib_destroy_cm_id(target->cm_id);
 	target->cm_id = new_cm_id;
 
-	qp_attr.qp_state = IB_QPS_RESET;
-	ret = ib_modify_qp(target->qp, &qp_attr, IB_QP_STATE);
-	if (ret)
-		goto err;
-
-	ret = srp_init_qp(target, target->qp);
-	if (ret)
+	old_qp = target->qp;
+	old_cq = target->cq;
+	ret = srp_create_target_ib(target);
+	if (ret) {
+		target->qp = old_qp;
+		target->cq = old_cq;
 		goto err;
+	}
 
-	while (ib_poll_cq(target->cq, 1, &wc) > 0)
-		; /* nothing */
+	ib_destroy_qp(old_qp);
+	ib_destroy_cq(old_cq);
 
 	spin_lock_irq(target->scsi_host->host_lock);
 	list_for_each_entry_safe(req, tmp, &target->req_queue, list)

-- 
MST




More information about the general mailing list