[ewg] [PATCH for-2.6.22] ipoib/cm: initialize RX before moving QP to RTR

Michael S. Tsirkin mst at dev.mellanox.co.il
Mon Jun 18 01:32:40 PDT 2007


Fix a crasher bug in IPoIB CM: once QP is in RTR, an RX completion (and even an
asynchronous error) might be observed on this QP, so we have to initialize all
RX fields beforehand.

This fixes bug <https://bugs.openfabrics.org/show_bug.cgi?id=662>

Signed-off-by: Michael S. Tsirkin <mst at dev.mellanox.co.il>

---

> Quoting Woodruff, Robert J <robert.j.woodruff at intel.com>:
> Subject: RE: [ofa-general] crash in ipoib
> 
> Sean wrote,
> >> And here's a version with error handling fixed.
> >> Sean, does this solve your crash?
> 
> >We've been running this patch since yesterday and haven't seen any 
> >crashes.  We'll continue testing this over the week-end.
> 
> >- Sean
> 
> This looks like it fixed the panic. 
> 
> Should we try to put out a new RC with this latest ipoib fix ?
> I really think we need it in the release. If we could get another RC out
> today,
> that would only delay the release by a couple of more days and we could
> release on next Friday rather than wed. and still give people a week to 
> test the final RC.
> 
> woody

OK, the following patch has been added to OFED 1.2.
Roland, please consider this bugfix for 2.6.22.

diff --git a/drivers/infiniband/ulp/ipoib/ipoib_cm.c b/drivers/infiniband/ulp/ipoib/ipoib_cm.c
index 076a0bb..c64249f 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib_cm.c
+++ b/drivers/infiniband/ulp/ipoib/ipoib_cm.c
@@ -309,6 +309,11 @@ static int ipoib_cm_req_handler(struct ib_cm_id *cm_id, struct ib_cm_event *even
 		return -ENOMEM;
 	p->dev = dev;
 	p->id = cm_id;
+	cm_id->context = p;
+	p->state = IPOIB_CM_RX_LIVE;
+	p->jiffies = jiffies;
+	INIT_LIST_HEAD(&p->list);
+
 	p->qp = ipoib_cm_create_rx_qp(dev, p);
 	if (IS_ERR(p->qp)) {
 		ret = PTR_ERR(p->qp);
@@ -320,24 +325,24 @@ static int ipoib_cm_req_handler(struct ib_cm_id *cm_id, struct ib_cm_event *even
 	if (ret)
 		goto err_modify;
 
+	spin_lock_irq(&priv->lock);
+	queue_delayed_work(ipoib_workqueue,
+			   &priv->cm.stale_task, IPOIB_CM_RX_DELAY);
+	/* Add this entry to passive ids list head, but do not re-add it
+	 * if IB_EVENT_QP_LAST_WQE_REACHED has moved it to flush list. */
+	p->jiffies = jiffies;
+	if (p->state == IPOIB_CM_RX_LIVE)
+		list_move(&p->list, &priv->cm.passive_ids);
+	spin_unlock_irq(&priv->lock);
+
 	ret = ipoib_cm_send_rep(dev, cm_id, p->qp, &event->param.req_rcvd, psn);
 	if (ret) {
 		ipoib_warn(priv, "failed to send REP: %d\n", ret);
-		goto err_rep;
+		if (ib_modify_qp(p->qp, &ipoib_cm_err_attr, IB_QP_STATE))
+			ipoib_warn(priv, "unable to move qp to error state\n");
 	}
-
-	cm_id->context = p;
-	p->jiffies = jiffies;
-	p->state = IPOIB_CM_RX_LIVE;
-	spin_lock_irq(&priv->lock);
-	if (list_empty(&priv->cm.passive_ids))
-		queue_delayed_work(ipoib_workqueue,
-				   &priv->cm.stale_task, IPOIB_CM_RX_DELAY);
-	list_add(&p->list, &priv->cm.passive_ids);
-	spin_unlock_irq(&priv->lock);
 	return 0;
 
-err_rep:
 err_modify:
 	ib_destroy_qp(p->qp);
 err_qp:

-- 
MST



More information about the ewg mailing list