[ofa-general] Re: [PATCH for-2.6.22] ipoib/cm: initialize RX before moving QP to RTR
Michael S. Tsirkin
mst at dev.mellanox.co.il
Tue Jun 19 03:40:41 PDT 2007
Fix a crasher bug in IPoIB CM: once QP is in RTR, an RX completion (and even an
asynchronous error) might be observed on this QP, so we have to initialize all
RX fields beforehand.
As an optimization (since modify_qp might take a long time),
the jiffies update done when moving RX to the passive_ids list is also
left in place to reduce the chance of the RX being mis-detected as stale.
This fixes bug <https://bugs.openfabrics.org/show_bug.cgi?id=662>
Signed-off-by: Michael S. Tsirkin <mst at dev.mellanox.co.il>
---
Resending - Roland, is the changelog OK?
Please consider this bugfix for 2.6.22.
> > Quoting Woodruff, Robert J <robert.j.woodruff at intel.com>:
> > Subject: RE: [ofa-general] crash in ipoib
> >
> > Sean wrote,
> > >> And here's a version with error handling fixed.
> > >> Sean, does this solve your crash?
> >
> > >We've been running this patch since yesterday and haven't seen any
> > >crashes. We'll continue testing this over the week-end.
> >
> > >- Sean
> >
> > This looks like it fixed the panic.
> >
> > Should we try to put out a new RC with this latest ipoib fix ?
> > I really think we need it in the release. If we could get another RC out
> > today,
> > that would only delay the release by a couple of more days and we could
> > release on next Friday rather than wed. and still give people a week to
> > test the final RC.
> >
> > woody
>
> OK, the following patch has been added to OFED 1.2.
> Roland, please consider this bugfix for 2.6.22.
diff --git a/drivers/infiniband/ulp/ipoib/ipoib_cm.c b/drivers/infiniband/ulp/ipoib/ipoib_cm.c
index 076a0bb..c64249f 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib_cm.c
+++ b/drivers/infiniband/ulp/ipoib/ipoib_cm.c
@@ -309,6 +309,11 @@ static int ipoib_cm_req_handler(struct ib_cm_id *cm_id, struct ib_cm_event *even
return -ENOMEM;
p->dev = dev;
p->id = cm_id;
+ cm_id->context = p;
+ p->state = IPOIB_CM_RX_LIVE;
+ p->jiffies = jiffies;
+ INIT_LIST_HEAD(&p->list);
+
p->qp = ipoib_cm_create_rx_qp(dev, p);
if (IS_ERR(p->qp)) {
ret = PTR_ERR(p->qp);
@@ -320,24 +325,24 @@ static int ipoib_cm_req_handler(struct ib_cm_id *cm_id, struct ib_cm_event *even
if (ret)
goto err_modify;
+ spin_lock_irq(&priv->lock);
+ queue_delayed_work(ipoib_workqueue,
+ &priv->cm.stale_task, IPOIB_CM_RX_DELAY);
+ /* Add this entry to passive ids list head, but do not re-add it
+ * if IB_EVENT_QP_LAST_WQE_REACHED has moved it to flush list. */
+ p->jiffies = jiffies;
+ if (p->state == IPOIB_CM_RX_LIVE)
+ list_move(&p->list, &priv->cm.passive_ids);
+ spin_unlock_irq(&priv->lock);
+
ret = ipoib_cm_send_rep(dev, cm_id, p->qp, &event->param.req_rcvd, psn);
if (ret) {
ipoib_warn(priv, "failed to send REP: %d\n", ret);
- goto err_rep;
+ if (ib_modify_qp(p->qp, &ipoib_cm_err_attr, IB_QP_STATE))
+ ipoib_warn(priv, "unable to move qp to error state\n");
}
-
- cm_id->context = p;
- p->jiffies = jiffies;
- p->state = IPOIB_CM_RX_LIVE;
- spin_lock_irq(&priv->lock);
- if (list_empty(&priv->cm.passive_ids))
- queue_delayed_work(ipoib_workqueue,
- &priv->cm.stale_task, IPOIB_CM_RX_DELAY);
- list_add(&p->list, &priv->cm.passive_ids);
- spin_unlock_irq(&priv->lock);
return 0;
-err_rep:
err_modify:
ib_destroy_qp(p->qp);
err_qp:
--
MST
More information about the general
mailing list