[ofa-general] crash in ipoib
Michael S. Tsirkin
mst at dev.mellanox.co.il
Thu Jun 14 11:44:45 PDT 2007
> Quoting Sean Hefty <mshefty at ichips.intel.com>:
> Subject: Re: [ofa-general] crash in ipoib
>
> >Where does :ib_ipoib:ipoib_cm_handle_rx_wc+378 point to on your system?
>
> It points to list_move below:
>
> if (!likely(wr_id & IPOIB_CM_RX_UPDATE_MASK)) {
> p = wc->qp->qp_context;
> if (p && time_after_eq(jiffies, p->jiffies +
> IPOIB_CM_RX_UPDATE_TIME)) {
> spin_lock_irqsave(&priv->lock, flags);
> p->jiffies = jiffies;
> /* Move this entry to list head, but do not
> re-add it
> * if it has been moved out of list. */
> if (p->state == IPOIB_CM_RX_LIVE)
> >>> list_move(&p->list,
> priv->cm.passive_ids);
> spin_unlock_irqrestore(&priv->lock, flags);
> }
> }
>
> There appears to be a race in ipoib_cm_req_handler() setting the
> ipoib_cm_rx state outside of a lock, and before the item it added to a
> list. I think this could cause list_move() call above to oops.
Hmm,yes, looks like you are right.
> I think
> ipoib_cm_req_handler() needs changes, but I'm not sure if this is enough
> (patch below has line wrap issues...):
>
> @@ -291,16 +291,16 @@ static int ipoib_cm_req_handler(struct ib_cm_id
> *cm_id, st
> if (ret)
> goto err_modify;
>
> + cm_id->context = p;
> ret = ipoib_cm_send_rep(dev, cm_id, p->qp,
> &event->param.req_rcvd, psn);
> if (ret) {
> ipoib_warn(priv, "failed to send REP: %d\n", ret);
> goto err_rep;
> }
>
> - cm_id->context = p;
> p->jiffies = jiffies;
> - p->state = IPOIB_CM_RX_LIVE;
> spin_lock_irq(&priv->lock);
> + p->state = IPOIB_CM_RX_LIVE;
> if (list_empty(&priv->cm.passive_ids))
> queue_delayed_work(ipoib_workqueue,
> &priv->cm.stale_task,
> IPOIB_CM_RX_DELAY);
I'm not sure this is enough. Maybe the following is needed?
Can you test it?
diff --git a/drivers/infiniband/ulp/ipoib/ipoib_cm.c b/drivers/infiniband/ulp/ipoib/ipoib_cm.c
index 076a0bb..2509bb8 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib_cm.c
+++ b/drivers/infiniband/ulp/ipoib/ipoib_cm.c
@@ -320,12 +320,6 @@ static int ipoib_cm_req_handler(struct ib_cm_id *cm_id, struct ib_cm_event *even
if (ret)
goto err_modify;
- ret = ipoib_cm_send_rep(dev, cm_id, p->qp, &event->param.req_rcvd, psn);
- if (ret) {
- ipoib_warn(priv, "failed to send REP: %d\n", ret);
- goto err_rep;
- }
-
cm_id->context = p;
p->jiffies = jiffies;
p->state = IPOIB_CM_RX_LIVE;
@@ -335,6 +329,13 @@ static int ipoib_cm_req_handler(struct ib_cm_id *cm_id, struct ib_cm_event *even
&priv->cm.stale_task, IPOIB_CM_RX_DELAY);
list_add(&p->list, &priv->cm.passive_ids);
spin_unlock_irq(&priv->lock);
+
+ ret = ipoib_cm_send_rep(dev, cm_id, p->qp, &event->param.req_rcvd, psn);
+ if (ret) {
+ /* TODO: error handling is wrong here */
+ ipoib_warn(priv, "failed to send REP: %d\n", ret);
+ goto err_rep;
+ }
return 0;
err_rep:
--
MST
More information about the general
mailing list