[ofa-general] crash in ipoib

Michael S. Tsirkin mst at dev.mellanox.co.il
Thu Jun 14 11:44:45 PDT 2007


> Quoting Sean Hefty <mshefty at ichips.intel.com>:
> Subject: Re: [ofa-general] crash in ipoib
> 
> >Where does :ib_ipoib:ipoib_cm_handle_rx_wc+378 point to on your system?
> 
> It points to list_move below:
> 
> 	if (!likely(wr_id & IPOIB_CM_RX_UPDATE_MASK)) {
> 		p = wc->qp->qp_context;
> 		if (p && time_after_eq(jiffies, p->jiffies +
> 					IPOIB_CM_RX_UPDATE_TIME)) {
> 			spin_lock_irqsave(&priv->lock, flags);
> 			p->jiffies = jiffies;
> 			/* Move this entry to list head, but do not
> 			   re-add it
> 			 * if it has been moved out of list. */
> 			if (p->state == IPOIB_CM_RX_LIVE)
> >>>				list_move(&p->list,
> 					  priv->cm.passive_ids);
> 			spin_unlock_irqrestore(&priv->lock, flags);
> 		}
> 	}
> 
> There appears to be a race in ipoib_cm_req_handler() setting the 
> ipoib_cm_rx state outside of a lock, and before the item it added to a 
> list.  I think this could cause list_move() call above to oops.

Hmm,yes, looks like you are right.

> I think 
> ipoib_cm_req_handler() needs changes, but I'm not sure if this is enough 
> (patch below has line wrap issues...):
> 
> @@ -291,16 +291,16 @@ static int ipoib_cm_req_handler(struct ib_cm_id 
> *cm_id, st
>         if (ret)
>                 goto err_modify;
> 
> +       cm_id->context = p;
>         ret = ipoib_cm_send_rep(dev, cm_id, p->qp,
> 				&event->param.req_rcvd, psn);
>         if (ret) {
>                 ipoib_warn(priv, "failed to send REP: %d\n", ret);
>                 goto err_rep;
>         }
> 
> -       cm_id->context = p;
>         p->jiffies = jiffies;
> -       p->state = IPOIB_CM_RX_LIVE;
>         spin_lock_irq(&priv->lock);
> +       p->state = IPOIB_CM_RX_LIVE;
>         if (list_empty(&priv->cm.passive_ids))
>                 queue_delayed_work(ipoib_workqueue,
>                                    &priv->cm.stale_task,
> 				   IPOIB_CM_RX_DELAY);

I'm not sure this is enough. Maybe the following is needed?
Can you test it?

diff --git a/drivers/infiniband/ulp/ipoib/ipoib_cm.c b/drivers/infiniband/ulp/ipoib/ipoib_cm.c
index 076a0bb..2509bb8 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib_cm.c
+++ b/drivers/infiniband/ulp/ipoib/ipoib_cm.c
@@ -320,12 +320,6 @@ static int ipoib_cm_req_handler(struct ib_cm_id *cm_id, struct ib_cm_event *even
 	if (ret)
 		goto err_modify;
 
-	ret = ipoib_cm_send_rep(dev, cm_id, p->qp, &event->param.req_rcvd, psn);
-	if (ret) {
-		ipoib_warn(priv, "failed to send REP: %d\n", ret);
-		goto err_rep;
-	}
-
 	cm_id->context = p;
 	p->jiffies = jiffies;
 	p->state = IPOIB_CM_RX_LIVE;
@@ -335,6 +329,13 @@ static int ipoib_cm_req_handler(struct ib_cm_id *cm_id, struct ib_cm_event *even
 				   &priv->cm.stale_task, IPOIB_CM_RX_DELAY);
 	list_add(&p->list, &priv->cm.passive_ids);
 	spin_unlock_irq(&priv->lock);
+
+	ret = ipoib_cm_send_rep(dev, cm_id, p->qp, &event->param.req_rcvd, psn);
+	if (ret) {
+		/* TODO: error handling is wrong here */
+		ipoib_warn(priv, "failed to send REP: %d\n", ret);
+		goto err_rep;
+	}
 	return 0;
 
 err_rep:

-- 
MST



More information about the general mailing list