[openib-general] Re: Re: outstanding patches

Michael S. Tsirkin mst at mellanox.co.il
Wed Mar 8 08:37:45 PST 2006


Quoting Sean Hefty <mshefty at ichips.intel.com>:
> Subject: Re: Re: outstanding patches
> 
> Roland Dreier wrote:
> > > MAD:
> > > mad_port_close.patch
> >
> >BTW, I think I saw the crash that this fixes while testing the
> >previous patches.
> 
> I just haven't had time yet to create a patch for this.  The one that was 
> submitted ends up scanning all ports on every completion.  I need to spend 
> more time finding a way to handle this, and verify that we don't have a 
> larger issue that affects all kernel users.

OK, here's a patch that fixes this race without a full port scan, plus a small cleanup.

---

Fix an oopsable race debugged by Eli Cohen <eli at mellanox.co.il>:
After removing the port from port_list, ib_mad_port_close flushes port_priv->wq
before destroying the special QPs. This means that a completion event can still
arrive and queue new work on that workqueue after the flush, so the work may be
queued onto, and later run against, a workqueue and QPs that are already being
destroyed.
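
To make the window concrete, the problematic ordering before this patch looks
roughly like this (a schematic sketch using the function names from the diff
below, not the exact code paths):

    CPU A: ib_mad_port_close()           CPU B: completion event
    --------------------------           ---------------------------------
    list_del(&port_priv->port_list);
    flush_workqueue(port_priv->wq);
                                         ib_mad_thread_completion_handler()
                                           queue_work(port_priv->wq,
                                                      &port_priv->work);
    destroy_workqueue(port_priv->wq);
    destroy_mad_qp(&port_priv->qp_info[1]);
    destroy_mad_qp(&port_priv->qp_info[0]);
    <work queued by CPU B is now pending on a destroyed workqueue and
     would touch destroyed QPs>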

This patch also removes an unnecessary flush_workqueue: destroy_workqueue
already includes a flush.

Signed-off-by: Michael S. Tsirkin <mst at mellanox.co.il>

Index: linux-2.6.15/drivers/infiniband/core/mad.c
===================================================================
--- linux-2.6.15.orig/drivers/infiniband/core/mad.c	2006-03-08 14:38:23.000000000 +0200
+++ linux-2.6.15/drivers/infiniband/core/mad.c	2006-03-08 21:13:08.000000000 +0200
@@ -2410,8 +2410,12 @@ static void timeout_sends(void *data)
 static void ib_mad_thread_completion_handler(struct ib_cq *cq, void *arg)
 {
 	struct ib_mad_port_private *port_priv = cq->cq_context;
+	unsigned long flags;
 
-	queue_work(port_priv->wq, &port_priv->work);
+	spin_lock_irqsave(&ib_mad_port_list_lock, flags);
+	if (!list_empty(&port_priv->port_list))
+		queue_work(port_priv->wq, &port_priv->work);
+	spin_unlock_irqrestore(&ib_mad_port_list_lock, flags);
 }
 
 /*
@@ -2723,18 +2727,23 @@ static int ib_mad_port_open(struct ib_de
 	}
 	INIT_WORK(&port_priv->work, ib_mad_completion_handler, port_priv);
 
+	spin_lock_irqsave(&ib_mad_port_list_lock, flags);
+	list_add_tail(&port_priv->port_list, &ib_mad_port_list);
+	spin_unlock_irqrestore(&ib_mad_port_list_lock, flags);
+
 	ret = ib_mad_port_start(port_priv);
 	if (ret) {
 		printk(KERN_ERR PFX "Couldn't start port\n");
 		goto error9;
 	}
 
-	spin_lock_irqsave(&ib_mad_port_list_lock, flags);
-	list_add_tail(&port_priv->port_list, &ib_mad_port_list);
-	spin_unlock_irqrestore(&ib_mad_port_list_lock, flags);
 	return 0;
 
 error9:
+	spin_lock_irqsave(&ib_mad_port_list_lock, flags);
+	list_del_init(&port_priv->port_list);
+	spin_unlock_irqrestore(&ib_mad_port_list_lock, flags);
+
 	destroy_workqueue(port_priv->wq);
 error8:
 	destroy_mad_qp(&port_priv->qp_info[1]);
@@ -2771,11 +2780,9 @@ static int ib_mad_port_close(struct ib_d
 		printk(KERN_ERR PFX "Port %d not found\n", port_num);
 		return -ENODEV;
 	}
-	list_del(&port_priv->port_list);
+	list_del_init(&port_priv->port_list);
 	spin_unlock_irqrestore(&ib_mad_port_list_lock, flags);
 
-	/* Stop processing completions. */
-	flush_workqueue(port_priv->wq);
 	destroy_workqueue(port_priv->wq);
 	destroy_mad_qp(&port_priv->qp_info[1]);
 	destroy_mad_qp(&port_priv->qp_info[0]);
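
For reference, the !list_empty() test in the completion handler works because
of the switch from list_del() to list_del_init(): plain list_del() leaves the
entry with poisoned pointers, while list_del_init() re-initializes the entry to
point at itself, so list_empty(&port_priv->port_list) becomes true once the
port has been unlinked. Roughly, from <linux/list.h>:

	static inline void list_del_init(struct list_head *entry)
	{
		__list_del(entry->prev, entry->next);
		INIT_LIST_HEAD(entry);
	}

	static inline int list_empty(const struct list_head *head)
	{
		return head->next == head;
	}

This is what lets the handler answer "has this port been closed?" per port,
without the full scan of all ports on every completion that Sean was
concerned about.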

-- 
Michael S. Tsirkin
Staff Engineer, Mellanox Technologies


