[openib-general] [PATCH fixed] ipoib: device removal races
Michael S. Tsirkin
mst at mellanox.co.il
Mon Aug 15 02:05:43 PDT 2005
Quoting r. Michael S. Tsirkin <mst at mellanox.co.il>:
> Subject: Re: [PATCH] ipoib: device removal races
>
> Quoting r. Roland Dreier <rolandd at cisco.com>:
> > Subject: Re: [openib-general] [PATCH] ipoib: device removal races
> >
> > Michael> As a side note, schedule_work in ipoib_event also looks
> > Michael> suspicios. Cant we have it oustanding when the device is
> > Michael> going down? Roland, what do you say we switch that to
> > Michael> ipoib_workqueue as well, and add a flush after
> > Michael> ib_unregister_event_handler?
> >
> > Thanks, I'll take a look at all the workqueue stuff in IPoIB.
> >
> > - R.
> >
>
> Here's fix for this theoretical race (I didnt see it triggered in real life).
> This needs to be applied in addition to my previous patch, which
> fixes a crash I actually see in the lab.
>
> Roland, I think at least the previous one-line patch should go in to
> 2.6.13. Do you have it?
>
And here's a patch that actually works. Sorry.
Roland, pls note this patch is *in addition* to the ipoib_set_mcast_list
patch.
---
It seems we can have a work oustanding when the device is
going down. Solve this by creating a work queue for events.
We cant reuse the ipoib_workqueue since that sometimes needs to be
flushed when we get an event.
Its also probably a good idea to flush in a single threaded workqueue,
to prevent several flushes from running in parallel on multiple CPUs.
Signed-off-by: Michael S. Tsirkin <mst at mellanox.co.il>
Index: linux-kernel/drivers/infiniband/ulp/ipoib/ipoib_verbs.c
===================================================================
--- linux-kernel/drivers/infiniband/ulp/ipoib/ipoib_verbs.c (revision 3084)
+++ linux-kernel/drivers/infiniband/ulp/ipoib/ipoib_verbs.c (working copy)
@@ -256,6 +256,6 @@ void ipoib_event(struct ib_event_handler
record->event == IB_EVENT_LID_CHANGE ||
record->event == IB_EVENT_SM_CHANGE) {
ipoib_dbg(priv, "Port active event\n");
- schedule_work(&priv->flush_task);
+ queue_work(ipoib_event_workqueue, &priv->flush_task);
}
}
Index: linux-kernel/drivers/infiniband/ulp/ipoib/ipoib_main.c
===================================================================
--- linux-kernel/drivers/infiniband/ulp/ipoib/ipoib_main.c (revision 3084)
+++ linux-kernel/drivers/infiniband/ulp/ipoib/ipoib_main.c (working copy)
@@ -65,6 +65,7 @@ static const u8 ipv4_bcast_addr[] = {
};
struct workqueue_struct *ipoib_workqueue;
+struct workqueue_struct *ipoib_event_workqueue;
static void ipoib_add_one(struct ib_device *device);
static void ipoib_remove_one(struct ib_device *device);
@@ -993,6 +994,7 @@ debug_failed:
register_failed:
ib_unregister_event_handler(&priv->event_handler);
+ flush_workqueue(ipoib_event_workqueue);
event_failed:
ipoib_dev_cleanup(priv->dev);
@@ -1045,6 +1047,7 @@ static void ipoib_remove_one(struct ib_d
list_for_each_entry_safe(priv, tmp, dev_list, list) {
ib_unregister_event_handler(&priv->event_handler);
+ flush_workqueue(ipoib_event_workqueue);
unregister_netdev(priv->dev);
ipoib_dev_cleanup(priv->dev);
@@ -1061,8 +1064,8 @@ static int __init ipoib_init_module(void
return ret;
/*
- * We create our own workqueue mainly because we want to be
- * able to flush it when devices are being removed. We can't
+ * We create our own workqueues mainly because we want to be
+ * able to flush them when devices are being removed. We can't
* use schedule_work()/flush_scheduled_work() because both
* unregister_netdev() and linkwatch_event take the rtnl lock,
* so flush_scheduled_work() can deadlock during device
@@ -1074,12 +1077,21 @@ static int __init ipoib_init_module(void
goto err_fs;
}
+ ipoib_event_workqueue = create_singlethread_workqueue("ipoibevent");
+ if (!ipoib_event_workqueue) {
+ ret = -ENOMEM;
+ goto err_wq;
+ }
+
ret = ib_register_client(&ipoib_client);
if (ret)
- goto err_wq;
+ goto err_fwq;
return 0;
+err_fwq:
+ destroy_workqueue(ipoib_event_workqueue);
+
err_wq:
destroy_workqueue(ipoib_workqueue);
@@ -1092,8 +1104,9 @@ err_fs:
static void __exit ipoib_cleanup_module(void)
{
ib_unregister_client(&ipoib_client);
- ipoib_unregister_debugfs();
+ destroy_workqueue(ipoib_event_workqueue);
destroy_workqueue(ipoib_workqueue);
+ ipoib_unregister_debugfs();
}
module_init(ipoib_init_module);
Index: linux-kernel/drivers/infiniband/ulp/ipoib/ipoib.h
===================================================================
--- linux-kernel/drivers/infiniband/ulp/ipoib/ipoib.h (revision 3084)
+++ linux-kernel/drivers/infiniband/ulp/ipoib/ipoib.h (working copy)
@@ -217,6 +217,7 @@ static inline struct ipoib_neigh **to_ip
}
extern struct workqueue_struct *ipoib_workqueue;
+extern struct workqueue_struct *ipoib_event_workqueue;
/* functions */
--
MST
More information about the general
mailing list