[openib-general] oops at device removal

Michael S. Tsirkin mst at mellanox.co.il
Sun Jan 28 11:33:03 PST 2007


> Quoting Michael S. Tsirkin <mst at mellanox.co.il>:
> Subject: Re: oops at device removal
> 
> > We have observed the following crash:
> 
> OK, I think I see a reason for this.
> 
> I notice the following in multicast.c, in function mcast_add_one:
> 
>         ib_set_client_data(device, &mcast_client, dev);
> 
>         INIT_IB_EVENT_HANDLER(&event_handler, device,
>                               mcast_event_handler);
>         ib_register_event_handler(&event_handler);
> 
> So it seems that if I have 2 devices, the same static &event_handler will be
> registered twice. This will trigger data corruption, as the same entry will be
> added to a list twice.
> 
> Or so it seems. Sean, what's the idea here?

It seems something like the following would fix it (untested).

------------------------------------------------

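Make the new multicast code not crash on platforms with multiple HCAs:
keep one ib_event_handler per device in struct mcast_device instead of
a single static handler shared by all devices.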

Signed-off-by: Michael S. Tsirkin <mst at mellanox.co.il>

---

diff --git a/drivers/infiniband/core/multicast.c b/drivers/infiniband/core/multicast.c
index fde977e..e51a078 100644
--- a/drivers/infiniband/core/multicast.c
+++ b/drivers/infiniband/core/multicast.c
@@ -51,7 +51,6 @@ static struct ib_client mcast_client = {
 };
 
 static struct ib_sa_client	sa_client;
-static struct ib_event_handler	event_handler;
 static struct workqueue_struct	*mcast_wq;
 static union ib_gid mgid0;
 
@@ -71,6 +70,7 @@ struct mcast_device {
 	int			start_port;
 	int			end_port;
+	struct ib_event_handler	event_handler;
 	struct mcast_port	port[0];
 };
 
 enum mcast_state {
@@ -793,8 +793,8 @@ static void mcast_add_one(struct ib_device *device)
 	dev->device = device;
 	ib_set_client_data(device, &mcast_client, dev);
 
-	INIT_IB_EVENT_HANDLER(&event_handler, device, mcast_event_handler);
-	ib_register_event_handler(&event_handler);
+	INIT_IB_EVENT_HANDLER(&dev->event_handler, device, mcast_event_handler);
+	ib_register_event_handler(&dev->event_handler);
 }
 
 static void mcast_remove_one(struct ib_device *device)
@@ -807,7 +807,7 @@ static void mcast_remove_one(struct ib_device *device)
 	if (!dev)
 		return;
 
-	ib_unregister_event_handler(&event_handler);
+	ib_unregister_event_handler(&dev->event_handler);
 	flush_workqueue(mcast_wq);
 
 	for (i = 0; i <= dev->end_port - dev->start_port; i++) {

-- 
MST



