[ofa-general] [PATCH] IB/core: handle race between elements in qork queues after event

Thu May 1 05:52:57 PDT 2008

I made some changes according  to Or and Roland's comments


This patch solves a race between work elements that are carried out after an
event occurs. When SM address handle becomes invalid and needs an update it is
handled by a work in the global workqueue. On the other hand this event is also
handled in ib_ipoib by queuing a work in the ipoib_workqueue that does mcast join.
Although queuing is in the right order, it is done to 2 different workqueues and so
there is no guarantee that the first to be queued is the first to be executed.

The patch sets the SM address handle  to NULL and until update_sm_ah() is called, 
any request that needs sm_ah is replied with -EAGAIN return status.

For consumers, the patch doesn't make things worse. Before the patch mads are sent to the
wrong SM and now they are now blocked before they are sent. Consumers can be improved if they 
examine the return code and respond to EAGAIN properly but even without an improvement the 
situation is not getting worse in in some cases it gets better.

Being specific in this issue yields
* Callers of ib_sa_mcmember_rec_query() seem to handle the error returns properly but without
checking specifically for EAGAIN.
* Callers of ib_sa_path_rec_get() handle error returns but not with a retry
* I didn't find any caller of ib_sa_service_rec_query()

Signed-off-by: Moni Levy  <monil at voltaire.com>
Signed-off-by: Moni Shoua <monis at voltaire.com>

---

 drivers/infiniband/core/sa_query.c |   26 ++++++++++++++++++++++----
 1 files changed, 22 insertions(+), 4 deletions(-)

diff --git a/drivers/infiniband/core/sa_query.c b/drivers/infiniband/core/sa_query.c
index cf474ec..a2e61d7 100644
--- a/drivers/infiniband/core/sa_query.c
+++ b/drivers/infiniband/core/sa_query.c
@@ -413,9 +413,20 @@ static void ib_sa_event(struct ib_event_
 	    event->event == IB_EVENT_PKEY_CHANGE ||
 	    event->event == IB_EVENT_SM_CHANGE   ||
 	    event->event == IB_EVENT_CLIENT_REREGISTER) {
-		struct ib_sa_device *sa_dev;
-		sa_dev = container_of(handler, typeof(*sa_dev), event_handler);
-
+		unsigned long flags;
+		struct ib_sa_device *sa_dev =
+			container_of(handler, typeof(*sa_dev), event_handler);
+		struct ib_sa_port *port =
+			&sa_dev->port[event->element.port_num - sa_dev->start_port];
+		struct ib_sa_sm_ah *sm_ah;
+
+		spin_lock_irqsave(&port->ah_lock, flags);
+		sm_ah = port->sm_ah;
+		port->sm_ah = NULL;
+		spin_unlock_irqrestore(&port->ah_lock, flags);
+
+		if (sm_ah)
+			kref_put(&sm_ah->ref, free_sm_ah);
 		schedule_work(&sa_dev->port[event->element.port_num -
 					    sa_dev->start_port].update_task);
 	}
@@ -663,6 +674,8 @@ int ib_sa_path_rec_get(struct ib_sa_clie
 		return -ENODEV;
 
 	port  = &sa_dev->port[port_num - sa_dev->start_port];
+	if (!port->sm_ah)
+		return  -EAGAIN;
 	agent = port->agent;
 
 	query = kmalloc(sizeof *query, gfp_mask);
@@ -780,6 +793,9 @@ int ib_sa_service_rec_query(struct ib_sa
 		return -ENODEV;
 
 	port  = &sa_dev->port[port_num - sa_dev->start_port];
+	if (!port->sm_ah)
+		return  -EAGAIN;
+
 	agent = port->agent;
 
 	if (method != IB_MGMT_METHOD_GET &&
@@ -877,8 +893,10 @@ int ib_sa_mcmember_rec_query(struct ib_s
 		return -ENODEV;
 
 	port  = &sa_dev->port[port_num - sa_dev->start_port];
-	agent = port->agent;
+	if (!port->sm_ah)
+		return  -EAGAIN;
 
+	agent = port->agent;
 	query = kmalloc(sizeof *query, gfp_mask);
 	if (!query)
 		return -ENOMEM;