[ofa-general] [BUG report / PATCH] fix race in the core multicast management

Thu Sep 20 09:29:57 PDT 2007

> I see; however, there is no second join here. It's a leave and join where 
> the group refcount climbs to 2, since the join code increments it in its 
> synchronous part, which is executed before the thread handles the 
> processing of the leave request.

The refcount is only used to ensure that the group structure continues 
to exist.  The code must be able to handle multiple users calling 
join/free at the same time, including a single user calling free before 
its previous call to join has completed.  All MADs sent for the same 
multicast group must also be serialized to prevent join and leave 
requests for the same group from reaching the SA out of order.

If you walk through ib_sa_free_multicast(), the group membership is 
decremented.  A reference is held on the group because a work item has 
just been queued on the group for processing.  We cannot remove this 
reference unless we avoid queuing the work item.  And the work item is 
queued to ensure that the leave request to the SA is serialized with 
possible future join requests.
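
To make the sequence concrete, here is a minimal user-space model of a leave followed by a join on the same group. It is only a sketch of the behavior described above, not the multicast module's code: struct group, model_join(), model_free(), and process_one() are hypothetical names, and a fixed-size FIFO stands in for the serialized work processing.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy model; none of these names exist in the kernel. */
enum op { OP_JOIN, OP_LEAVE };

#define MAX_PENDING 8

struct group {
    int refcount;                 /* only keeps the structure alive */
    int members;                  /* local users currently joined */
    enum op pending[MAX_PENDING]; /* FIFO: requests reach the "SA" in order */
    size_t head, tail;
};

static void enqueue(struct group *g, enum op op)
{
    assert(g->tail < MAX_PENDING);
    g->pending[g->tail++] = op;
}

/* Synchronous part of a join: take the new member's reference right away. */
static void model_join(struct group *g)
{
    g->refcount++;
    enqueue(g, OP_JOIN);
}

/*
 * Free: membership drops immediately, but the member's reference is kept
 * so the group stays alive until the queued leave has been processed.
 */
static void model_free(struct group *g)
{
    assert(g->members > 0);
    g->members--;
    enqueue(g, OP_LEAVE);
}

/* Worker: handle the oldest queued request and return which one it was. */
static enum op process_one(struct group *g)
{
    enum op op;

    assert(g->head < g->tail);
    op = g->pending[g->head++];
    if (op == OP_JOIN)
        g->members++;   /* join completed */
    else
        g->refcount--;  /* leave sent; drop the reference held for it */
    return op;
}
```

Starting from one joined member (refcount 1, members 1), calling model_free() and then model_join() before the worker runs leaves the refcount at 2, and the worker still handles the leave before the join.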

> I am not sure this is what we want from the core design. Say the 
> consumer has some flexibility in the join request (e.g. through a future API 
> change), such that they can join a group, leave it, then join this group 
> again with different "attributes". Then if the join crosses the leave in 
> a way that causes the core code not to issue SA leave/join queries, it's 
> a bug from the perspective of the user.

If the attributes from a subsequent join differ from an existing join, 
the subsequent join operation will fail.  The only way I can think of to 
make this situation work is to add an asynchronous 
ib_sa_leave_multicast() routine that provides a callback after the leave 
completes, in addition to the existing free call.
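
A rough shape for such a routine, modeled in user space, might look like the following. Nothing here exists today: model_leave_multicast(), model_leave_response(), the callback type, and struct mcast_member are all made up for illustration, and the SA response is simulated by a direct function call.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical callback type: invoked once the leave has completed. */
typedef void (*leave_done_fn)(int status, void *context);

/* Hypothetical per-user state; not the kernel's ib_sa_multicast. */
struct mcast_member {
    int joined;
    leave_done_fn leave_done; /* non-NULL while a leave is in flight */
    void *context;
};

/* Start an asynchronous leave. Returns 0 if started, -1 otherwise. */
static int model_leave_multicast(struct mcast_member *m,
                                 leave_done_fn cb, void *context)
{
    if (!m->joined || m->leave_done)
        return -1;            /* not joined, or a leave is already pending */
    m->leave_done = cb;
    m->context = context;
    return 0;
}

/* Stand-in for the handler that runs when the SA answers the leave. */
static void model_leave_response(struct mcast_member *m, int status)
{
    leave_done_fn cb = m->leave_done;

    assert(cb != NULL);
    m->leave_done = NULL;
    if (status == 0)
        m->joined = 0;
    cb(status, m->context);
}

/* Example callback: record whether it is now safe to join again. */
static void mark_left(int status, void *context)
{
    *(int *)context = (status == 0);
}
```

The user would issue the new join, with its different attributes, only from the callback, so the join cannot cross the leave.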

This could be a fairly difficult case to make work anyway, since it 
requires destroying the group at the SA before it can be re-created with 
the different attributes.  It requires coordination across the group 
that's beyond the control of the local multicast module.  (A single 
group creator could handle this fairly easily.)

> OK, on this specific host system there was no port down event, so the 
> only event that the multicast and ipoib code got was port active. This 
> is why the patch I sent solves (hides) the problem: it causes the 
> multicast code to transition the group into the error state, so the 
> ipoib join that follows causes an SA join query to actually be sent.

There were two port active events delivered back to back with no other 
events in between?  If so, is this something that can or should occur? 
The patch itself looks fine to me; I'm trying to determine if there are 
other refcount problems in the multicast module.  I'm not convinced that 
there are at this point.

> I don't think there's a problem in ipoib; it just does not rely on 
> multicast error notifications but rather on port events. Do you think 
> it's less robust, and if so, why?

As long as the multicast module gets the event notification first, which 
I believe is the case, then I don't think there are any problems.

- Sean