[ofa-general] [BUG report / PATCH] fix race in the core multicast management

Sun Sep 23 01:40:30 PDT 2007

Sean Hefty wrote:
> I've read back over this description a few times, and I still don't 
> fully grok the problem.  Can you clarify if the following sequence is 
> what's happening?

> 1. The node has joined the multicast group.  Meaning that the SA has 
> routed multicast traffic to the node.
> 2. You take down the link of the switch port that connects the node.  Is 
> this done via a program?
> 3. The port is brought back online.  This generates a PORT_ACTIVE event, 
> but the previous event was also PORT_ACTIVE.
> 4. ipoib leaves the group.
> 5. ipoib re-joins the group.
> 6. The multicast module isn't aware that any errors have occurred on the 
> multicast group, so simply completes the join request at step 5 without 
> SA involvement.

> If I'm understanding this, somewhere in the above sequence the multicast 
> routing to this node is lost.  Either the SA removed the node from the 
> group, or the switch lost its routing tables, or ...?

Indeed, I am taking the switch link down via a program.

Now, in this case there was --no-- previous event: when the port was
brought back online there was a PORT_ACTIVE event (the missing event is
a driver issue which we are looking into). However, from the SA's point
of view there was a "GID out" event, so the HCA port was dropped from
the multicast group and the multicast routing (spanning tree, MFT
configuration, etc.) was computed without this port. This is the ipoib
log of what happens from its perspective (I have added the event number
to the "port state change event" print):

> ib0: Port state change event 9
> ib0: Flushing ib0
> ib0: flushing
> ib0: downing ib_dev
> ib0: stopping multicast thread
> ib0: flushing multicast list
> ib0: leaving MGID ff12:401b:ffff:0000:0000:0000:0000:0001
> ib0: deleting multicast group ff12:401b:ffff:0000:0000:0000:0000:0001
> ib0: leaving MGID ff12:401b:ffff:0000:0000:0000:ffff:ffff
> ib0: deleting multicast group ff12:401b:ffff:0000:0000:0000:ffff:ffff
> ib0: starting multicast thread
> ib0: restarting multicast task
> ib0: stopping multicast thread
> ib0: adding multicast entry for mgid ff12:401b:ffff:0000:0000:0000:0000:0001
> ib0: starting multicast thread
> ib0: joining MGID ff12:401b:ffff:0000:0000:0000:ffff:ffff
> ib0: join completion for ff12:401b:ffff:0000:0000:0000:ffff:ffff (status 0)
> ib0: Created ah c504b7a0
> ib0: MGID ff12:401b:ffff:0000:0000:0000:ffff:ffff AV c504b7a0, LID 0xc000, SL 0
> ib0: joining MGID ff12:401b:ffff:0000:0000:0000:0000:0001
> ib0: join completion for ff12:401b:ffff:0000:0000:0000:0000:0001 (status 0)
> ib0: Created ah c504ba20
> ib0: MGID ff12:401b:ffff:0000:0000:0000:0000:0001 AV c504ba20, LID 0xc001, SL 0
> ib0: successfully joined all multicast groups

> I'm also trying to understand how the problem would apply to a different 
> setup:
> 
> node 1 <-> switch A <-> switch B <-> switch C <-> SA
> 
> Suppose the same link down/up occurred between switch A and switch B. 
> What happens to the multicast members to the left of switch B?  Will 
> node 1 see a PORT_ACTIVE event in this case as well?

Only HCA ports are members of multicast groups. Join/leave requests from
members do cause the SA to trigger the SM to recompute the multicast
routing, but there are other causes as well, such as a port going down
anywhere in the fabric: if it is an HCA port, it is dropped from all the
groups it is a member of, and if it is a switch port, the SM must
recompute all the affected unicast AND multicast routing.

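The host only sees port up/down events caused by changes in the link
state of its local port, or of the port connected to it through the
cable. So in your example, node 1 would not see a PORT_ACTIVE event
when the link between switch A and switch B goes down and up.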

Or.