[ofa-general] [BUG report / PATCH] fix race in the core multicast management
Or Gerlitz
ogerlitz at voltaire.com
Sun Sep 23 01:40:30 PDT 2007
Sean Hefty wrote:
> I've read back over this description a few times, and I still don't
> fully grok the problem. Can you clarify if the following sequence is
> what's happening?
> 1. The node has joined the multicast group. Meaning that the SA has
> routed multicast traffic to the node.
> 2. You take down the link of the switch port that connects the node. Is
> this done via a program?
> 3. The port is brought back online. This generates a PORT_ACTIVE event,
> but the previous event was also PORT_ACTIVE.
> 4. ipoib leaves the group.
> 5. ipoib re-joins the group.
> 6. The multicast module isn't aware that any errors have occurred on the
> multicast group, so simply completes the join request at step 5 without
> SA involvement.
> If I'm understanding this, somewhere in the above sequence the multicast
> routing to this node is lost. Either the SA removed the node from the
> group, or the switch lost its routing tables, or ...?
Indeed, I am taking the switch link down via a program.
Now, in this case there was --no-- previous event; when the port was
brought back online there was a PORT_ACTIVE event (it's a driver issue
which we are looking into). However, from the viewpoint of the SA there
was a "GID out" event, so the HCA port was dropped from the multicast
group and the multicast routing (spanning tree, MFT configuration, etc.)
was computed without this port being included. This is the ipoib logging
of what happens from its perspective (I have added the event number to
the "port state change event" print):
> ib0: Port state change event 9
> ib0: Flushing ib0
> ib0: flushing
> ib0: downing ib_dev
> ib0: stopping multicast thread
> ib0: flushing multicast list
> ib0: leaving MGID ff12:401b:ffff:0000:0000:0000:0000:0001
> ib0: deleting multicast group ff12:401b:ffff:0000:0000:0000:0000:0001
> ib0: leaving MGID ff12:401b:ffff:0000:0000:0000:ffff:ffff
> ib0: deleting multicast group ff12:401b:ffff:0000:0000:0000:ffff:ffff
> ib0: starting multicast thread
> ib0: restarting multicast task
> ib0: stopping multicast thread
> ib0: adding multicast entry for mgid ff12:401b:ffff:0000:0000:0000:0000:0001
> ib0: starting multicast thread
> ib0: joining MGID ff12:401b:ffff:0000:0000:0000:ffff:ffff
> ib0: join completion for ff12:401b:ffff:0000:0000:0000:ffff:ffff (status 0)
> ib0: Created ah c504b7a0
> ib0: MGID ff12:401b:ffff:0000:0000:0000:ffff:ffff AV c504b7a0, LID 0xc000, SL 0
> ib0: joining MGID ff12:401b:ffff:0000:0000:0000:0000:0001
> ib0: join completion for ff12:401b:ffff:0000:0000:0000:0000:0001 (status 0)
> ib0: Created ah c504ba20
> ib0: MGID ff12:401b:ffff:0000:0000:0000:0000:0001 AV c504ba20, LID 0xc001, SL 0
> ib0: successfully joined all multicast groups
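The mismatch behind a "status 0" join that never reaches the SA can be
sketched with a toy model in Python. This is not the kernel code: the
class and method names below are made up for illustration, and the model
assumes a host-side cache that answers a join locally whenever it still
believes the port is a member. The leave step is deliberately not
modelled.

```python
MGID = "ff12:401b:ffff:0000:0000:0000:ffff:ffff"


class ToySA:
    """Hypothetical stand-in for the SA's membership table."""

    def __init__(self):
        self.members = set()  # (port, mgid) pairs the fabric routes traffic to

    def join(self, port, mgid):
        self.members.add((port, mgid))

    def gid_out(self, port):
        # A port that left the fabric is dropped from every group.
        self.members = {(p, g) for (p, g) in self.members if p != port}


class ToyMulticastCache:
    """Hypothetical host-side module that caches group membership."""

    def __init__(self, sa, port):
        self.sa = sa
        self.port = port
        self.joined = set()  # groups the host believes it belongs to

    def join(self, mgid):
        if mgid in self.joined:
            return "local"  # completed from the cache, SA not contacted
        self.sa.join(self.port, mgid)
        self.joined.add(mgid)
        return "sa"

    def invalidate(self):
        # What an error event delivered to the host would trigger.
        self.joined.clear()


sa = ToySA()
mc = ToyMulticastCache(sa, "hca1")
mc.join(MGID)             # first join goes to the SA
sa.gid_out("hca1")        # link flap: SA drops the port, host sees no error
stale = mc.join(MGID)     # re-join is answered locally
print(stale, ("hca1", MGID) in sa.members)
```

In this model the second join succeeds from the host's point of view
while the SA no longer lists the port, which matches the symptom in the
log: the join completes, but no multicast traffic is routed to the node.
Calling `invalidate()` before the re-join forces the request back to the
SA.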
> I'm also trying to understand how the problem would apply to a different
> setup:
>
> node 1 <-> switch A <-> switch B <-> switch C <-> SA
>
> Suppose the same link down/up occurred between switch A and switch B.
> What happens to the multicast members to the left of switch B? Will
> node 1 see a PORT_ACTIVE event in this case as well?
The members of a multicast group are only HCA ports. Indeed, join/leave
requests from members cause the SA to trigger the SM to recompute the
multicast routing. However, there are other causes, such as a port going
down anywhere in the fabric: if it's an HCA port, it is dropped from
all the groups it is a member of, and if it's a switch port, all the
affected unicast AND multicast routing must be recomputed by the SM.
The host would only see port up/down events as a result of changes in
the link state of the local port or of the port connected to it through
the cable.
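Applied to the chain in the question, the rule about which devices see a
port event can be written as a small Python sketch. The link list and the
function names are illustrative assumptions only; the point is that just
the two ends of the flapped cable observe a port state change.

```python
# Chain from the question; each link is a pair of directly cabled devices.
LINKS = [
    ("node1", "switchA"),
    ("switchA", "switchB"),
    ("switchB", "switchC"),
    ("switchC", "SA"),
]


def devices_seeing_port_event(flapped_link):
    """Return the devices that own a port on the flapped link."""
    if flapped_link not in LINKS:
        raise ValueError("unknown link: %r" % (flapped_link,))
    return set(flapped_link)


def host_sees_event(host, flapped_link):
    """True only if the host is at one end of the flapped cable."""
    return host in devices_seeing_port_event(flapped_link)


# Link down/up between switch A and switch B: node 1 sees nothing locally.
print(host_sees_event("node1", ("switchA", "switchB")))
# Link down/up on node 1's own cable: node 1 sees the event.
print(host_sees_event("node1", ("node1", "switchA")))
```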
Or.