[ofa-general] [BUG report / PATCH] fix race in the core multicast management
Or Gerlitz
ogerlitz at voltaire.com
Tue Sep 25 06:25:19 PDT 2007
Hal Rosenstock wrote:
> On Tue, 2007-09-25 at 15:00 +0200, Or Gerlitz wrote:
>> Sean Hefty wrote:
>>>>> node 1 <-> switch A <-> switch B <-> switch C <-> SA
>>>> The host would only see port up/down events as a result of changes in
>>>> the link state of the local port or of the port which is connected to
>>>> it through the cable.
>>> So, if you brought the link down/up between switches A & B, node 1
>>> wouldn't receive any events, but it would be removed from the multicast
>>> group?
>> good catch!
>>
>> Indeed, when the link between switches A and B goes down, per the view
>> point of the SM, the whole sub-fabric across A is lost and hence the
>> node is dropped from all the multicast groups it is joined to.
>
> No, it is not (dropped from all multicast groups it is joined to). It
> may be removed from the multicast forwarding tables if there is no route
> available but it is still a member of the group.
Hi Hal,
So the node (port) remains a member of a multicast group for which routing
is not configured, and when the port is discovered again the SM reruns the
multicast routing engine (for all groups? only for the groups that the
discovered ports are members of?) and reconfigures the routing. Nice.
I will test it, thanks for the explanation.
Or.
>> However, from the view point of the node, no port down is experienced.
>>
>> When the A-B link goes up, the SM discovers all nodes across A and
>> probes their ports. Through this process a port active event --might-- be
>> generated by the HCA FW, but I am not sure it's mandatory.
>>
>> The only trigger for ipoib to rejoin multicast groups is delivery of an
>> event by the hw driver, namely one of: port down/up, lid change, sm lid
>> change, client re-register. So I think we might have a hole here if none
>> of these events is generated.
>
> It doesn't need to rejoin for this case. See above explanation.
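
To make the rejoin triggers listed above concrete, here is a minimal sketch
in C. It is only an illustration of the list in the quoted text: the enum
and function names are hypothetical and are not taken from the ipoib driver
or the kernel headers.

```c
#include <stdbool.h>

/* Hypothetical event codes mirroring the triggers named in the thread.
 * These are illustrative names, not the kernel's definitions. */
enum port_event {
    EV_PORT_DOWN,
    EV_PORT_UP,
    EV_LID_CHANGE,
    EV_SM_LID_CHANGE,
    EV_CLIENT_REREGISTER,
    EV_OTHER              /* any other event the hw driver might report */
};

/* Returns true when the event is one of the listed rejoin triggers. */
bool event_triggers_rejoin(enum port_event ev)
{
    switch (ev) {
    case EV_PORT_DOWN:
    case EV_PORT_UP:
    case EV_LID_CHANGE:
    case EV_SM_LID_CHANGE:
    case EV_CLIENT_REREGISTER:
        return true;
    default:
        return false;
    }
}

/* Returns true if at least one event in the sequence would cause a rejoin. */
bool any_rejoin_trigger(const enum port_event *evs, int n)
{
    for (int i = 0; i < n; i++) {
        if (event_triggers_rejoin(evs[i]))
            return true;
    }
    return false;
}
```

In the A-B link flap scenario, node 1 may see an empty event sequence, in
which case any_rejoin_trigger() returns false and no rejoin is attempted.
Per Hal's explanation that is harmless, since the port is still a member of
the group and only the forwarding tables need to be reconfigured.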