[ofa-general] [BUG report / PATCH] fix race in the core multicast management

Hal Rosenstock hrosenstock at xsigo.com
Tue Sep 25 06:20:39 PDT 2007


On Tue, 2007-09-25 at 15:00 +0200, Or Gerlitz wrote:
> Sean Hefty wrote:
> >>> node 1 <-> switch A <-> switch B <-> switch C <-> SA
> 
> >> The host would only see port up/down events as a result of changes in
> >> the link state of the local port or of the port connected to it through
> >> the cable.
> 
> > So, if you brought the link down/up between switches A & B, node 1
> > wouldn't receive any events, but it would be removed from the multicast
> > group?
> 
> good catch!
> 
> Indeed, when the link between switches A and B goes down, from the 
> viewpoint of the SM the whole sub-fabric across A is lost, and hence the 
> node is dropped from all the multicast groups it is joined to.

No, it is not dropped from all the multicast groups it is joined to. It
may be removed from the multicast forwarding tables if no route is
available, but it is still a member of the group.
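
To illustrate the distinction, here is a toy Python sketch. It is not
OpenSM or any real SM code; the class, its methods, and the group name
are all hypothetical. It only assumes that group membership and
forwarding state are tracked separately, so losing a route does not
touch membership:

```python
class ToySubnetManager:
    """Toy model only: membership and forwarding state are kept apart."""

    def __init__(self):
        self.members = {}       # group id -> set of member ports (hypothetical layout)
        self.reachable = set()  # ports the SM currently has a route to

    def join(self, group, port):
        # A join records membership; the port is assumed reachable at join time.
        self.members.setdefault(group, set()).add(port)
        self.reachable.add(port)

    def link_down(self, ports):
        # Ports behind the failed link lose their route, not their membership.
        self.reachable -= set(ports)

    def link_up(self, ports):
        # Rediscovery restores the route; no new join is needed.
        self.reachable |= set(ports)

    def forwarding_entries(self, group):
        # Members that would currently appear in the multicast forwarding tables.
        return self.members.get(group, set()) & self.reachable


sm = ToySubnetManager()
sm.join("ipoib-broadcast", "node1")

sm.link_down(["node1"])                       # A-B link goes down
routed_while_down = sm.forwarding_entries("ipoib-broadcast")
member_while_down = "node1" in sm.members["ipoib-broadcast"]

sm.link_up(["node1"])                         # A-B link comes back
routed_after_up = sm.forwarding_entries("ipoib-broadcast")
```

In this model node 1 drops out of the forwarding entries while the link
is down and reappears once it is back, without ever leaving the group.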

> However, from the view point of the node, no port down is experienced.
> 
> When the A-B link comes back up, the SM discovers all nodes across A and 
> probes their ports. Through this process a port active event --might-- be 
> generated by the HCA FW, but I am not sure it's mandatory.
> 
> Since the only trigger for ipoib to rejoin multicast groups is 
> delivery of an event by the hw driver (namely one of: port down/up, lid 
> change, sm lid change, client re-register), I think we might have a hole 
> here if none of these events is generated.

It doesn't need to rejoin for this case. See above explanation.
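
For reference, the trigger list quoted above can be written down as a
small Python sketch. The enum and function names are illustrative only
and are not the ipoib driver's code; the sketch just shows that an A-B
link bounce, which delivers none of these events to node 1, would not
cause a rejoin:

```python
from enum import Enum, auto


class PortEvent(Enum):
    # Illustrative names for the events listed in the quoted paragraph.
    PORT_DOWN = auto()
    PORT_UP = auto()
    LID_CHANGE = auto()
    SM_LID_CHANGE = auto()
    CLIENT_REREGISTER = auto()


# Per the quoted text, each of these events triggers a multicast rejoin.
REJOIN_TRIGGERS = frozenset(PortEvent)


def should_rejoin(events):
    """Return True if any delivered event is a rejoin trigger."""
    return any(event in REJOIN_TRIGGERS for event in events)


# Remote A-B link bounce: node 1 sees no local event at all.
rejoin_on_remote_bounce = should_rejoin([])

# SM sets the re-register bit: node 1 sees a client re-register event.
rejoin_on_reregister = should_rejoin([PortEvent.CLIENT_REREGISTER])
```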

-- Hal

> Please note that during this discovery, at least one MAD is sent from 
> the SM to the node. If we force the SM to set the re-register bit 
> --each-- time it discovers a node, then the bug is solved.
> 
> I will test this scheme and let you know what I get (with the voltaire 
> SM and mthca driver).
> 
> Eitan, Michael - any insight on the matter?
> 
> Or.
> 
> 


