[openib-general] Re: Failed multicast join with new multicast module

Hal Rosenstock halr at voltaire.com
Wed Jun 7 11:50:16 PDT 2006


On Wed, 2006-06-07 at 14:25, Sean Hefty wrote:
> Sean Hefty wrote:
> > The multicast module should work in this specific case, since the only 
> > client is ipoib, and ipoib first leaves the group before re-joining.
> 
> I think that there's a race here.  If ipoib leaves, then re-joins quickly 
> enough, the join request will be processed before the leave.

The order in which joins and leaves reach the SA matters.

>   The result is that 
> the join will be fulfilled locally, without an additional MAD sent.  (Trying to 
> process the leave immediately doesn't fix the problem in the generic case, where 
> there may be multiple users of a group.)
> 
> A temporary fix would be to always send a MAD, even if the join can be fulfilled 
> locally.  But I'm looking at having the multicast module re-join on an event. 
> This raises the possibility that the new join request may fail, which would 
> require the multicast module to report that a membership is no longer active.

A similar (as yet unresolved) problem exists with the SA if the topology
changes and the previous group/members can no longer be satisfied.
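
As a rough illustration of the leave/re-join race described above, here
is a toy Python model. It is not the multicast module's code, and every
class and function name in it is made up for the example. It assumes
that the SA has dropped its record of the membership before ipoib
re-joins, and that leaves are processed later than joins; both are
simplifying assumptions.

```python
# Toy model only: all names are hypothetical, not the real module's API.
from collections import deque


class ToySA:
    """Stands in for the SA; tracks which ports it believes are joined."""

    def __init__(self):
        self.members = set()

    def handle_mad(self, kind, port):
        if kind == "join":
            self.members.add(port)
        else:
            self.members.discard(port)

    def lose_state(self):
        # Assumption: the SA has forgotten its membership records.
        self.members.clear()


class ToyMcastModule:
    """Reference-counts local users of one group on one port."""

    def __init__(self, sa, port, always_send_mad=False):
        self.sa = sa
        self.port = port
        self.always_send_mad = always_send_mad
        self.refcount = 0
        self.pending_leaves = deque()

    def join(self):
        # A join is fulfilled locally while the group still has a user,
        # unless the "always send a MAD" temporary fix is enabled.
        if self.refcount == 0 or self.always_send_mad:
            self.sa.handle_mad("join", self.port)
        self.refcount += 1

    def leave(self):
        # Leaves are deferred, so a quick re-join can overtake them.
        self.pending_leaves.append(self.port)

    def process_pending_leaves(self):
        while self.pending_leaves:
            self.pending_leaves.popleft()
            self.refcount -= 1
            if self.refcount == 0:
                self.sa.handle_mad("leave", self.port)


def rejoin_after_sa_reset(always_send_mad):
    """Return True if the SA knows about the port after the re-join."""
    sa = ToySA()
    mod = ToyMcastModule(sa, "port1", always_send_mad=always_send_mad)
    mod.join()                    # initial join: MAD sent
    sa.lose_state()               # SA forgets the membership
    mod.leave()                   # ipoib leaves ...
    mod.join()                    # ... and re-joins before the leave runs
    mod.process_pending_leaves()
    return "port1" in sa.members
```

In this model, rejoin_after_sa_reset(False) returns False because the
re-join is satisfied from the local reference count and no MAD reaches
the SA, while rejoin_after_sa_reset(True) returns True because the join
MAD is sent regardless.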

> Another problem is if some nodes are joined as NonMembers or SendOnlyNonMembers, 
> then the SM will not create the multicast group when they try to re-join.

The same is true for FullMembers when there are insufficient components
to create the group. In these cases, the group must either be precreated
or the creator of the group must talk to the SA "first".

>   This 
> leads to a race where NonMembers and SendOnlyNonMembers will fail to re-join 
> until one of the FullMembers joins.

This might also be true of joins (not creates) from FullMembers. I
would presume that in such cases the join would be retried.
SendOnlyMembers (at least for IPoIB) do this: if not joined, they retry
the join every time a packet is sent.
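
A minimal sketch of that retry-on-send behaviour, again as a toy Python
model rather than IPoIB's actual code. The class names, the join-state
strings, and the can_create flag (standing in for "the join carries
enough components to create the group") are all assumptions made for
the example.

```python
# Toy model only: names and the can_create flag are illustrative.

class ToyGroupSA:
    """Stands in for the SA's view of a single multicast group."""

    def __init__(self):
        self.group_exists = False

    def join(self, join_state, can_create=False):
        """Return True if the join succeeds."""
        if not self.group_exists:
            # Only a FullMember with enough components creates the group.
            if join_state == "FullMember" and can_create:
                self.group_exists = True
                return True
            return False
        return True


class SendOnlySender:
    """Retries the join on each send for as long as it is not joined."""

    def __init__(self, sa):
        self.sa = sa
        self.joined = False
        self.join_attempts = 0

    def send(self, packet):
        if not self.joined:
            self.join_attempts += 1
            self.joined = self.sa.join("SendOnlyNonMember")
        # True means the packet could be sent to the group.
        return self.joined


def demo():
    sa = ToyGroupSA()
    sender = SendOnlySender(sa)
    results = [sender.send(b"first")]           # group missing: join fails
    sa.join("FullMember", can_create=True)      # creator shows up
    results.append(sender.send(b"second"))      # retry now succeeds
    results.append(sender.send(b"third"))       # already joined, no retry
    return results, sender.join_attempts
```

Here the first send fails because no group exists yet; once a FullMember
has created it, the next send triggers a second join attempt that
succeeds, and later sends no longer retry.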

-- Hal

> - Sean
