[openib-general] Re: Failed multicast join with new multicast module

Hal Rosenstock halr at voltaire.com
Wed Jun 7 17:55:27 PDT 2006


On Wed, 2006-06-07 at 20:19, Sean Hefty wrote:
> Hal Rosenstock wrote:
> >>  This 
> >>leads to a race where NonMembers and SendOnlyNonMembers will fail to re-join 
> >>until one of the FullMembers joins.
> > 
> > Might also be true with joins (not creates) from FullMembers too. I
> > would presume in such cases, the join would be retried. SendOnlyMembers
> > (at least for IPoIB) do this if not joined every time a packet is sent.
> 
> Correct.  But all clients trying to rejoin groups must be aware of this, and 
> delay / retry until their groups are recreated.

I might be missing your point, but UD is unreliable, so the sends can be
dropped. The delay/retry is to make sure the join does occur.
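
A minimal sketch of the delay/retry idea, assuming a hypothetical
`try_join` callable that returns the group's MLID and raises a hypothetical
`GroupNotFound` while no FullMember has re-created the group yet. None of
these names come from the OpenIB code; this only illustrates the retry loop.

```python
import time


class GroupNotFound(Exception):
    """Hypothetical error: the SA has no record of the group yet."""


def join_with_retry(try_join, attempts=5, delay=1.0, sleep=time.sleep):
    """Call try_join() until it succeeds or the attempts run out.

    try_join is a hypothetical callable that returns the MLID on success
    and raises GroupNotFound while the group does not exist.
    Returns (mlid, number_of_attempts_used).
    """
    for attempt in range(1, attempts + 1):
        try:
            return try_join(), attempt
        except GroupNotFound:
            if attempt == attempts:
                raise  # give up; the caller decides what to do next
            sleep(delay)  # wait before asking the SA again
```

A NonMember or SendOnlyNonMember would keep calling this until a FullMember
has re-created the group; the `sleep` parameter is injectable only so the
loop can be exercised without real delays.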

> Let me know if I'm off here, but it also appears that clients can't rely on an 
> existing QP attachment or address handle to send to the new group.  Even if a 
> group is re-created, there's no guarantee that the SA didn't assign a different 
> MLID to the group.

Correct. I have seen this behavior with various dynamic groups.

I know there was code in IPoIB to handle local LID changes (adjusting
the AH). I'm not sure whether multicast (MLID) changes were handled too;
I don't recall that they were.
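
As a rough illustration of handling an MLID change after a rejoin, here is
a sketch that rebuilds the address handle only when the SA hands back a
different MLID. `McastState`, `create_ah` and `destroy_ah` are hypothetical
stand-ins, not IPoIB or verbs calls.

```python
from dataclasses import dataclass


@dataclass
class McastState:
    """Hypothetical per-group bookkeeping kept by a multicast client."""
    mgid: str
    mlid: int
    ah: object  # stand-in for an address handle


def refresh_after_rejoin(state, new_mlid, create_ah, destroy_ah):
    """Rebuild the address handle if the rejoin returned a new MLID.

    create_ah and destroy_ah are hypothetical callbacks supplied by the
    caller. Returns True if the handle was replaced, False otherwise.
    """
    if new_mlid == state.mlid:
        return False  # same MLID: the existing handle is still usable
    destroy_ah(state.ah)  # the old handle points at a stale MLID
    state.ah = create_ah(new_mlid)
    state.mlid = new_mlid
    return True
```

This touches only the group that was rejoined. A QP attachment made with
the old MLID would presumably need the same detach and re-attach
treatment, which the sketch leaves out.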

> So, the only safe thing to do is for all multicast clients to detach from all 
> multicast groups, destroy all address handles,

Why all groups?

> possibly wait for a new group to be created, and then start all over again.

Start what all over again?

>   Is this correct?

I'm not completely following you yet.

-- Hal

> - Sean




