[openib-general] Re: Failed multicast join with new multicast module

Fri Jun 9 03:43:13 PDT 2006

On Thu, 2006-06-08 at 18:00, Sean Hefty wrote:
> Hal Rosenstock wrote:
> > 2. There is lazy deletion of MC groups allowed so the reclamation may be
> > difficult.
> 
> I'm not familiar with the switch programming.

Note that multicast groups (MGRPs) are identified by MGIDs, whereas
switches are programmed with MLIDs. The MGID to MLID mapping can be 1:1
or many:1 depending on the implementation. Most do not do many:1, but
it is allowed by the spec. Also, note that switches know nothing about
the groups themselves (only MLIDs and which ports), so most of the
information is in the SM.
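
To illustrate the 1:1 versus many:1 distinction, here is a toy sketch
in Python. The class name, the pool_size parameter and the round-robin
assignment are made up for illustration and do not describe any real
SM; only the multicast LID base of 0xC000 comes from the spec.

```python
MCAST_LID_BASE = 0xC000  # first multicast LID


class ToyMlidAllocator:
    """Hypothetical MGID -> MLID allocator, for illustration only."""

    def __init__(self, pool_size):
        # A pool at least as large as the number of groups gives 1:1;
        # a smaller pool forces several MGIDs onto the same MLID.
        self.pool_size = pool_size
        self.by_mgid = {}

    def assign(self, mgid):
        # Reuse the MLID already assigned to a known MGID.
        if mgid not in self.by_mgid:
            index = len(self.by_mgid) % self.pool_size
            self.by_mgid[mgid] = MCAST_LID_BASE + index
        return self.by_mgid[mgid]
```

With a pool of 2, a third MGID wraps around and shares MLID 0xC000
with the first one. The switches see only the MLID, so they cannot
tell those two groups apart.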

> Does the SM set the entire 
> MulticastForwardingTable for a switch every time a new group is created, or a 
> new member joins?

No. It only needs to program the affected block(s) of the MFT based on
the MLID and the portmask (ports for replication).
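
As a rough sketch of what "affected block(s)" means, the following
Python computes which MFT block and port mask position a given MLID and
port fall into. It assumes the usual layout of 32 MLID entries per
block and 16 ports per port mask entry, with multicast LIDs starting at
0xC000. The function names are mine, not from any SM code.

```python
MCAST_LID_BASE = 0xC000  # first multicast LID
MCAST_LID_MAX = 0xFFFE   # last multicast LID (0xFFFF is the permissive LID)
MFT_BLOCK_SIZE = 32      # assumed MLID entries per MFT block
MFT_MASK_BITS = 16       # assumed ports covered by one port mask entry


def mft_location(mlid, port):
    """Return (block, entry, position, bit) for one MLID/port pair."""
    if not MCAST_LID_BASE <= mlid <= MCAST_LID_MAX:
        raise ValueError("not a multicast LID: 0x%04x" % mlid)
    offset = mlid - MCAST_LID_BASE
    block = offset // MFT_BLOCK_SIZE   # which block of the MFT
    entry = offset % MFT_BLOCK_SIZE    # index of the entry inside the block
    position = port // MFT_MASK_BITS   # which 16-port slice of the mask
    bit = port % MFT_MASK_BITS         # bit inside that slice
    return block, entry, position, bit


def affected_blocks(mlid, ports):
    """Return the sorted (block, position) pairs that need to be written."""
    return sorted({mft_location(mlid, p)[::2] for p in ports})
```

For example, affected_blocks(0xC021, [1, 17]) gives [(1, 0), (1, 1)]:
one block, written at two port mask positions, rather than the whole
table.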

> If the SM loses track of all multicast groups, how are the 
> stale groups on the switches deleted?

There are different strategies for dealing with this. The SM could clear
out all the MFTs in all the switches, but that is expensive. It could
also wait for multicast registrations and then program only the needed
MFT blocks in the affected switches. In that case, packets on any stale
MLIDs would still be forwarded until the MLID is reclaimed.
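
A toy model of the second strategy, in Python, may make the tradeoff
clearer. Here a switch MFT is modelled as a plain dict from MLID to a
set of ports, and the SM's knowledge as a set of MLIDs. All names are
hypothetical and the model ignores blocks and routing entirely.

```python
def on_join(switch_mft, known_mlids, mlid, ports):
    """Program one MLID on a switch when a registration arrives."""
    # Overwrites whatever stale port set the switch had for this MLID.
    switch_mft[mlid] = set(ports)
    known_mlids.add(mlid)


def stale_mlids(switch_mft, known_mlids):
    """MLIDs still programmed on the switch that the SM no longer tracks."""
    return sorted(m for m in switch_mft if m not in known_mlids)


def reclaim(switch_mft, mlid):
    """Remove a stale MLID so its packets stop being forwarded."""
    switch_mft.pop(mlid, None)
```

After the SM loses its state, a join on 0xC000 fixes only that entry. A
leftover entry such as 0xC001 keeps forwarding until reclaim() is
called for it.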

> > The endport SMAs are claiming they do support client reregistration but
> > it does take more than that for the endport/node to behave properly.
> 
> My original plan was to have the ib_multicast module rejoin all groups, but 
> since the MLIDs can change I can't see any way to handle reregistration safely 
> without involving the application.

Because the application needs to modify the QP for this? As I said, I'm
not sure IPoIB was handling this before. Roland would know for sure.

> My latest changes are just to report errors 
> on existing multicast groups on an affected port.

How?

> > I know it is a conceptual rather than actual compliance. One issue would
> > be defining what it means to respect all existing communication. Then we
> > would need to look at whether that was feasible or not and perhaps
> > rescope what it means to a set of things achievable. Another issue would
> > be defining where it is possible or not. If that is totally vendor
> > dependent, then this would have no substance to it. It is largely a
> > matter of being a "better" SM.
> 
> We could use the phrase, "except where such communication is no longer 
> realizable" instead of "where possible".  Where unrealizable means impossible 
> because the communication uses properties that are physically impossible to 
> achieve given the hardware configuration of the subnet.  (See bottom of page 910 
> of the spec.)

That specific text is defined there for the case of unrealizable joins
which is very different from the case being discussed. The specific
property mismatches are listed. Still not sure what determines this in
the case we are discussing. 

> If an SM could just query switches for their MulticastForwardingTables or the 
> end nodes,

It can.

>  would we be able to avoid these issues?

How? Not all the group information is in the switches.

-- Hal

> - Sean