[ofa-general] Re: multicast join failed for...
Hal Rosenstock
halr at voltaire.com
Thu Apr 12 06:26:21 PDT 2007
On Wed, 2007-04-11 at 23:38, Michael S. Tsirkin wrote:
> > Quoting Hal Rosenstock <halr at voltaire.com>:
> > Subject: Re: multicast join failed for...
> >
> > On Wed, 2007-04-11 at 15:47, Michael S. Tsirkin wrote:
> > > > Quoting Hal Rosenstock <halr at voltaire.com>:
> > > > Subject: Re: multicast join failed for...
> > > >
> > > > On Wed, 2007-04-11 at 14:12, Michael S. Tsirkin wrote:
> > > > > > > If yes, I'm actually not too happy with this.
> > > > > > >
> > > > > > > Would something like the following heuristic work better?
> > > > > > > - select the max rate between all participants
> > > > > >
> > > > > > The issue is that one doesn't know all the participants in a group as
> > > > > > they are joined dynamically.
> > > > > >
> > > > > > (I think we've been over this aspect on the list several times in the
> > > > > > past.)
> > > > >
> > > > > That's why I suggest the fix, so that the rate is adapted
> > > > > dynamically.
> > > > >
> > > > > > > - when a host with lower rate joins, destroy the group
> > > > > >
> > > > > > I don't think a group can be destroyed like this "underneath" its
> > > > > > existing members.
> > > > > >
> > > > >
> > > > > Of course it can. That's what happens when SM is restarted.
> > > >
> > > > Client reregistration? I don't like using that big hammer as a solution
> > > > to this. Seems a little harsh to me.
> > >
> > > I think it's not too bad
> >
> > It requires all subscriptions to reregister. This affects more things
> > than just multicast or even the groups affected which might not be all
> > of the multicast groups. Hence BIG hammer.
>
> Changing an option in opensm config requires restarting
> opensm. Isn't that right?
Yes, but that doesn't have to be the case going forward; OpenSM
reconfiguration need not always require a restart.
> So it's an even bigger hammer.
Restarting opensm is a slightly bigger hammer right now (than client
reregistration) in the case the admin wants it "dynamic" but I suspect
this only needs to be done once.
> > There could be a more
> > graceful way to deal with this. I don't like using client reregister
> > unless absolutely needed.
>
> What are the other options that have the same functionality?
Perhaps a spec enhancement is possible to make this better.
> > > - previously we had some client failing join
> > > which is worse.
> >
> > Maybe not. Maybe that's what the admin wants (to keep the higher rate
> > rather than degrade the group due to some link issue).
>
> Rate could be an option, but I think generally people prefer
> things working even if at a slower rate.
I think it's a coin flip. I've seen it both ways and either way there
are support questions. In the current scenario, it is join failures. In
the proposed scenario, it is more subtle: performance implications and
perhaps SA network storms.
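
To make the two scenarios concrete, here is a minimal sketch in Python. It is a toy model, not OpenSM code: the names `Group`, `join_fixed`, and `join_adaptive` are hypothetical, and it assumes that "max rate between all participants" means the highest rate every member can sustain, i.e. the rate of the slowest member. It does not model raising the rate again when a slow member leaves.

```python
from dataclasses import dataclass, field

# Default rate named in this thread for the per-partition IPv4 broadcast group.
DEFAULT_RATE_GBPS = 10.0


@dataclass
class Group:
    """Toy multicast group: one rate plus the ports that have joined."""
    rate_gbps: float = DEFAULT_RATE_GBPS
    members: dict = field(default_factory=dict)  # port name -> port rate in Gb/s


def join_fixed(group, port, port_rate_gbps):
    """Fixed-rate scenario: the group rate never changes.

    A port slower than the group rate is refused (a join failure).
    """
    if port_rate_gbps < group.rate_gbps:
        return False
    group.members[port] = port_rate_gbps
    return True


def join_adaptive(group, port, port_rate_gbps):
    """Adaptive scenario (assumed reading of the proposed heuristic).

    A slower port lowers the group rate. Returns the existing members
    that would have to learn the new rate, e.g. by joining again.
    """
    must_rejoin = []
    if port_rate_gbps < group.rate_gbps:
        group.rate_gbps = port_rate_gbps
        must_rejoin = sorted(group.members)
    group.members[port] = port_rate_gbps
    return must_rejoin
```

In the fixed model the slow port simply gets a failed join; in the adaptive model every existing member ends up on the re-join list, which is where the extra SA traffic would come from.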
> What does opensm do now?
> I think it uses the max possible rate when group is created.
> Is that so?
It uses the rate as configured in the partitions file or, if none is
configured, the default rate of 10 Gb/s (for the per-partition IPv4
broadcast group).
> > > And we can still keep an option to limit the rate
> > > manually.
> > >
> > > > I'm not convinced it's even
> > > > required either,
> > >
> > > How do you mean? All end-points must know the rate is now lower.
> >
> > I didn't think we had the complete story yet on what is going on.
>
> You are speaking about a specific instance then?
> OK, but I'm speaking generally, the issue comes up
> quite often.
OK, true, and it has been discussed on the list quite often.
-- Hal