[ofa-general] Re: opensm: a bug in heavy sweep? - no LFT re-configuration
Sasha Khapyorsky
sashak at voltaire.com
Mon Jul 23 05:59:28 PDT 2007
Hi Eitan,
On 10:35 Mon 23 Jul , Eitan Zahavi wrote:
> Hi Sasha,
>
> > On 14:59 Sun 22 Jul , Eitan Zahavi wrote:
> > > Hi Sasha
> > >
> > > Let's assume someone has reset a switch on the fabric.
> > > What would cause the SM to re-assign the LFT of that switch?
> >
> > OpenSM will sweep and drop this switch, and when the switch comes
> > back it will be initialized again. But if the reset was too
> > fast (relative to discovery), we can be in trouble (and maybe
> > not only with LFTs).
> >
> > > I assumed that there is a mechanism to do that.
> >
> > Not for "fast" switch reboot.
> So we have a problem with these fast resetting devices.
> >
> > Hmm, I think we could try to detect this case by comparing
> > SwitchInfo:LinearFDBTop with current p_sw->max_lid_ho or even
> > by seeing that PortInfo:LID is not set. Something like below:
> >
> I think we should have a predicate that will be used to mark a
> port/device as needing a full update.
Agreed, but what are the best criteria? LID == 0 will work in many cases,
but LID initialization is not required by spec. The only strong
requirement I found is Port State. Any other ideas?
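
A minimal sketch of such a predicate, combining the three signals mentioned in
this thread (Port State, SwitchInfo:LinearFDBTop versus the cached max_lid_ho,
and PortInfo:LID == 0). The structs and the function name sw_needs_full_update
are hypothetical stand-ins for illustration and are not the real OpenSM
osm_physp/osm_switch types; the checks are heuristics, not spec guarantees.

```c
#include <stdbool.h>
#include <stdint.h>

/* Port states as encoded in PortInfo:PortState. */
enum port_state { PORT_DOWN = 1, PORT_INIT = 2, PORT_ARMED = 3, PORT_ACTIVE = 4 };

/* Hypothetical: values freshly read from the device during a sweep. */
struct sw_snapshot {
    uint16_t base_lid;        /* PortInfo:LID */
    uint16_t linear_fdb_top;  /* SwitchInfo:LinearFDBTop */
    enum port_state state;    /* PortInfo:PortState */
};

/* Hypothetical: what the SM remembers from the previous sweep. */
struct sw_cache {
    uint16_t base_lid;    /* LID the SM assigned earlier, 0 if none */
    uint16_t max_lid_ho;  /* highest LID programmed into the LFT, 0 if none */
    bool was_active;      /* SM had brought the port to Active */
};

/* Returns true when the device looks like it was reset behind the SM's back
 * and therefore should get a full (non-incremental) update. */
bool sw_needs_full_update(const struct sw_snapshot *now,
                          const struct sw_cache *cached)
{
    /* Strongest hint: a port the SM left Active reads Down or Init again. */
    if (cached->was_active &&
        (now->state == PORT_DOWN || now->state == PORT_INIT))
        return true;

    /* The device no longer reports the LinearFDBTop the SM programmed. */
    if (cached->max_lid_ho != 0 && now->linear_fdb_top != cached->max_lid_ho)
        return true;

    /* Weak hint only: LID is often, but not necessarily, cleared on reset. */
    if (cached->base_lid != 0 && now->base_lid == 0)
        return true;

    return false;
}
```

The Port State check comes first because it is the only criterion above that
does not depend on the device zeroing its tables after a reset.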
> Not just LFT but everything (SL2VL, VLArb, LID, PKey ...). If a device was
> reset then it probably lost everything.
Right, all incrementally updated data should be flushed - osm_physp
and osm_switch are the affected objects.
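
Roughly, "flushing" here could mean dropping the SM-side cached copy so that
the next sweep cannot skip anything as "unchanged". A sketch for the LFT part
only, under the assumption of a hypothetical struct sw_state (not the real
osm_switch layout), a tiny table size, and 0xFF as the "no route" marker:

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define LFT_SIZE 64    /* tiny table, for illustration only */
#define NO_PATH  0xFF  /* assumed "no route" port value */

/* Hypothetical SM-side cache for one switch. */
struct sw_state {
    uint8_t  lft[LFT_SIZE];    /* copy of what the SM last wrote */
    uint16_t max_lid_ho;       /* highest LID programmed so far */
    bool     need_full_update; /* set when the device is believed reset */
};

/* Forget everything the SM believes it has already programmed. */
void sw_flush_cached_state(struct sw_state *sw)
{
    memset(sw->lft, NO_PATH, sizeof(sw->lft));
    sw->max_lid_ho = 0;
    sw->need_full_update = true;
}

/* Incremental update check: write an entry only if it differs from the
 * cached copy, unless a full update was requested. */
bool sw_lft_entry_needs_write(const struct sw_state *sw,
                              uint16_t lid, uint8_t port)
{
    if (lid >= LFT_SIZE)
        return false; /* out of range for this toy table */
    return sw->need_full_update || sw->lft[lid] != port;
}
```

The same idea would have to be repeated for the per-port data (SL2VL, VLArb,
PKey tables); only the LFT is shown here.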
> Another approach is to mark it for the entire fabric.
It is too expensive IMO, and not much easier to implement.
Sasha
>
> The original intention of kill -HUP was to force a new heavy sweep and
> setup.
> I think another signal is acceptable but not required.
>
> Thanks
>
> Eitan