[ofa-general] RE: opensm: a bug in heavy sweep? - no LFT re-configuration
Eitan Zahavi
eitan at mellanox.co.il
Mon Jul 23 00:35:25 PDT 2007
Hi Sasha,
> On 14:59 Sun 22 Jul , Eitan Zahavi wrote:
> > Hi Sasha
> >
> > Let's assume someone has reset a switch on the fabric.
> > What would cause the SM to re-assign the LFT of that switch?
>
> OpenSM will sweep and drop this switch, and when the switch
> comes back it will be initialized again. But if the reset was
> too fast (relative to discovery), we can be in trouble (and
> maybe not only with LFTs).
>
> > I assumed that there is a mechanism to do that.
>
> Not for "fast" switch reboot.
So we have a problem with these fast resetting devices.
>
> Hmm, I think we could try to detect this case by comparing
> SwitchInfo:LinearFDBTop with current p_sw->max_lid_ho or even
> by seeing that PortInfo:LID is not set. Something like below:
>
I think we should have a predicate that marks a port/device as
needing a full update: not just the LFT but everything (SL2VL, VLArb,
LID, PKey, ...). If a device was reset, it has probably lost everything.
Another approach is to mark the entire fabric for a full update.
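
A minimal sketch of such a predicate, in C. It only combines the two
checks suggested in the quoted text (LinearFDBTop versus the SM's cached
max LID, and PortInfo:LID being unset). The struct and function names
are hypothetical and are not OpenSM's actual types or API:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified per-switch record. NOT an OpenSM type. */
struct sw_state {
    uint16_t cached_max_lid;   /* top LID the SM last programmed (cf. max_lid_ho) */
    uint16_t reported_fdb_top; /* SwitchInfo:LinearFDBTop read back this sweep */
    uint16_t reported_lid;     /* PortInfo:LID read back this sweep, 0 = not set */
};

/* True if the switch looks like it was reset behind the SM's back,
 * so everything (LFT, SL2VL, VLArb, LID, PKey, ...) should be rewritten. */
static bool sw_needs_full_update(const struct sw_state *sw)
{
    assert(sw != NULL);
    if (sw->reported_lid == 0)
        return true;                    /* LID lost: device was reset */
    return sw->reported_fdb_top != sw->cached_max_lid;
}

/* The fabric-wide alternative: one reset device marks the whole fabric. */
static bool fabric_needs_full_update(const struct sw_state *sws, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (sw_needs_full_update(&sws[i]))
            return true;
    return false;
}
```

The per-device form limits the rewrite to the device that was reset; the
fabric-wide form is simpler but redoes the setup for every device.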
The original intention of kill -HUP was to force a new heavy sweep and
setup.
I think another signal is acceptable but not required.
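
For illustration only, a generic POSIX sketch of "SIGHUP requests a
heavy sweep". This is not OpenSM's actual signal handling; the flag and
function names are made up for the example:

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stdbool.h>
#include <string.h>

/* Set from the signal handler, consumed by the (hypothetical) sweep loop. */
static volatile sig_atomic_t heavy_sweep_requested = 0;

static void on_sighup(int signo)
{
    (void)signo;
    heavy_sweep_requested = 1;  /* only async-signal-safe work here */
}

/* Install the SIGHUP handler; returns 0 on success, -1 on error. */
static int install_sighup_handler(void)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_sighup;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGHUP, &sa, NULL);
}

/* Called by the sweep loop: returns true once per pending request.
 * Several signals arriving before the check collapse into one request. */
static bool take_heavy_sweep_request(void)
{
    if (!heavy_sweep_requested)
        return false;
    heavy_sweep_requested = 0;
    return true;
}
```

With this in place, `kill -HUP <pid>` would make the next loop iteration
do a heavy sweep and full setup instead of a light one.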
Thanks
Eitan