[ofa-general] RE: opensm: a bug in heavy sweep? - no LFT re-configuration

Eitan Zahavi eitan at mellanox.co.il
Mon Jul 23 00:35:25 PDT 2007


Hi Sasha,

> On 14:59 Sun 22 Jul     , Eitan Zahavi wrote:
> > Hi Sasha
> > 
> > Let's assume someone has reset a switch on the fabric.
> > What would cause the SM to re-assign the LFT of that switch?
> 
> OpenSM will sweep and drop this switch, and when the switch 
> comes back it will be initialized again. But if the reset was 
> too fast (relative to discovery), we can be in trouble (and 
> maybe not only with LFTs).
> 
> > I assumed that there is a mechanism to do that.
> 
> Not for "fast" switch reboot.
So we have a problem with these fast-resetting devices.
> 
> Hmm, I think we could try to detect this case by comparing 
> SwitchInfo:LinearFDBTop with current p_sw->max_lid_ho or even 
> by seeing that PortInfo:LID is not set. Something like below:
> 
I think we should have a predicate that is used to mark a
port/device as needing a full update.
Not just the LFT but everything (SL2VL, VLArb, LID, PKey, ...). If a
device was reset, it has probably lost everything.
Another approach is to set that mark for the entire fabric.
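
A minimal sketch of such a predicate, combining the two checks Sasha
suggested (SwitchInfo:LinearFDBTop versus the SM's max_lid_ho, and an
unset PortInfo:LID). This is not OpenSM code: the structs, the UPD_*
flags and both function names are hypothetical, and it assumes the SM
programs LinearFDBTop to max_lid_ho, so that a mismatch indicates lost
state.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical: what the SM believes it last programmed into the switch. */
struct sm_switch_state {
    uint16_t max_lid_ho;      /* highest routed LID, host order */
};

/* Hypothetical: values read back from the device during the sweep. */
struct sw_readback {
    uint16_t linear_fdb_top;  /* SwitchInfo:LinearFDBTop, host order */
    uint16_t base_lid;        /* PortInfo:LID of port 0, 0 = not set */
};

/* Hypothetical flags for the settings a reset device would lose. */
enum {
    UPD_LID   = 1u << 0,
    UPD_LFT   = 1u << 1,
    UPD_SL2VL = 1u << 2,
    UPD_VLARB = 1u << 3,
    UPD_PKEY  = 1u << 4,
    UPD_ALL   = UPD_LID | UPD_LFT | UPD_SL2VL | UPD_VLARB | UPD_PKEY
};

/* True if the device looks like it was reset behind the SM's back. */
static bool sw_looks_reset(const struct sm_switch_state *sm,
                           const struct sw_readback *rb)
{
    if (rb->base_lid == 0)
        return true;          /* LID was never assigned, or was lost */
    if (rb->linear_fdb_top != sm->max_lid_ho)
        return true;          /* LFT top differs from what the SM set */
    return false;
}

/* A reset device needs everything reprogrammed, not just the LFT. */
static unsigned sw_needed_updates(const struct sm_switch_state *sm,
                                  const struct sw_readback *rb)
{
    return sw_looks_reset(sm, rb) ? UPD_ALL : 0u;
}
```

Returning a flag mask rather than a single boolean leaves room for
partial updates later, but for a reset device the sketch always asks
for all of them.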

The original intention of kill -HUP was to force a new heavy sweep and
setup.
I think another signal is acceptable but not required.

Thanks

Eitan


