[ofa-general] RE: opensm: a bug in heavy sweep? - no LFT re-configuration
    Eitan Zahavi 
    eitan at mellanox.co.il
       
    Mon Jul 23 00:35:25 PDT 2007
    
    
  
Hi Sasha,
> On 14:59 Sun 22 Jul     , Eitan Zahavi wrote:
> > Hi Sasha
> > 
> > Let's assume someone has reset a switch on the fabric.
> > What would cause the SM to re-assign the LFT of that switch?
> 
> OpenSM will sweep and drop this switch, and when the switch 
> comes back it will be initialized again. But if the reset was 
> too fast (relative to discovery), we can be in trouble (and 
> maybe not only with LFTs).
> 
> > I assumed that there is a mechanism to do that.
> 
> Not for "fast" switch reboot.
So we have a problem with these fast-resetting devices.
> 
> Hmm, I think we could try to detect this case by comparing 
> SwitchInfo:LinearFDBTop with current p_sw->max_lid_ho or even 
> by seeing that PortInfo:LID is not set. Something like below:
> 
I think we should have a predicate that will be used to mark a
port/device as needing a full update.
Not just the LFT but everything (SL2VL, VLArb, LID, PKey, ...). If a
device was reset then it probably lost everything.
Another approach is to mark the entire fabric as needing a full update.
The original intention of kill -HUP was to force a new heavy sweep and
setup.
I think another signal is acceptable but not required.
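
To make the predicate idea concrete, here is a rough sketch in C. It is
only an illustration: struct sw_state, sw_looks_reset() and
mark_reset_switches() are hypothetical names, not existing OpenSM code,
and the struct is a simplified stand-in for what the SM caches and what
a sweep reads back. It assumes the two checks suggested in the quoted
text (a LinearFDBTop that no longer matches the SM's max LID, or a
PortInfo:LID that is not set) are good enough to detect a reset.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified per-switch record. These fields are
 * assumptions for illustration and are NOT the real OpenSM structures. */
struct sw_state {
    uint16_t cached_max_lid_ho;  /* highest LID the SM last programmed (host order) */
    uint16_t reported_lin_top;   /* SwitchInfo:LinearFDBTop read back in this sweep */
    uint16_t reported_base_lid;  /* PortInfo:LID of port 0 read back in this sweep */
    bool     needs_full_update;  /* set when everything must be re-sent */
};

/* Predicate: does the switch look as if it was reset behind the SM's back? */
static bool sw_looks_reset(const struct sw_state *sw)
{
    /* A LID of 0 means the device no longer has the LID the SM assigned. */
    if (sw->reported_base_lid == 0)
        return true;
    /* The device's LinearFDBTop no longer matches what the SM programmed. */
    if (sw->reported_lin_top != sw->cached_max_lid_ho)
        return true;
    return false;
}

/* Mark every switch that looks reset; returns how many were marked.
 * A marked switch would get LFT, SL2VL, VLArb, LID, PKey, ... re-sent. */
static unsigned mark_reset_switches(struct sw_state *sws, unsigned n)
{
    unsigned marked = 0;
    for (unsigned i = 0; i < n; i++) {
        if (sw_looks_reset(&sws[i])) {
            sws[i].needs_full_update = true;
            marked++;
        }
    }
    return marked;
}
```

The setup code would then test needs_full_update per device instead of
relying on the device having been dropped and rediscovered. Marking the
entire fabric would amount to setting the flag on every entry
unconditionally.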
Thanks
Eitan
    
    