[ofa-general] Re: opensm: a bug in heavy sweep? - no LFT re-configuration

Sasha Khapyorsky sashak at voltaire.com
Mon Jul 23 05:59:28 PDT 2007


Hi Eitan,

On 10:35 Mon 23 Jul     , Eitan Zahavi wrote:
> Hi Sasha,
> 
> > On 14:59 Sun 22 Jul     , Eitan Zahavi wrote:
> > > Hi Sasha
> > > 
> > > Let's assume someone has reset a switch on the fabric.
> > > What would cause the SM to re-assign the LFT of that switch?
> > 
> > OpenSM will sweep and drop this switch, and when the switch comes 
> > back it will be initialized again. But if the reset was too 
> > fast (relative to discovery), we can be in trouble (and maybe 
> > not only with LFTs).
> > 
> > > I assumed that there is a mechanism to do that.
> > 
> > Not for "fast" switch reboot.
> So we have a problem with these fast resetting devices.
> > 
> > Hmm, I think we could try to detect this case by comparing 
> > SwitchInfo:LinearFDBTop with current p_sw->max_lid_ho or even 
> > by seeing that PortInfo:LID is not set. Something like below:
> > 
> I think we should have a predicate that will be used to mark a
> port/device as needing a full update.

Agreed, but what is the best criterion? LID == 0 will work in many cases,
but LID initialization is not required by the spec. The only strong
requirement I found is Port State. Any other ideas?
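
A minimal sketch of what such a predicate could look like, combining the
criteria mentioned in this thread (Port State, LID == 0, LinearFDBTop). The
types and function names are hypothetical and are not OpenSM structures or
APIs. It assumes the SM keeps a cached view of each port from the previous
sweep and compares it with what the current sweep reads back:

```c
#include <stdbool.h>
#include <stdint.h>

/* PortInfo:PortState values as defined by the InfiniBand spec. */
enum port_state {
	PORT_DOWN = 1,
	PORT_INIT = 2,
	PORT_ARMED = 3,
	PORT_ACTIVE = 4
};

/* Hypothetical snapshot of a port; NOT an OpenSM structure. */
struct port_view {
	uint16_t lid;           /* PortInfo:LID, 0 when unassigned */
	enum port_state state;  /* PortInfo:PortState */
};

/*
 * Returns true when the port looks like it was reset since the last sweep.
 * "cached" is what the SM remembers, "seen" is what this sweep read back.
 */
bool port_needs_full_update(const struct port_view *cached,
			    const struct port_view *seen)
{
	/* Strong hint: the port was Armed/Active and fell back to Init/Down. */
	if (cached->state >= PORT_ARMED && seen->state < PORT_ARMED)
		return true;
	/* Weak hint: a LID had been assigned and is now gone. */
	if (cached->lid != 0 && seen->lid == 0)
		return true;
	return false;
}

/*
 * Switch-level hint: the LinearFDBTop read from the device no longer
 * matches the top LID the SM believes it programmed.
 */
bool switch_needs_full_update(uint16_t cached_max_lid,
			      uint16_t seen_linear_fdb_top)
{
	return cached_max_lid != seen_linear_fdb_top;
}
```

When either check fires, the idea would be to drop the cached per-port and
per-switch data for that device and reprogram it from scratch instead of
relying on incremental updates.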

> Not just LFT but everything (SL2VL, VLArb, LID, PKey ... If a device was
> reset then it probably lost everything). 

Right, all incrementally updated data should be flushed - osm_physp
and osm_switch are the affected objects.

> Another approach is to mark it for the entire fabric. 

It is too expensive IMO, and not much easier to implement.

Sasha

> 
> The original intention of kill -HUP was to force a new heavy sweep and
> setup.
> I think another signal is acceptable but not required.
> 
> Thanks
> 
> Eitan


