[ofa-general] RE: opensm: a bug in heavy sweep? - no LFT re-configuration

Eitan Zahavi eitan at mellanox.co.il
Sun Jul 22 04:59:23 PDT 2007


Hi Sasha

Let's assume someone has reset a switch on the fabric.
What would cause the SM to re-assign the LFT of that switch?
I assumed that there is a mechanism to do that.

Anyway, kill -HUP should flush out the state and restart from scratch.


Eitan

> -----Original Message-----
> From: Sasha Khapyorsky [mailto:sashak at voltaire.com] 
> Sent: Sunday, July 22, 2007 1:22 PM
> To: Eitan Zahavi
> Cc: OPENIB; hal.rosenstock at gmail.com; Yevgeny Kliteynik
> Subject: Re: opensm: a bug in heavy sweep? - no LFT re-configuration
> 
> Hi Eitan,
> 
> On 09:36 Sun 22 Jul, Eitan Zahavi wrote:
> > Hi Sasha
> > 
> > I am running some tests manually and apparently it looks like I found
> > a bug. Here is the sequence of things:
> > 1. SM sweeps the fabric and assigns LFTs
> > 2. I manually modify some LFTs (single entry now marked UNREACHABLE)
> > 3. I force some switch change bit to 1 or issue kill -HUP
> > 4. The SM reports SUBNET UP
> > 5. The modified LFT entry is still UNREACHABLE and the path is broken
> 
> Right, in most cases (unless OpenSM has its own changes in the same
> LFT block) OpenSM will refer to its own LFT image for the "need to
> update" decision, so _manual_ changes will not trigger a new update.
> Rerunning OpenSM should help, however.
> 
> > It looks to me like some optimization of routing does not fully
> > reroute unless some condition is met - but that condition does not
> > include the triggers listed in step 3 above.
> 
> Rereading all of the fabric's LFTs by default seems too expensive an
> operation. If it is a real requirement, this could be enforced
> manually, for example when kill -HUP is used. Thoughts?
> 
> Sasha
> 
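
To make the behaviour discussed in this thread concrete, here is a minimal sketch of a subnet manager that decides whether to rewrite an LFT block by comparing the new routes against its own cached image rather than against the switch. This is a toy model, not OpenSM code: the class names, the `reread` flag, and the use of 255 as the UNREACHABLE marker are all assumptions made for illustration. The block size of 64 entries follows the LinearForwardingTable layout.

```python
# Toy model only: none of these names come from OpenSM.
UNREACHABLE = 255   # assumed "no path" port value
BLOCK_SIZE = 64     # LFT entries carried per block


class Switch:
    """The LFT as actually stored in the switch hardware."""

    def __init__(self, n_lids):
        self.lft = [UNREACHABLE] * n_lids
        self.writes = 0  # number of block writes received

    def set_block(self, block, ports):
        start = block * BLOCK_SIZE
        self.lft[start:start + len(ports)] = ports
        self.writes += 1


class SubnetManager:
    """Keeps its own image of each LFT and only sends changed blocks."""

    def __init__(self, switch, n_lids):
        self.switch = switch
        self.cached = [None] * n_lids  # what the SM believes it last wrote

    def sweep(self, routes, reread=False):
        if reread:
            # Hypothetical option: refresh the image from the switch first.
            self.cached = list(self.switch.lft)
        for start in range(0, len(routes), BLOCK_SIZE):
            new = routes[start:start + BLOCK_SIZE]
            # "Need to update" is decided against the cached image only.
            if self.cached[start:start + BLOCK_SIZE] != new:
                self.switch.set_block(start // BLOCK_SIZE, new)
                self.cached[start:start + BLOCK_SIZE] = new


if __name__ == "__main__":
    routes = [1] * 128
    sw = Switch(128)
    sm = SubnetManager(sw, 128)
    sm.sweep(routes)               # initial sweep writes both blocks
    sw.lft[5] = UNREACHABLE        # manual change behind the SM's back
    sm.sweep(routes)               # image matches routes: nothing is sent
    print(sw.lft[5] == UNREACHABLE, sw.writes)
    sm.sweep(routes, reread=True)  # rereading exposes the difference
    print(sw.lft[5], sw.writes)
```

In this model a manual edit survives any sweep that trusts the cached image, unless the SM happens to change another entry in the same block, in which case the whole block is rewritten. Rereading the table first repairs the entry at the cost of one extra read per block per switch, which is the expense being weighed above.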


