[ofa-general] RE: opensm: a bug in heavy sweep? - no LFT re-configuration
Eitan Zahavi
eitan at mellanox.co.il
Sun Jul 22 04:59:23 PDT 2007
Hi Sasha
Let's assume someone has reset a switch on the fabric.
What would cause the SM to re-assign the LFT of that switch?
I assumed that there is a mechanism to do that.
Anyway, kill -HUP should flush out the state and restart from scratch.
Eitan
> -----Original Message-----
> From: Sasha Khapyorsky [mailto:sashak at voltaire.com]
> Sent: Sunday, July 22, 2007 1:22 PM
> To: Eitan Zahavi
> Cc: OPENIB; hal.rosenstock at gmail.com; Yevgeny Kliteynik
> Subject: Re: opensm: a bug in heavy sweep? - no LFT re-configuration
>
> Hi Eitan,
>
> On 09:36 Sun 22 Jul , Eitan Zahavi wrote:
> > Hi Sasha
> >
> > I am running some tests manually and apparently it looks like I found
> > a bug. Here is the sequence of things:
> > 1. SM sweeps the fabric and assigns LFTs
> > 2. I manually modify some LFTs (single entry now marked UNREACHABLE)
> > 3. I force some switch change bit to 1 or issue kill -HUP
> > 4. The SM reports SUBNET UP
> > 5. The modified LFT entry is still UNREACHABLE and the path is broken
>
> Right, in most cases (unless OpenSM has its own changes in the same
> LFT block) OpenSM will refer to its own LFT image for the "need to
> update" decision, so _manual_ changes will not trigger a new update.
> Rerunning OpenSM should help, however.
>
> > It looks to me some optimization of routing does not fully reroute
> > unless some condition is met - but that condition does not include
> > the above triggers listed in step 3.
>
> Rereading all of the fabric's LFTs seems to be too expensive an
> operation, at least by default. If it is a real requirement, this
> could be enforced manually, for example when kill -HUP is used.
> Thoughts?
>
> Sasha
>
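
The behaviour discussed in this thread can be illustrated with a small toy
model. This is only a sketch under stated assumptions, not OpenSM code: the
Switch and SubnetManager classes, the sweep() method and its reread flag are
all hypothetical names made up for illustration. It assumes 64-entry LFT
blocks and 255 as the "unreachable" port value. The SM compares the routes it
wants against its own cached image, block by block, so a change made behind
its back is only repaired when the image is re-read from the switch or when
the SM happens to have its own change in the same block.

```python
# Toy model only; every name here is hypothetical, none is an OpenSM API.
UNREACHABLE = 255   # assumed sentinel port value meaning "no path"
BLOCK_SIZE = 64     # assumed number of LFT entries sent per block


class Switch:
    def __init__(self, num_entries):
        # What the switch hardware actually holds.
        self.lft = [UNREACHABLE] * num_entries


class SubnetManager:
    def __init__(self, switch):
        self.switch = switch
        self.image = list(switch.lft)   # the SM's cached copy of the LFT
        self.blocks_written = 0

    def sweep(self, routes, reread=False):
        # Optionally refresh the cached image from the switch first.
        if reread:
            self.image = list(self.switch.lft)
        # The "need to update" decision is made per block against the
        # cached image, never against the switch itself.
        for start in range(0, len(routes), BLOCK_SIZE):
            end = start + BLOCK_SIZE
            if routes[start:end] != self.image[start:end]:
                self.switch.lft[start:end] = routes[start:end]
                self.image[start:end] = routes[start:end]
                self.blocks_written += 1


routes = [1] * 128
sw = Switch(128)
sm = SubnetManager(sw)

sm.sweep(routes)                 # initial sweep writes both blocks
sw.lft[5] = UNREACHABLE          # manual change behind the SM's back
sm.sweep(routes)                 # image still matches: nothing is written
stale = sw.lft[5]                # entry is still UNREACHABLE

sm.sweep(routes, reread=True)    # re-read first, then the block is fixed
repaired = sw.lft[5]
```

In this model a plain second sweep leaves the modified entry broken, while a
sweep that first re-reads the table rewrites the affected block. The cost of
the re-read (one read per block per switch) is what makes it unattractive as
a default.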