[ofa-general] Re: [PATCH 3/4] opensm: resweep instead of exit when duplicated guid suspected
Hal Rosenstock
hal.rosenstock at gmail.com
Tue Aug 14 08:24:38 PDT 2007
On 8/14/07, Sasha Khapyorsky <sashak at voltaire.com> wrote:
> On 10:44 Tue 14 Aug , Hal Rosenstock wrote:
> > On 8/14/07, Sasha Khapyorsky <sashak at voltaire.com> wrote:
> > > On 10:50 Mon 13 Aug , Hal Rosenstock wrote:
> > > > On 8/13/07, Sasha Khapyorsky <sashak at voltaire.com> wrote:
> > > > > Hi Hal,
> > > > >
> > > > > On 09:19 Mon 13 Aug , Hal Rosenstock wrote:
> > > > > > On 8/12/07, Sasha Khapyorsky <sashak at voltaire.com> wrote:
> > > > > > > Anyway, OpenSM will request a resweep when nodes with a suspected
> > > > > > > duplicate GUID are on the subnet. Because we cannot be 100% sure that
> > > > > > > a detected GUID duplication is not some corner case of a port being
> > > > > > > moved, I prefer not to exit. Endless (re)discovery and syslog messages
> > > > > > > should be a good indication that the duplication is real.
> > > > > >
> > > > > > Couldn't some duplication state be kept per GUID, so that the messages
> > > > > > only get logged on a change of state to duplicated rather than
> > > > > > continually spewing into the log?
> > > > >
> > > > > There should be one message per duplicated GUID in the sweep. The sweep
> > > > > will be repeated and in the case of real duplication the message will
> > > > > appear again - so it is per sweep. I hope it is not too much.
> > > >
> > > > Once per sweep is too much IMO. It still fills the log over time.
> > >
> > > Hmm, I cannot find an elegant way to limit this printing.
> > > A real GUID duplication is a fatal error and the setup must be fixed,
> > > so it is not a condition under which we could work normally. Also, I
> > > guess the case itself is a pretty esoteric one. Do you think it is
> > > critical?
> >
> > Critical, no, but important: when it does occur, it fills the log
> > with these repeated messages, obscuring the important ones.
>
> When it does occur this will be the _most_ important message. The code
> has changed: now if "duplicated GUID" is reported repeatedly it is
> _not_ a "false alarm". OpenSM will not work in such conditions.
OK; that is different from before when there were false alarms and
there were some scenarios where it would continue to "work".
-- Hal
>
> Sasha
>
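
For reference, the per-GUID state idea raised earlier in the thread (log
only when a GUID's duplication state changes, instead of once per sweep)
could look roughly like the sketch below. This is only an illustration
under stated assumptions: dup_table, dup_entry, dup_lookup and dup_update
are hypothetical names, not OpenSM structures or functions, and the
fixed-size table is a simplification. Note that the thread itself leans
toward keeping the per-sweep message, since a repeated report is no longer
a false alarm.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DUP_TABLE_SIZE 64 /* hypothetical fixed capacity for this sketch */

/* Hypothetical per-GUID record; not an OpenSM structure. */
struct dup_entry {
	uint64_t guid;
	int in_use;
	int duplicated; /* state recorded at the previous sweep */
};

struct dup_table {
	struct dup_entry entries[DUP_TABLE_SIZE];
};

/* Find the entry for guid; optionally claim a free slot if none exists. */
static struct dup_entry *dup_lookup(struct dup_table *t, uint64_t guid,
				    int create)
{
	struct dup_entry *free_slot = NULL;
	size_t i;

	for (i = 0; i < DUP_TABLE_SIZE; i++) {
		struct dup_entry *e = &t->entries[i];

		if (e->in_use && e->guid == guid)
			return e;
		if (!e->in_use && !free_slot)
			free_slot = e;
	}
	if (create && free_slot) {
		free_slot->guid = guid;
		free_slot->in_use = 1;
		free_slot->duplicated = 0;
		return free_slot;
	}
	return NULL;
}

/*
 * Record the state observed for guid during the current sweep.
 * Returns 1 if the state changed (the caller should log), 0 otherwise.
 */
static int dup_update(struct dup_table *t, uint64_t guid, int duplicated_now)
{
	struct dup_entry *e;
	int changed;

	assert(t != NULL);
	duplicated_now = !!duplicated_now;

	/* Only start tracking a GUID once it is seen as duplicated. */
	e = dup_lookup(t, guid, duplicated_now);
	if (!e) {
		/*
		 * Untracked and not duplicated: nothing to report.
		 * Table full and duplicated: log anyway rather than lose it.
		 */
		return duplicated_now;
	}

	changed = (e->duplicated != duplicated_now);
	e->duplicated = duplicated_now;
	if (!duplicated_now)
		e->in_use = 0; /* duplication cleared: release the slot */
	return changed;
}
```

A sweep would call dup_update() once per GUID it checks and write to the
log only when the return value is 1, so a persistent duplication produces
one message when it appears and one when it clears.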