[ofa-general] Re: [PATCH 3/4] opensm: resweep instead of exit when duplicated guid suspected

Sasha Khapyorsky sashak at voltaire.com
Tue Aug 14 08:19:51 PDT 2007


On 10:44 Tue 14 Aug     , Hal Rosenstock wrote:
> On 8/14/07, Sasha Khapyorsky <sashak at voltaire.com> wrote:
> > On 10:50 Mon 13 Aug     , Hal Rosenstock wrote:
> > > On 8/13/07, Sasha Khapyorsky <sashak at voltaire.com> wrote:
> > > > Hi Hal,
> > > >
> > > > On 09:19 Mon 13 Aug     , Hal Rosenstock wrote:
> > > > > On 8/12/07, Sasha Khapyorsky <sashak at voltaire.com> wrote:
> > > > > > Anyway, OpenSM will request a resweep when nodes with a suspected
> > > > > > duplicate GUID are found on the subnet. And because we cannot be 100% sure
> > > > > > that a detected GUID duplication is not some corner case of a port moving,
> > > > > > I prefer not to exit. Endless (re)discovery and syslog messages should
> > > > > > be a good indication if it is indeed this case.
> > > > >
> > > > > Couldn't there be some duplication state kept per GUID so the messages
> > > > > only get logged on change of state to duplicated rather than
> > > > > continually spewing into the log ?
> > > >
> > > > There should be one message per duplicated GUID in the sweep. The sweep
> > > > will be repeated and in the case of real duplication the message will
> > > > appear again - so it is per sweep. I hope it is not too much.
> > >
> > > Once per sweep is too much IMO. It still fills the log over time.
> >
> > Hmm, I cannot find an elegant way to limit this printing.
> > When there is real GUID duplication it is a fatal error and the setup must be
> > fixed, so it is not something that would let us work normally. Also, I
> > guess the case itself is a pretty esoteric one. Do you think it is
> > critical?
> 
> Critical no but important since when it does occur, it fills the log
> with these repeated messages obscuring the important ones.

When it does occur, this will be the _most_ important message. The code
has been changed: now, if "duplicated GUID" is reported repeatedly,
it is _not_ a "false alarm". OpenSM will not work under such conditions.
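
For illustration only, the per-GUID state suggested earlier in the thread
(log on the change to "duplicated", not on every sweep) might be sketched
as below. This is not OpenSM code: the struct, the function names, and the
fixed-size table are all hypothetical, and the message goes to stderr
instead of the OpenSM log.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define MAX_TRACKED 64 /* arbitrary limit chosen for this sketch */

/* Hypothetical tracker; none of these names exist in OpenSM. */
struct dup_guid_tracker {
	uint64_t guid[MAX_TRACKED];
	int seen_this_sweep[MAX_TRACKED];
	size_t count;
};

static void dup_tracker_init(struct dup_guid_tracker *t)
{
	assert(t != NULL);
	memset(t, 0, sizeof(*t));
}

/*
 * Call when a sweep suspects a duplicated GUID.
 * Returns 1 if a message was logged (GUID not currently tracked),
 * 0 if it was suppressed because the GUID is already marked duplicated.
 * If the table is full, the GUID is logged but not remembered.
 */
static int dup_tracker_report(struct dup_guid_tracker *t, uint64_t guid)
{
	size_t i;

	assert(t != NULL);
	for (i = 0; i < t->count; i++) {
		if (t->guid[i] == guid) {
			t->seen_this_sweep[i] = 1;
			return 0;
		}
	}
	if (t->count < MAX_TRACKED) {
		t->guid[t->count] = guid;
		t->seen_this_sweep[t->count] = 1;
		t->count++;
	}
	fprintf(stderr, "duplicated GUID 0x%016llx detected\n",
		(unsigned long long)guid);
	return 1;
}

/*
 * Call at the end of every sweep. GUIDs that were not reported during
 * the sweep are dropped, so a later recurrence is logged again.
 * Returns the number of entries cleared.
 */
static size_t dup_tracker_end_sweep(struct dup_guid_tracker *t)
{
	size_t i, kept = 0, cleared;

	assert(t != NULL);
	for (i = 0; i < t->count; i++) {
		if (t->seen_this_sweep[i]) {
			t->guid[kept] = t->guid[i];
			t->seen_this_sweep[kept] = 0;
			kept++;
		}
	}
	cleared = t->count - kept;
	t->count = kept;
	return cleared;
}
```

The trade-off is the one discussed above: with this kind of suppression the
log stays quiet after the first report, so a persistent duplication is no
longer visible as a repeated message.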

Sasha


