[ofa-general] RE: OpenSM detection of duplicated GUIDs on loopback
Eitan Zahavi
eitan at mellanox.co.il
Wed Jul 25 23:25:13 PDT 2007
> Hi Eitan, Hal,
>
> On 20:44 Wed 25 Jul , Eitan Zahavi wrote:
> >
> > I am not following you.
> > Why does a user need to run -y if a simple legal cable connector is
> > plugged?
>
> Because the duplicated GUIDs detector can abort OpenSM when a
> regular port is reconnected to another location during a hard
> sweep. This issue is not related to the loopback plug at all.
I think we should handle the case of "migrated port" in a more global
sense:
If a port "moved" during the sweep we have to do a new sweep anyway.
Maybe we could delay the 'abort' to the second sweep.
So practically I propose:
1. Add state flag "was duplicated" on the port saying it was reported as
duplicate GUID.
2. Set the variable controlling a forced second sweep (similar to the
one used if we got a Set error)
3. Repeat the sweep - if we find a port where it is a duplicate and the
"was duplicated" flag is set - abort.
A refinement for the user who is doing many changes continuously might
be to keep a counter.
And have the abort happen after the Nth iteration.
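
A minimal sketch of the three steps plus the counter refinement, written
as a small Python model rather than OpenSM code. All names here
(PortState, SubnetModel, was_duplicated, dup_sweeps, force_resweep,
abort_after) are hypothetical, and a "location" is just a string standing
in for wherever the GUID was discovered:

```python
from dataclasses import dataclass, field


class DuplicateGuidAbort(Exception):
    """Raised when the same GUID is still duplicated on a repeated sweep."""


@dataclass
class PortState:
    was_duplicated: bool = False  # step 1: set when reported as a duplicate GUID
    dup_sweeps: int = 0           # refinement: consecutive sweeps with a duplicate


@dataclass
class SubnetModel:
    # N: abort on the Nth consecutive duplicate sweep (values below 2 act like 2)
    abort_after: int = 2
    force_resweep: bool = False   # step 2: request another sweep
    ports: dict = field(default_factory=dict)  # guid -> PortState

    def sweep(self, discovered):
        """Process one sweep; `discovered` is a list of (guid, location) pairs."""
        self.force_resweep = False
        locations = {}
        for guid, location in discovered:
            locations.setdefault(guid, set()).add(location)

        for guid, locs in locations.items():
            port = self.ports.setdefault(guid, PortState())
            if len(locs) < 2:
                # Seen in one place only: clear any earlier suspicion.
                port.was_duplicated = False
                port.dup_sweeps = 0
                continue
            port.dup_sweeps += 1
            if port.was_duplicated and port.dup_sweeps >= self.abort_after:
                # step 3: duplicate again while the flag is already set
                raise DuplicateGuidAbort(
                    f"GUID {guid:#x} duplicated in {port.dup_sweeps} sweeps"
                )
            port.was_duplicated = True
            self.force_resweep = True
```

In this model a port that moved during the sweep shows up at two
locations once, triggers the forced second sweep, and is then seen at one
location only, so the flag is cleared. A real duplicate shows up at two
locations again and the abort fires.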
>
> > The issue is only if a "loop back" plug connecting a port to itself is
> > plugged.
>
> No, not only. Now there are two completely separate known
> issues with duplicated GUIDs detector:
>
> 1. Port moving
> 2. Loopback plug
>
> And I think that _both_ should be solved. And while just using
> '-y' could be suitable for (2), because it is an esoteric
> (although perfectly legal) use, it is not an acceptable solution for (1).
>
> I think we need to improve the GUIDs duplication detector
> instead. For example we could add a NodeInfo comparison there,
> and only in case it is different drop the GUIDs duplication
> error. Also I think this should not be a fatal error and should
> not abort OpenSM, just logging (probably via syslog too)
> should be sufficient - a non-working port is a good reason to
> look at the logs. Any other ideas?
The problem is that the SM will sort of figure out the network but will
then create completely bogus routing etc.
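
For reference, the NodeInfo comparison suggested in the quoted text could
look roughly like the sketch below. It is a Python model, not OpenSM
code: the NodeInfo class only mirrors a subset of the NodeInfo attribute
fields, the function names are hypothetical, and I am reading "drop the
error" as "report the error". Ignoring local_port_num in the comparison
is an assumption, made because that field describes where the query
arrived rather than the device itself.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class NodeInfo:
    """Subset of NodeInfo fields used by this sketch."""
    node_type: int
    num_ports: int
    sys_guid: int
    node_guid: int
    port_guid: int
    device_id: int
    vendor_id: int
    revision: int
    local_port_num: int


def same_device(a, b):
    """True if two replies match in every field except local_port_num."""
    return replace(a, local_port_num=0) == replace(b, local_port_num=0)


def classify(known, new):
    """Classify a newly discovered NodeInfo against an already known one."""
    if known.port_guid != new.port_guid:
        return "distinct"        # different GUIDs, nothing to report
    if same_device(known, new):
        return "same-port"       # same port seen twice (moved or looped back)
    return "duplicate-guid"      # same GUID, different device: report it
```

This only separates the cases for logging. It does not address the
routing concern, so whether to continue or abort after "duplicate-guid"
remains a separate decision.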
>
> Sasha
>
> > Do users use these plugs? For what purpose?
> >
> >
> > Eitan Zahavi
> > Senior Engineering Director, Software Architect
> > Mellanox Technologies LTD
> > Tel:+972-4-9097208
> > Fax:+972-4-9593245
> > P.O. Box 586 Yokneam 20692 ISRAEL
> >
> >
> >
> > > -----Original Message-----
> > > From: Sasha Khapyorsky [mailto:sashak at voltaire.com]
> > > Sent: Wednesday, July 25, 2007 3:19 AM
> > > To: Eitan Zahavi
> > > Cc: Hal Rosenstock; OpenFabrics General; Yevgeny Kliteynik
> > > Subject: Re: OpenSM detection of duplicated GUIDs on loopback
> > >
> > > On 23:25 Tue 24 Jul , Eitan Zahavi wrote:
> > > >
> > > > On 7/24/07, Eitan Zahavi <eitan at mellanox.co.il> wrote:
> > > >
> > > > Maybe avoid the log if -y is provided?
> > > >
> > > >
> > > > That avoids the spew but the duplicated GUID is important to know so
> > > > IMO something in the "middle" is needed where duplicated GUIDs are
> > > > logged but not continually the same ones.
> > > > [EZ]
> > > > OK so in -y mode only we track which ones were reported and do not
> > > > repeat the log?
> > >
> > > And how should the port moving problem be solved?
> > >
> > > We cannot ask a user to run OpenSM with '-y' if she/he plans to
> > > reconnect some ports in the future, and just decrease logging.
> > >
> > > Sasha
> > >
>