[ofa-general] Re: OpenSM detection of duplicated GUIDs on loopback

Sasha Khapyorsky sashak at voltaire.com
Thu Jul 26 18:07:07 PDT 2007


On 09:25 Thu 26 Jul     , Eitan Zahavi wrote:
> > Hi Eitan, Hal,
> > 
> > On 20:44 Wed 25 Jul     , Eitan Zahavi wrote:
> > > 
> > > I am not following you.
> > > Why does a user need to run -y if a simple legal cable connector is 
> > > plugged?
> > 
> > Because the duplicated GUIDs detector can abort OpenSM when a 
> > regular port is reconnected to another location during a hard 
> > sweep. This issue is not related to the loopback plug at all.
> I think we should handle the case of "migrated port" in a more global
> sense:
> If a port "moved" during the sweep we have to do a new sweep anyway.

Another option is just to use the most recently discovered port location.
For a CA this could work; switch migration can be more complicated.

> Maybe we could delay the 'abort' to the second sweep.
>
> So practically I propose:
> 1. Add state flag "was duplicated" on the port saying it was reported as
> duplicate GUID.
> 2. Set the variable controlling a forced second sweep (similar to the
> one used if we got Set error)

We can even catch this before drop_manager runs and just rediscover.

> 3. Repeat the sweep - if we find a port where it is a duplicate and the
> "was duplicated" flag is set - abort.
>
> A refinement for the user who is doing many changes continuously might
> be to keep a counter.
> And have the abort happen after the Nth iteration.

It is a better approach than what we have today.

> > 
> > > The issue is only if a "loop back" plug connecting a port 
> > to itself is 
> > > plugged.
> > 
> > No, not only. Now there are two completely separate known 
> > issues with duplicated GUIDs detector:
> > 
> > 1. Port moving
> > 2. Loopback plug
> > 
> > And I think that _both_ should be solved. And if just using 
> > '-y' could be suitable for (2) because it is esoteric 
> > (although perfectly legal) use, it is not acceptable solution for (1).
> > 
> > I think we need to improve GUIDs duplication detector 
> > instead. For example we could add NodeInfo comparison there, 
> > and only in case if it is different drop GUIDs duplication 
> > error. Also I think this should not be fatal error and should 
> > not abort OpenSM, just logging (probably via syslog too) 
> > should be sufficient - non-working port is good reason to 
> > look at logs. Any other ideas?
> The problem is that the SM will sort of figure out the network but will
> create a completely bogus routing etc.

Right. But this is not the case with back-to-back connections (a loopback
plug can be interpreted as a back-to-back link with a duplicated GUID).
So there is no need to abort in this (back-to-back/loopback) case. Agreed?
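
To make the distinction concrete, a classification along these lines
might work. This is a rough sketch only: node_info_lite_t is a reduced,
hypothetical stand-in for the NodeInfo attribute (LocalPortNum is left
out on purpose, since it depends on which port answered), and
classify_dup() is not an existing OpenSM function.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical subset of NodeInfo fields used for the comparison. */
typedef struct node_info_lite {
	uint64_t node_guid;
	uint64_t port_guid;
	uint32_t vendor_id;
	uint16_t device_id;
	uint8_t node_type;
	uint8_t num_ports;
} node_info_lite_t;

typedef enum dup_kind {
	DUP_KIND_SELF_LINK, /* loopback plug or back-to-back: do not abort */
	DUP_KIND_MOVED,     /* same node seen at a new location: rediscover */
	DUP_KIND_REAL       /* different nodes sharing a GUID: report it */
} dup_kind_t;

static bool node_info_equal(const node_info_lite_t *a,
			    const node_info_lite_t *b)
{
	return a->node_guid == b->node_guid &&
	       a->port_guid == b->port_guid &&
	       a->vendor_id == b->vendor_id &&
	       a->device_id == b->device_id &&
	       a->node_type == b->node_type &&
	       a->num_ports == b->num_ports;
}

/*
 * known: NodeInfo recorded earlier for this port GUID.
 * found: NodeInfo just received for the same port GUID.
 * neighbor_port_guid: port GUID at the other end of the link just probed.
 */
dup_kind_t classify_dup(const node_info_lite_t *known,
			const node_info_lite_t *found,
			uint64_t neighbor_port_guid)
{
	/* The link leads back to the same port GUID. */
	if (neighbor_port_guid == found->port_guid)
		return DUP_KIND_SELF_LINK;

	/* Identical NodeInfo: most likely the same node was reconnected. */
	if (node_info_equal(known, found))
		return DUP_KIND_MOVED;

	return DUP_KIND_REAL;
}
```

Only DUP_KIND_REAL would need to be treated as a GUID duplication
error here; the other two cases could just be logged.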

Sasha

> 
> > 
> > Sasha
> > 
> > > Do users use these plugs? For what sake?
> > > 
> > > 
> > > Eitan Zahavi
> > > Senior Engineering Director, Software Architect Mellanox 
> > Technologies 
> > > LTD
> > > Tel:+972-4-9097208
> > > Fax:+972-4-9593245
> > > P.O. Box 586 Yokneam 20692 ISRAEL
> > > 
> > >  
> > > 
> > > > -----Original Message-----
> > > > From: Sasha Khapyorsky [mailto:sashak at voltaire.com]
> > > > Sent: Wednesday, July 25, 2007 3:19 AM
> > > > To: Eitan Zahavi
> > > > Cc: Hal Rosenstock; OpenFabrics General; Yevgeny Kliteynik
> > > > Subject: Re: OpenSM detection of duplicated GUIDs on loopback
> > > > 
> > > > On 23:25 Tue 24 Jul     , Eitan Zahavi wrote:
> > > > > 
> > > > > 	On 7/24/07, Eitan Zahavi <eitan at mellanox.co.il> wrote: 
> > > > > 
> > > > > 		Maybe  avoid the log if -y is provided?
> > > > > 
> > > > > 	 
> > > > > 	That avoids the spew but the duplicated GUID is
> > > > important to know so
> > > > > IMO something in the "middle" is needed where 
> > duplicated GUIDs are 
> > > > > logged but not continually the same ones.
> > > > > 	[EZ]  
> > > > > 	OK so in -y mode only we track which ones were reported
> > > > and do not
> > > > > repeat the log?
> > > > 
> > > > And how port moving problem should be solved?
> > > > 
> > > > We cannot ask a user to run OpenSM with '-y' just because 
> > > > he/she plans to reconnect some ports in the future, and merely 
> > > > decreasing the logging does not solve it.
> > > > 
> > > > Sasha
> > > > 
> > 


