[ofa-general] RE: OpenSM detection of duplicated GUIDs on loopback

Eitan Zahavi eitan at mellanox.co.il
Fri Jul 27 04:27:53 PDT 2007


The problem I have with the back-to-back plug is that it is a fatal case if
it is found where no such plug was used.
So we will need some sort of user input saying whether it is OK or not.

The case of a port moved in the middle of a sweep can easily be
detected if, instead of reporting an error, we perform a second
check of the original DR path where the same GUID was found...
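A minimal sketch of that re-check, under stated assumptions: the fabric is modelled as a hypothetical in-memory table (`struct fabric_model`), and `query_guid_at_path()` stands in for re-sending a NodeInfo query along the original directed route. None of these names are OpenSM's API. If the original DR path no longer answers with the same GUID, the sketch treats the port as moved rather than duplicated.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define MAX_HOPS 64
#define MAX_NODES 16

/* Hypothetical directed-route path: the sequence of egress port numbers. */
struct dr_path {
	uint8_t hops[MAX_HOPS];
	int len;
};

/* Hypothetical toy model of the fabric as it is now: which GUID answers
 * at each DR path. A real SM would send a NodeInfo SMP instead. */
struct fabric_model {
	struct dr_path paths[MAX_NODES];
	uint64_t guids[MAX_NODES];
	int count;
};

static int same_path(const struct dr_path *a, const struct dr_path *b)
{
	return a->len == b->len &&
	       memcmp(a->hops, b->hops, (size_t)a->len) == 0;
}

/* Returns the GUID currently reachable at 'path', or 0 if nothing answers. */
static uint64_t query_guid_at_path(const struct fabric_model *f,
				   const struct dr_path *path)
{
	for (int i = 0; i < f->count; i++)
		if (same_path(&f->paths[i], path))
			return f->guids[i];
	return 0;
}

/* Called when 'guid' is discovered again, having first been recorded at
 * 'orig_path'. Returns 1 for a genuine duplicate, 0 if the port moved. */
static int is_real_duplicate(const struct fabric_model *f, uint64_t guid,
			     const struct dr_path *orig_path)
{
	assert(f != NULL && orig_path != NULL);
	/* If the original location no longer reports this GUID, the port
	 * was moved during the sweep: not a duplicate, just rediscover. */
	return query_guid_at_path(f, orig_path) == guid;
}
```

With a loopback or back-to-back plug the same node is still reachable at the original path, so in this sketch that case would still be reported; the re-check only separates out the moved-port case.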

Eitan Zahavi
Senior Engineering Director, Software Architect
Mellanox Technologies LTD
Tel:+972-4-9097208
Fax:+972-4-9593245
P.O. Box 586 Yokneam 20692 ISRAEL

 

> -----Original Message-----
> From: Sasha Khapyorsky [mailto:sashak at voltaire.com] 
> Sent: Friday, July 27, 2007 4:07 AM
> To: Eitan Zahavi
> Cc: Hal Rosenstock; OpenFabrics General; Yevgeny Kliteynik
> Subject: Re: OpenSM detection of duplicated GUIDs on loopback
> 
> On 09:25 Thu 26 Jul     , Eitan Zahavi wrote:
> > > Hi Eitan, Hal,
> > > 
> > > On 20:44 Wed 25 Jul     , Eitan Zahavi wrote:
> > > > 
> > > > I am not following you.
> > > > Why does a user need to run -y if a simple, legal cable connector
> > > > is plugged?
> > > 
> > > Because the duplicated GUID detector can abort OpenSM when a regular
> > > port is reconnected to another location during a hard sweep. This
> > > issue is not related to the loopback plug at all.
> > I think we should handle the case of a "migrated port" in a more
> > global sense:
> > If a port "moved" during the sweep, we have to do a new sweep anyway.
> 
> Another option is just to use the most recently discovered port
> location. For a CA this could work; switch migration can be more
> complicated.
> 
> > Maybe we could delay the 'abort' to the second sweep.
> >
> > So, in practice, I propose:
> > 1. Add a "was duplicated" state flag on the port, saying it was
> > reported as a duplicate GUID.
> > 2. Set the variable controlling a forced second sweep (similar to
> > the one used when we get a Set error).
> 
> We could even catch this earlier, before drop_manager, and just rediscover.
> 
> > 3. Repeat the sweep: if we find a port that is a duplicate and has
> > the "was duplicated" flag set, abort.
> >
> > A refinement for a user who is continuously making many changes might
> > be to keep a counter and have the abort happen after the Nth iteration.
> 
> It is a better approach than what we have today.
> 
> > > 
> > > > The issue arises only if a "loop back" plug connecting a port to
> > > > itself is plugged in.
> > > 
> > > No, not only. There are now two completely separate known issues
> > > with the duplicated GUID detector:
> > > 
> > > 1. Port moving
> > > 2. Loopback plug
> > > 
> > > And I think that _both_ should be solved. While just using '-y' could
> > > be suitable for (2), because it is an esoteric (although perfectly
> > > legal) use, it is not an acceptable solution for (1).
> > > 
> > > I think we need to improve the GUID duplication detector instead. For
> > > example, we could add a NodeInfo comparison there, and report a GUID
> > > duplication error only if the NodeInfo differs. Also, I think this
> > > should not be a fatal error and should not abort OpenSM; just logging
> > > (probably via syslog too) should be sufficient, since a non-working
> > > port is a good reason to look at the logs. Any other ideas?
> > The problem is that the SM will sort of figure out the network but
> > will create completely bogus routing, etc.
> 
> Right. But that is not the case with back-to-back (when a loopback plug
> could be interpreted as a back-to-back duplicated GUID). So there is no
> need to abort in this (back-to-back/loopback) case. Agreed?
> 
> Sasha
> 
> > 
> > > 
> > > Sasha
> > > 
> > > > Do users use these plugs? For what purpose?
> > > > 
> > > > 
> > > > > -----Original Message-----
> > > > > From: Sasha Khapyorsky [mailto:sashak at voltaire.com]
> > > > > Sent: Wednesday, July 25, 2007 3:19 AM
> > > > > To: Eitan Zahavi
> > > > > Cc: Hal Rosenstock; OpenFabrics General; Yevgeny Kliteynik
> > > > > Subject: Re: OpenSM detection of duplicated GUIDs on loopback
> > > > > 
> > > > > On 23:25 Tue 24 Jul     , Eitan Zahavi wrote:
> > > > > > 
> > > > > > 	On 7/24/07, Eitan Zahavi <eitan at mellanox.co.il> wrote: 
> > > > > > 
> > > > > > 		Maybe avoid the log if -y is provided?
> > > > > > 
> > > > > > 	 
> > > > > > 	That avoids the spew, but the duplicated GUID is important to know,
> > > > > > 	so IMO something in the "middle" is needed, where duplicated GUIDs
> > > > > > 	are logged but not the same ones continually.
> > > > > > 	[EZ]
> > > > > > 	OK, so in -y mode only, we track which ones were reported and do
> > > > > > 	not repeat the log?
> > > > > 
> > > > > And how should the port-moving problem be solved?
> > > > > 
> > > > > We cannot ask a user to run OpenSM with '-y' and just decrease
> > > > > logging if she/he plans to reconnect some ports in the future.
> > > > > 
> > > > > Sasha
> > > > > 
> > > 
> 


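The "-y mode only" idea from the earliest quoted exchange, tracking which duplicated GUIDs were already reported so the same log line is not repeated on every sweep, can be sketched with a small table. `struct dup_report_log` and `should_log_duplicate()` are hypothetical names, and the fixed capacity is an arbitrary assumption; a real implementation would more likely use a hash map keyed by GUID.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical capacity; a real SM would size this to the fabric. */
#define MAX_REPORTED 256

/* Hypothetical record of GUIDs already reported as duplicated. */
struct dup_report_log {
	uint64_t guids[MAX_REPORTED];
	size_t count;
};

/* Returns true only the first time 'guid' is seen, so each duplicated GUID
 * is logged once instead of on every sweep. When the table is full, new
 * GUIDs are logged every time rather than having their reports dropped. */
static bool should_log_duplicate(struct dup_report_log *rl, uint64_t guid)
{
	assert(rl != NULL);
	for (size_t i = 0; i < rl->count; i++)
		if (rl->guids[i] == guid)
			return false;
	if (rl->count < MAX_REPORTED)
		rl->guids[rl->count++] = guid;
	return true;
}
```

As proposed in that message, a caller would consult this only when running with -y, and emit the duplicate-GUID log line only when it returns true.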
