[ofa-general] RE: OpenSM detection of duplicated GUIDs on loopback

Eitan Zahavi eitan at mellanox.co.il
Tue Jul 24 07:52:33 PDT 2007


From: Hal Rosenstock [mailto:hal.rosenstock at gmail.com] 
Sent: Tuesday, July 24, 2007 5:53 PM
To: Eitan Zahavi
Cc: OpenFabrics General; Sasha Khapyorsky; Yevgeny Kliteynik
Subject: Re: OpenSM detection of duplicated GUIDs on loopback



	Hi Eitan,
	
	
	On 7/24/07, Eitan Zahavi <eitan at mellanox.co.il> wrote: 

		Hi Hal,
		 
		What is this "loopback" connector used for?
		Does not seem to me like a very useful thing to do.

	 
	Perhaps not, but there is no reason OpenSM can't handle this more gracefully.


		Anyway, if it is not a production environment, we could add a "debug mode" (-d flag option) to ignore this check.

	 
	Why would a separate flag be needed?
	[EZ] Because I do not see any other way for the SM to know it is really a loopback plug rather than two devices with the same GUID connected back to back...
	 
	-- Hal
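Here is a minimal sketch of the "debug mode" idea. It assumes a hypothetical boolean option, `ignore_self_links`, standing in for the proposed -d flag. The types `port_id_t` and `link_class_t` and the function `classify_link` are illustrative only and are not OpenSM code.

The sketch also makes Eitan's point concrete. The check sees only the node GUID and port number on each end. A loopback plug and two devices that share a GUID and are cabled to each other on the same port number therefore produce identical inputs, and only an operator-supplied option can choose between "ignore" and "error".

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical summary of one end of a discovered link: the node GUID and
 * the port number. Not the real OpenSM data structures. */
typedef struct {
    uint64_t node_guid;
    uint8_t  port_num;
} port_id_t;

typedef enum {
    LINK_NORMAL,        /* peer is a different port: nothing special */
    LINK_SELF_IGNORED,  /* self-link tolerated because of the hypothetical option */
    LINK_SELF_ERROR     /* self-link reported as a duplicate GUID (ERR 0D18) */
} link_class_t;

/* A loopback plug and two same-GUID devices wired port N to port N both give
 * local == peer here, so this comparison alone cannot tell them apart. */
static link_class_t classify_link(port_id_t local, port_id_t peer,
                                  bool ignore_self_links)
{
    bool self_link = local.node_guid == peer.node_guid &&
                     local.port_num == peer.port_num &&
                     local.port_num != 0;   /* enhanced port 0 is exempt */

    if (!self_link)
        return LINK_NORMAL;
    return ignore_self_links ? LINK_SELF_IGNORED : LINK_SELF_ERROR;
}
```

For example, classifying a port against itself yields LINK_SELF_ERROR by default and LINK_SELF_IGNORED when the option is set. A link to a different port stays LINK_NORMAL either way.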


		 

		Eitan Zahavi 
		Senior Engineering Director, Software Architect 
		Mellanox Technologies LTD 
		Tel:+972-4-9097208
		Fax:+972-4-9593245 
		P.O. Box 586 Yokneam 20692 ISRAEL 

		 


________________________________

			From: Hal Rosenstock [mailto:hal.rosenstock at gmail.com]
			Sent: Tuesday, July 24, 2007 5:31 PM
			To: OpenFabrics General
			Cc: Sasha Khapyorsky; Eitan Zahavi; Yevgeny Kliteynik
			Subject: OpenSM detection of duplicated GUIDs on loopback
			
			 
			
			Hi,
			 
			This is what starts off as a "minor" issue, and I know it has been discussed somewhat in the past:
			 
			Putting a loopback connector on a (switch) link causes OpenSM to report duplicated GUID error 0D18 as follows:
			
			__osm_ni_rcv_set_links
			{
			...
			          /*
			             When there are only two nodes with exact same guids (connected back
			             to back) - the previous check for duplicated guid will not catch
			             them. But the link will be from the port to itself...
			             Enhanced Port 0 is an exception to this
			          */
			          if ((osm_node_get_node_guid( p_node ) == p_ni_context->node_guid) &&
			              (port_num == p_ni_context->port_num) &&
			              (port_num != 0))
			          {
			            osm_log( p_rcv->p_log, OSM_LOG_ERROR,
			                     "__osm_ni_rcv_set_links: ERR 0D18: "
			                     "Duplicate GUID found by link from a port to itself: "
			                     "node 0x%" PRIx64 ", port number 0x%X\n",
			                     cl_ntoh64( osm_node_get_node_guid( p_node ) ),
			                     port_num );
			...
			
			So this occurs over and over and over and fills the log with the same spew. This should be improved IMO.
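One way to stop the repeated 0D18 spew, without changing whether the condition is fatal, is to remember which (node GUID, port) pairs have already been reported and to count repeats rather than log them. The table below is a hypothetical sketch with a fixed capacity. `seen_table_t` and `should_log_self_link` are illustrative names, not existing OpenSM code.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical bookkeeping, not part of OpenSM: which (node GUID, port)
 * pairs already produced a self-link report. */
#define MAX_TRACKED 64

typedef struct {
    uint64_t node_guid;
    uint8_t  port_num;
    unsigned repeats;   /* reports suppressed since the first one */
} seen_entry_t;

typedef struct {
    seen_entry_t entries[MAX_TRACKED];
    size_t       count;
} seen_table_t;

/* Returns true if the caller should log now (first sighting of this pair).
 * Returns false if it was already reported, in which case only the repeat
 * counter is bumped. When the table is full, it falls back to logging. */
static bool should_log_self_link(seen_table_t *t, uint64_t guid, uint8_t port)
{
    for (size_t i = 0; i < t->count; i++) {
        if (t->entries[i].node_guid == guid && t->entries[i].port_num == port) {
            t->entries[i].repeats++;
            return false;
        }
    }
    if (t->count < MAX_TRACKED) {
        t->entries[t->count].node_guid = guid;
        t->entries[t->count].port_num = port;
        t->entries[t->count].repeats = 0;
        t->count++;
    }
    return true;
}
```

A caller would guard the existing osm_log() call with this check. A real version would also need to drop an entry once the link changes. Otherwise, removing and later reinserting the plug would not be logged a second time.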
			
			Is this really a fatal condition? It doesn't seem like it should be to me.
			 
			Also, OpenSM can "ride" this out with -y (stay on fatal), but is that safe for this condition?
			 
			It seems like an extra loopback bit should be added to some port structure, which would cause these links to be ignored. This bit would then be reset when the peer is no longer itself.
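The proposed loopback bit can be sketched as a small per-port state machine:

- Set the bit the first time a port reports itself as its own peer.
- Ignore the link silently while the bit stays set.
- Clear the bit once the peer is some other port.

`sketch_port_t`, `lb_event_t` and `update_loopback_bit` are hypothetical names, and the real OpenSM port structures differ. As in the quoted check, port 0 is never treated as a loopback.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-port state; not the real OpenSM port structure. */
typedef struct {
    uint64_t node_guid;
    uint8_t  port_num;
    bool     is_loopback;   /* the proposed "loopback bit" */
} sketch_port_t;

typedef enum {
    LB_NONE,        /* ordinary link, nothing to do */
    LB_DETECTED,    /* bit just set: the one place to log */
    LB_STILL_SET,   /* bit already set: ignore the link silently */
    LB_CLEARED      /* peer is no longer itself: bit reset */
} lb_event_t;

/* Called for each link found on a sweep; peer_guid/peer_port are what the
 * remote end of the link reported. */
static lb_event_t update_loopback_bit(sketch_port_t *p,
                                      uint64_t peer_guid, uint8_t peer_port)
{
    bool self = p->port_num != 0 &&
                peer_guid == p->node_guid &&
                peer_port == p->port_num;

    if (self) {
        if (p->is_loopback)
            return LB_STILL_SET;
        p->is_loopback = true;
        return LB_DETECTED;
    }
    if (p->is_loopback) {
        p->is_loopback = false;
        return LB_CLEARED;
    }
    return LB_NONE;
}
```

Returning LB_DETECTED only on the transition gives the caller a single point at which to log. Later sweeps that see the same self-link return LB_STILL_SET and produce no output.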
			
			Also, is there a relationship between this and the 12x/duplicated GUID code?
			 
			Thanks.
			 
			-- Hal

