[ofa-general] RE: opensm: a bug in heavy sweep? - no LFT re-configuration

Eitan Zahavi eitan at mellanox.co.il
Mon Jul 23 10:59:21 PDT 2007


Hi Sasha, Hal,
 
I think I have an idea:
 
Since a specific switch reported the change bit or sent a trap, why
can't we just verify that nothing changed in that switch's setup?
We could send PortInfo, SwitchInfo, LFT, MFT, SL2VL, VLArb and PKey queries
and make sure nothing differs from the previous state. Or we could simply
enforce the last known state by sending it again ...
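
A minimal sketch of the "verify against the previous state" idea, in C and
limited to LFT blocks. The function names are illustrative only and are not
OpenSM APIs, and the sketch assumes the switch's blocks have already been
read back into a buffer. The only number taken from this thread is the
64-entry block size used in the patch quoted below.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define LFT_BLOCK_SIZE 64 /* entries per LFT block, as in the quoted patch */

/* Hypothetical result codes; not OpenSM names. */
enum lft_check { LFT_MATCH = 0, LFT_MISMATCH = 1 };

/*
 * Compare the SM's cached image of one LFT block with the block read
 * back from the switch. Both buffers must hold LFT_BLOCK_SIZE bytes.
 */
static enum lft_check lft_block_verify(const uint8_t *cached,
				       const uint8_t *readback)
{
	assert(cached != NULL && readback != NULL);
	return memcmp(cached, readback, LFT_BLOCK_SIZE) ? LFT_MISMATCH
							 : LFT_MATCH;
}

/*
 * Walk all blocks of a table and count those that differ from the cache.
 * If resend is not NULL it receives one 0/1 flag per block, marking the
 * blocks that would have to be sent again to enforce the last state.
 */
static unsigned lft_count_stale_blocks(const uint8_t *cached,
				       const uint8_t *readback,
				       unsigned num_blocks, uint8_t *resend)
{
	unsigned i, stale = 0;

	for (i = 0; i < num_blocks; i++) {
		int bad = lft_block_verify(cached + i * LFT_BLOCK_SIZE,
					   readback + i * LFT_BLOCK_SIZE)
			  == LFT_MISMATCH;
		if (resend)
			resend[i] = (uint8_t)bad;
		stale += (unsigned)bad;
	}
	return stale;
}
```

The same comparison would apply to the other attributes listed above; only
the block size and the query used to read them back would differ.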
 

Eitan Zahavi 
Senior Engineering Director, Software Architect 
Mellanox Technologies LTD 
Tel:+972-4-9097208
Fax:+972-4-9593245 
P.O. Box 586 Yokneam 20692 ISRAEL 

 


________________________________

	From: Hal Rosenstock [mailto:hal.rosenstock at gmail.com] 
	Sent: Monday, July 23, 2007 6:31 PM
	To: Sasha Khapyorsky
	Cc: Eitan Zahavi; OPENIB; Yevgeny Kliteynik
	Subject: Re: opensm: a bug in heavy sweep? - no LFT re-configuration
	
	
	Hi Sasha,
	
	
	On 7/22/07, Sasha Khapyorsky <sashak at voltaire.com> wrote:

		On 14:59 Sun 22 Jul     , Eitan Zahavi wrote:
		> Hi Sasha
		>
		> Let's assume someone has reset a switch on the fabric.
		> What would cause the SM to re-assign the LFT of that switch?
		
		OpenSM will sweep and drop this switch, and when the switch comes back
		it will be initialized again. But if the reset was too fast (relative
		to discovery), we can be in trouble (and maybe not only with LFTs).
		
		> I assumed that there is a mechanism to do that.
		
		Not for "fast" switch reboot.
		
		Hmm, I think we could try to detect this by comparing
		SwitchInfo:LinearFDBTop with the current p_sw->max_lid_ho, or even
		by seeing that PortInfo:LID is not set.

	 
	Not sure about checking PortInfo:LID. Wouldn't that approach need to be
	qualified by PortState (armed or active)? LFTTop seems better to me, or
	perhaps a combination of the two, but I may be missing something.
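
The two signals suggested above can be sketched as follows. This is an
illustration only: struct sw_snapshot and switch_looks_reset() are
hypothetical names, expected_top stands in for p_sw->max_lid_ho, and the
PortState qualification is left out because the thread does not settle how
it should apply.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical snapshot of what the SM read back from a switch in a sweep. */
struct sw_snapshot {
	uint16_t linear_fdb_top; /* SwitchInfo:LinearFDBTop reported by the switch */
	uint16_t port0_lid;      /* PortInfo:LID of switch port 0 */
};

/*
 * Return 1 if the switch looks as if it lost its configuration, else 0.
 * expected_top is the value the SM believes it programmed earlier.
 */
static int switch_looks_reset(const struct sw_snapshot *s,
			      uint16_t expected_top)
{
	assert(s != NULL);

	/* Signal 1: LinearFDBTop no longer matches the SM's own record. */
	if (s->linear_fdb_top != expected_top)
		return 1;

	/* Signal 2: the LID previously assigned to port 0 is gone. */
	if (s->port0_lid == 0)
		return 1;

	return 0;
}
```

Either signal alone marks the switch as suspect here; whether they should be
combined differently is exactly the open question in this exchange.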

	 

		Something like below:
		
		
		diff --git a/opensm/include/opensm/osm_switch.h b/opensm/include/opensm/osm_switch.h
		index 5b2b19e..62c072f 100644
		--- a/opensm/include/opensm/osm_switch.h
		+++ b/opensm/include/opensm/osm_switch.h
		@@ -112,6 +112,7 @@ typedef struct _osm_switch
		        osm_fwd_tbl_t                           fwd_tbl;
		        osm_mcast_tbl_t                         mcast_tbl;
		        uint32_t                                discovery_count;
		+       unsigned                                update_ft;
		        void                                    *priv;
		 } osm_switch_t;
		 /*
		@@ -152,6 +153,10 @@ typedef struct _osm_switch
		 *              during the current fabric sweep.  This number is reset
		 *              to zero at the start of a sweep.
		 *
		+*      update_ft
		+*              When set fwd tables will be updated regardless to entry
		+*              values locally stored in fwd tables images
		+*
		 * SEE ALSO
		 *      Switch object
		 *********/
		diff --git a/opensm/opensm/osm_port_info_rcv.c b/opensm/opensm/osm_port_info_rcv.c
		index adece65..8bbbcac 100644
		--- a/opensm/opensm/osm_port_info_rcv.c
		+++ b/opensm/opensm/osm_port_info_rcv.c
		@@ -336,6 +336,9 @@ __osm_pi_rcv_process_switch_port(
		       break;
		     }
		   }
		+  else if (port_num == 0 && p_node->sw &&
		+           (!p_pi->base_lid || !p_pi->master_sm_base_lid))
		+    p_node->sw->update_ft = 1;
		 
		   /*
		     Update the PortInfo attribute.
		diff --git a/opensm/opensm/osm_ucast_mgr.c b/opensm/opensm/osm_ucast_mgr.c
		index b44a3ba..03516ae 100644
		--- a/opensm/opensm/osm_ucast_mgr.c
		+++ b/opensm/opensm/osm_ucast_mgr.c
		@@ -811,7 +811,8 @@ osm_ucast_mgr_set_fwd_table(
		        osm_switch_get_fwd_tbl_block( p_sw, block_id_ho, block ) ;
		        block_id_ho++ )
		   {
		-    if (!memcmp(block, p_mgr->lft_buf + block_id_ho * 64, 64))
		+    if (!p_sw->update_ft &&
		+        !memcmp(block, p_mgr->lft_buf + block_id_ho * 64, 64))
		       continue;
		 
		     if( osm_log_is_active( p_mgr->p_log, OSM_LOG_DEBUG ) )
		@@ -850,6 +851,7 @@ osm_ucast_mgr_set_fwd_table(
		     }
		   }
		 
		+  p_sw->update_ft = 0;
		   OSM_LOG_EXIT( p_mgr->p_log );
		 }
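
To see the effect of the update_ft flag in isolation, here is a
self-contained sketch of the skip logic from the osm_ucast_mgr.c hunk.
struct toy_switch and toy_set_fwd_table() are hypothetical stand-ins, not
OpenSM code, and the "send" is simulated by copying into the local image.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define LFT_BLOCK_SIZE 64 /* entries per LFT block, as in the patch */
#define TOY_MAX_BLOCKS 4  /* arbitrary size for this sketch */

/* Hypothetical, stripped-down stand-in for osm_switch_t. */
struct toy_switch {
	uint8_t lft_image[TOY_MAX_BLOCKS * LFT_BLOCK_SIZE]; /* SM's local LFT image */
	unsigned num_blocks;
	unsigned update_ft; /* when set, rewrite every block regardless of image */
};

/*
 * Mirror of the patched loop: a block is skipped only when update_ft is
 * clear and the wanted content equals the locally stored image.
 * Returns the number of blocks that would be sent, and clears update_ft.
 */
static unsigned toy_set_fwd_table(struct toy_switch *sw, const uint8_t *new_lft)
{
	unsigned block, sent = 0;

	assert(sw != NULL && new_lft != NULL);
	assert(sw->num_blocks <= TOY_MAX_BLOCKS);

	for (block = 0; block < sw->num_blocks; block++) {
		uint8_t *image = sw->lft_image + block * LFT_BLOCK_SIZE;
		const uint8_t *wanted = new_lft + block * LFT_BLOCK_SIZE;

		if (!sw->update_ft && !memcmp(image, wanted, LFT_BLOCK_SIZE))
			continue;

		/* A real SM would issue a Set of this LFT block here. */
		memcpy(image, wanted, LFT_BLOCK_SIZE);
		sent++;
	}

	sw->update_ft = 0;
	return sent;
}
```

With the flag clear, a second call with the same table sends nothing, which
is the behaviour that hides a switch reset; setting the flag once forces
every block out again on the next call.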
		
		
		
		BTW what do you think is the best way to detect switch power-up? I
		didn't really find a strong requirement for power-up initialization
		of any suitable component.

	 
	Peer switch link state change is insufficient to differentiate a switch
	reboot from a "normal" link up/down. There is no IB standard indication
	of this.

	 

		> Anyway, kill -HUP should flush out the state and restart from scratch.
		
		Thinking more about it, I'm not sure. A similar flush would be required
		for other "stored" components such as pkey and sl2vl tables. So it is
		more than just a "regular" heavy sweep; another signal or option could
		be used for this, but OTOH it comes very close to restarting OpenSM.

	 
	Shouldn't this be automatic rather than requiring the admin to
	issue a signal somehow?
	 
	-- Hal
	 


		Sasha
		
		>
		>
		> Eitan
		>
		> > -----Original Message-----
		> > From: Sasha Khapyorsky [mailto: sashak at voltaire.com]
		> > Sent: Sunday, July 22, 2007 1:22 PM
		> > To: Eitan Zahavi
		> > Cc: OPENIB; hal.rosenstock at gmail.com; Yevgeny Kliteynik
		> > Subject: Re: opensm: a bug in heavy sweep? - no LFT re-configuration
		> >
		> > Hi Eitan,
		> >
		> > On 09:36 Sun 22 Jul     , Eitan Zahavi wrote:
		> > > Hi Sasha 
		> > >
		> > > I am running some tests manually and it looks like I found
		> > > a bug. Here is the sequence of events:
		> > > 1. The SM sweeps the fabric and assigns LFTs.
		> > > 2. I manually modify some LFTs (a single entry is now marked UNREACHABLE).
		> > > 3. I force some switch's change bit to 1 or issue kill -HUP.
		> > > 4. The SM reports SUBNET UP.
		> > > 5. The modified LFT entry is still UNREACHABLE and the path is broken.
		> >
		> > Right, in most cases (unless OpenSM has its own changes in
		> > the same LFT block) OpenSM will refer to its own LFT image for
		> > the "need to update" decision, so _manual_ changes will not
		> > trigger a new update. Rerunning OpenSM should help, however.
		> >
		> > > It looks to me like some routing optimization does not fully reroute
		> > > unless some condition is met - but that condition does not include
		> > > the triggers listed in step 3.
		> >
		> > Rereading all of the fabric's LFTs by default seems too expensive
		> > an operation. If it is a real requirement, this could be enforced
		> > manually, for example when kill -HUP is used. Thoughts?
		> >
		> > Sasha
		> >
		

