[ofa-general] Re: [PATCH] opensm: osm_state_mgr.c - stop idle queue processing if heavy sweep requested

Sasha Khapyorsky sashak at voltaire.com
Thu Dec 20 07:43:01 PST 2007


On 09:40 Wed 19 Dec     , Yevgeny Kliteynik wrote:
>  Sasha Khapyorsky wrote:
> > Hi Yevgeny,
> > On 15:33 Mon 17 Dec     , Yevgeny Kliteynik wrote:
> >> If a heavy sweep is requested during idle queue processing, OSM continues
> >> to process it till the end and only then notices the heavy sweep request.
> >> In some cases this might leave a topology change unhandled for several
> >> minutes.
> > Could you provide more details about such cases?
> > As far as I know the idle queue is used only for multicast re-routing.
> > If so, it would be interesting to know why it takes minutes, and where.
> > Is there an MCG join/leave storm?
> 
>  Exactly. The problem was discovered on a big cluster with hundreds of mcast
>  groups, when there is some massive change in the subnet (like rebooting
>  hundreds of nodes).

Ok, then the proposed patch looks like only half a solution to me.

During an mcast join/leave storm the idle queue fills up with requests to
rebuild mcast routing. OpenSM processes them one by one (which takes a
lot of time) instead of processing all pending mcast groups in one run.
I think that is the first improvement needed here.
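
To illustrate what processing in one run could mean, here is a minimal
sketch, not OpenSM code. It assumes each group can be identified by a small
index, and every name in it (mcast_pending_t, mcast_request_reroute,
mcast_process_pending, MAX_MCAST_GROUPS) is hypothetical. Repeated
join/leave requests for the same group collapse into a single dirty flag,
and one pass reroutes each dirty group once.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_MCAST_GROUPS 1024	/* hypothetical table size */

/* Hypothetical stand-in for the SM's record of pending mcast work. */
typedef struct mcast_pending {
	bool dirty[MAX_MCAST_GROUPS];	/* true if the group needs rerouting */
	size_t num_dirty;		/* number of groups currently dirty */
	size_t num_requests;		/* total join/leave requests seen */
} mcast_pending_t;

/* Record a join/leave; duplicate requests for a group are coalesced. */
void mcast_request_reroute(mcast_pending_t *p, size_t group)
{
	assert(group < MAX_MCAST_GROUPS);
	p->num_requests++;
	if (!p->dirty[group]) {
		p->dirty[group] = true;
		p->num_dirty++;
	}
}

/*
 * Reroute every dirty group in a single pass.
 * 'reroute' is an optional callback (may be NULL) standing in for the
 * real per-group routing calculation.
 * Returns the number of groups rerouted.
 */
size_t mcast_process_pending(mcast_pending_t *p, void (*reroute)(size_t group))
{
	size_t done = 0;

	for (size_t g = 0; g < MAX_MCAST_GROUPS && p->num_dirty > 0; g++) {
		if (!p->dirty[g])
			continue;
		p->dirty[g] = false;
		p->num_dirty--;
		if (reroute)
			reroute(g);
		done++;
	}
	return done;
}
```

With this shape, a storm of requests against the same group costs one
rerouting pass for that group rather than one pass per request.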

Even with such an improvement we will not be able to control the order of
heavy sweep and mcast join requests, so the basic idea of breaking off idle
queue processing looks fine to me, but it is not all that should be
done here. A heavy sweep by itself recalculates mcast routing for all
existing groups, so it should invalidate all pending mcast rerouting
requests instead of continuing idle queue processing after the heavy
sweep. Make sense?
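
A rough sketch of that behaviour, again with hypothetical names only
(idle_queue_t, idle_queue_push, idle_queue_process, heavy_sweep) and not
taken from osm_state_mgr.c: idle processing stops as soon as a heavy sweep
is requested, and the heavy sweep drops whatever is still queued, on the
assumption that its own full recalculation covers those groups.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define IDLE_QUEUE_CAP 256	/* hypothetical capacity */

/* Hypothetical idle queue holding MLIDs that await mcast rerouting. */
typedef struct idle_queue {
	unsigned items[IDLE_QUEUE_CAP];
	size_t head;			/* index of the oldest item */
	size_t count;			/* number of queued items */
	bool heavy_sweep_requested;	/* set when a heavy sweep is pending */
} idle_queue_t;

/* Append a rerouting request; returns false if the queue is full. */
bool idle_queue_push(idle_queue_t *q, unsigned mlid)
{
	if (q->count == IDLE_QUEUE_CAP)
		return false;
	q->items[(q->head + q->count) % IDLE_QUEUE_CAP] = mlid;
	q->count++;
	return true;
}

/*
 * Process queued items until the queue is empty or a heavy sweep has
 * been requested. Returns the number of items processed.
 */
size_t idle_queue_process(idle_queue_t *q)
{
	size_t processed = 0;

	while (q->count > 0 && !q->heavy_sweep_requested) {
		/* per-group rerouting of q->items[q->head] would go here */
		q->head = (q->head + 1) % IDLE_QUEUE_CAP;
		q->count--;
		processed++;
	}
	return processed;
}

/*
 * Heavy sweep: the full recalculation (including all mcast groups)
 * would go here, so queued rerouting requests are treated as obsolete.
 * Returns the number of requests dropped.
 */
size_t heavy_sweep(idle_queue_t *q)
{
	size_t dropped = q->count;

	q->head = 0;
	q->count = 0;
	q->heavy_sweep_requested = false;
	return dropped;
}
```

A real implementation would also have to keep any request that arrives
after the sweep has already computed mcast routing; the sketch ignores
that case.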

Sasha

> 
>  -- Yevgeny
> 
> > Or does a single re-routing cycle take minutes?
> > Sasha
> >> Signed-off-by:  Yevgeny Kliteynik <kliteyn at dev.mellanox.co.il>
> >> ---
> >>  opensm/opensm/osm_state_mgr.c |   31 ++++++++++++++++++++++++-------
> >>  1 files changed, 24 insertions(+), 7 deletions(-)
> >>
> >> diff --git a/opensm/opensm/osm_state_mgr.c b/opensm/opensm/osm_state_mgr.c
> >> index 5c39f11..6ee5ee6 100644
> >> --- a/opensm/opensm/osm_state_mgr.c
> >> +++ b/opensm/opensm/osm_state_mgr.c
> >> @@ -1607,13 +1607,30 @@ void osm_state_mgr_process(IN osm_state_mgr_t * const p_mgr,
> >>  				/* CALL the done function */
> >>  				__process_idle_time_queue_done(p_mgr);
> >>
> >> -				/*
> >> -				 * Set the signal to OSM_SIGNAL_IDLE_TIME_PROCESS
> >> -				 * so that the next element in the queue gets processed
> >> -				 */
> >> -
> >> -				signal = OSM_SIGNAL_IDLE_TIME_PROCESS;
> >> -				p_mgr->state = OSM_SM_STATE_PROCESS_REQUEST;
> >> +				if (p_mgr->p_subn->force_immediate_heavy_sweep) {
> >> +					/*
> >> +					 * Do not read next item from the idle queue.
> >> +					 * Immediate heavy sweep is requested, so it's
> >> +					 * more important.
> >> +					 * Besides, there is a chance that after the
> >> +					 * heavy sweep completion, idle queue processing
> >> +					 * that SM would have performed here will be obsolete.
> >> +					 */
> >> +					if (osm_log_is_active(p_mgr->p_log, OSM_LOG_DEBUG))
> >> +						osm_log(p_mgr->p_log, OSM_LOG_DEBUG,
> >> +						"osm_state_mgr_process: "
> >> +						"interrupting idle time queue processing - heavy sweep requested\n");
> >> +					signal = OSM_SIGNAL_NONE;
> >> +					p_mgr->state = OSM_SM_STATE_IDLE;
> >> +				}
> >> +				else {
> >> +					/*
> >> +					 * Set the signal to OSM_SIGNAL_IDLE_TIME_PROCESS
> >> +					 * so that the next element in the queue gets processed
> >> +					 */
> >> +					signal = OSM_SIGNAL_IDLE_TIME_PROCESS;
> >> +					p_mgr->state = OSM_SM_STATE_PROCESS_REQUEST;
> >> +				}
> >>  				break;
> >>
> >>  			default:
> >> -- 
> >> 1.5.1.4
> >>
> 


