[ofa-general] Re: [PATCH] opensm: osm_state_mgr.c - stop idle queue processing if heavy sweep requested

Yevgeny Kliteynik kliteyn at dev.mellanox.co.il
Thu Dec 20 08:41:51 PST 2007


Sasha Khapyorsky wrote:
> On 09:40 Wed 19 Dec, Yevgeny Kliteynik wrote:
>>  Sasha Khapyorsky wrote:
>>> Hi Yevgeny,
>>> On 15:33 Mon 17 Dec, Yevgeny Kliteynik wrote:
>>>> If a heavy sweep is requested during idle queue processing, OSM continues
>>>> to process the queue until the end and only then notices the heavy sweep
>>>> request. In some cases this can leave a topology change unhandled for
>>>> several minutes.
>>> Could you provide more details about such cases?
>>> As far as I know, the idle queue is used only for multicast re-routing.
>>> If so, it is interesting in itself why it takes minutes, and where. Is
>>> there an MCG join/leave storm?
>>  Exactly. The problem was discovered on a big cluster with hundreds of
>>  mcast groups, when there was a massive change in the subnet (like
>>  rebooting hundreds of nodes).
> 
> OK, then the proposed patch looks like half a solution to me.
> 
> During an mcast join/leave storm, the idle queue will be filled with
> requests to rebuild mcast routing. OpenSM will process them one by one
> (and this will take a lot of time) instead of processing all pending
> mcast groups in one run. I think this is the first improvement needed here.
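
A minimal sketch of the "all pending mcast groups in one run" idea, assuming each queued request names one multicast group by its MLID: drain everything that is pending, mark each distinct group once, then rebuild each marked group a single time. `mcast_req_t`, `rebuild_mcast_group()` and `process_pending_mcast()` are hypothetical names for illustration, not OpenSM code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical idle-queue item: "re-route the multicast group with this
 * MLID". This is not OpenSM's real request type. */
typedef struct {
    unsigned short mlid;
} mcast_req_t;

#define MLID_BASE  0xC000u  /* first multicast LID                  */
#define MLID_COUNT 0x4000u  /* size of the 0xC000..0xFFFF LID range */

/* Hypothetical stand-in for the expensive per-group routing rebuild. */
static void rebuild_mcast_group(unsigned short mlid)
{
    (void)mlid; /* real code would recompute this group's routing */
}

/* Drain every pending request, mark each distinct group once, then
 * rebuild each marked group a single time. Returns the number of
 * rebuilds performed, i.e. the number of distinct multicast MLIDs. */
size_t process_pending_mcast(const mcast_req_t *reqs, size_t n)
{
    static bool dirty[MLID_COUNT];
    size_t rebuilt = 0;

    assert(reqs != NULL || n == 0);
    memset(dirty, 0, sizeof(dirty));

    for (size_t i = 0; i < n; i++)
        if (reqs[i].mlid >= MLID_BASE)   /* ignore non-multicast LIDs */
            dirty[reqs[i].mlid - MLID_BASE] = true;

    for (unsigned i = 0; i < MLID_COUNT; i++) {
        if (dirty[i]) {
            rebuild_mcast_group((unsigned short)(MLID_BASE + i));
            rebuilt++;
        }
    }
    return rebuilt;
}
```

With a storm of six requests touching three groups, this performs three rebuilds instead of six. A real implementation might go further and recompute all marked groups in a single routing pass.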
> 
> Even with such an improvement we will not be able to control the order
> of heavy sweep/mcast join requests, so the basic idea of breaking idle
> queue processing looks fine to me, but it is not all that should be
> done here. A heavy sweep by itself recalculates mcast routing for all
> existing groups, so it should invalidate all pending mcast rerouting
> requests instead of letting idle queue processing continue after the
> heavy sweep. Makes sense?

OK, makes sense.
So, bottom line: when idle queue processing is interrupted because of an
immediate sweep request, the state manager should just purge the whole
idle queue and then start the new heavy sweep.
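
A minimal sketch of that interrupt-and-purge flow, assuming a plain singly linked idle queue and a `force_immediate_heavy_sweep` flag like the one the patch tests. Every other name here (`idle_item_t`, `state_mgr_t`, `enqueue()`, `purge_idle_queue()`, `heavy_sweep()`, `process_idle_queue()`, and the `trap_after` simulation hook) is hypothetical and does not reflect OpenSM's actual cl_qlist-based code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical idle-queue item (singly linked, FIFO). */
typedef struct idle_item {
    struct idle_item *next;
    int group;                        /* e.g. mcast group to re-route     */
} idle_item_t;

typedef struct {
    idle_item_t *head;                /* pending idle-time requests       */
    bool force_immediate_heavy_sweep; /* normally set by a trap handler   */
    size_t trap_after;                /* simulation only: raise the flag
                                         after this many items (0 = never) */
    size_t processed;                 /* items actually handled           */
    size_t purged;                    /* items dropped as obsolete        */
    size_t heavy_sweeps;
} state_mgr_t;

void enqueue(state_mgr_t *m, int group)
{
    idle_item_t *it = malloc(sizeof(*it));
    idle_item_t **pp = &m->head;

    assert(it != NULL);
    it->next = NULL;
    it->group = group;
    while (*pp)                       /* append at the tail: FIFO order   */
        pp = &(*pp)->next;
    *pp = it;
}

/* Free every remaining request: the heavy sweep recomputes mcast
 * routing for all groups, so these would be redundant afterwards. */
static void purge_idle_queue(state_mgr_t *m)
{
    while (m->head) {
        idle_item_t *it = m->head;
        m->head = it->next;
        free(it);
        m->purged++;
    }
}

static void heavy_sweep(state_mgr_t *m)
{
    m->force_immediate_heavy_sweep = false;
    m->heavy_sweeps++;
}

static void handle_item(state_mgr_t *m, const idle_item_t *it)
{
    (void)it;                         /* real code: re-route it->group    */
    m->processed++;
    /* Simulation only: pretend a topology-change trap arrives here. */
    if (m->processed == m->trap_after)
        m->force_immediate_heavy_sweep = true;
}

/* Handle idle-time requests one at a time; after each one, check for a
 * pending immediate heavy sweep and, if set, purge the rest and sweep. */
void process_idle_queue(state_mgr_t *m)
{
    while (m->head) {
        idle_item_t *it = m->head;
        m->head = it->next;
        handle_item(m, it);
        free(it);

        if (m->force_immediate_heavy_sweep) {
            purge_idle_queue(m);      /* never resume stale requests      */
            heavy_sweep(m);
            return;
        }
    }
}
```

Enqueuing five requests with `trap_after = 2` handles two of them, purges the remaining three, and runs exactly one heavy sweep; with no trap, all requests are handled and no sweep runs.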

I'll work on it.

-- Yevgeny

> Sasha
> 
>>  -- Yevgeny
>>
>>> Or does a single re-routing cycle take minutes?
>>> Sasha
>>>> Signed-off-by:  Yevgeny Kliteynik <kliteyn at dev.mellanox.co.il>
>>>> ---
>>>>  opensm/opensm/osm_state_mgr.c |   31 ++++++++++++++++++++++++-------
>>>>  1 files changed, 24 insertions(+), 7 deletions(-)
>>>>
>>>> diff --git a/opensm/opensm/osm_state_mgr.c b/opensm/opensm/osm_state_mgr.c
>>>> index 5c39f11..6ee5ee6 100644
>>>> --- a/opensm/opensm/osm_state_mgr.c
>>>> +++ b/opensm/opensm/osm_state_mgr.c
>>>> @@ -1607,13 +1607,30 @@ void osm_state_mgr_process(IN osm_state_mgr_t * const p_mgr,
>>>>  				/* CALL the done function */
>>>>  				__process_idle_time_queue_done(p_mgr);
>>>>
>>>> -				/*
>>>> -				 * Set the signal to OSM_SIGNAL_IDLE_TIME_PROCESS
>>>> -				 * so that the next element in the queue gets processed
>>>> -				 */
>>>> -
>>>> -				signal = OSM_SIGNAL_IDLE_TIME_PROCESS;
>>>> -				p_mgr->state = OSM_SM_STATE_PROCESS_REQUEST;
>>>> +				if (p_mgr->p_subn->force_immediate_heavy_sweep) {
>>>> +					/*
>>>> +					 * Do not read the next item from the idle queue.
>>>> +					 * An immediate heavy sweep is requested, and it
>>>> +					 * takes priority.
>>>> +					 * Besides, there is a chance that after the
>>>> +					 * heavy sweep completes, the idle queue processing
>>>> +					 * that SM would have performed here will be obsolete.
>>>> +					 */
>>>> +					if (osm_log_is_active(p_mgr->p_log, OSM_LOG_DEBUG))
>>>> +						osm_log(p_mgr->p_log, OSM_LOG_DEBUG,
>>>> +						"osm_state_mgr_process: "
>>>> +						"interrupting idle time queue processing - heavy sweep requested\n");
>>>> +					signal = OSM_SIGNAL_NONE;
>>>> +					p_mgr->state = OSM_SM_STATE_IDLE;
>>>> +				}
>>>> +				else {
>>>> +					/*
>>>> +					 * Set the signal to OSM_SIGNAL_IDLE_TIME_PROCESS
>>>> +					 * so that the next element in the queue gets processed
>>>> +					 */
>>>> +					signal = OSM_SIGNAL_IDLE_TIME_PROCESS;
>>>> +					p_mgr->state = OSM_SM_STATE_PROCESS_REQUEST;
>>>> +				}
>>>>  				break;
>>>>
>>>>  			default:
>>>> -- 
>>>> 1.5.1.4
>>>>
> 
