[ofa-general] Re: [PATCH] opensm: osm_state_mgr.c - stop idle queue processing if heavy sweep requested

Thu Dec 20 09:06:32 PST 2007

On 18:41 Thu 20 Dec     , Yevgeny Kliteynik wrote:
>  Sasha Khapyorsky wrote:
> > On 09:40 Wed 19 Dec     , Yevgeny Kliteynik wrote:
> >>  Sasha Khapyorsky wrote:
> >>> Hi Yevgeny,
> >>> On 15:33 Mon 17 Dec     , Yevgeny Kliteynik wrote:
> >>>> If a heavy sweep is requested during idle queue processing, OSM continues
> >>>> to process the queue to the end and only then notices the heavy sweep
> >>>> request.
> >>>> In some cases this might leave a topology change unhandled for several
> >>>> minutes.
> >>> Could you provide more details about such cases?
> >>> As far as I know the idle queue is used only for multicast re-routing.
> >>> If so, it is interesting in itself why it takes minutes, and where. Is
> >>> there an MCG join/leave storm?
> >>  Exactly. The problem was discovered on a big cluster with hundreds of
> >>  mcast groups, when there is some massive change in the subnet (like
> >>  rebooting hundreds of nodes).
> > Ok, then the proposed patch looks like half a solution to me.
> > During an mcast join/leave storm the idle queue will be filled with
> > requests to rebuild mcast routing. OpenSM will process them one by one
> > (and this will take a lot of time) instead of processing all pending
> > mcast groups in one run. I think this is the first improvement needed
> > here.
> > Even with such an improvement we will not be able to control the order of
> > heavy sweep/mcast join requests, so basically the idea of breaking idle
> > queue processing looks fine to me, but it is not all that should be
> > done here. A heavy sweep by itself recalculates mcast routing for all
> > existing groups, so it should invalidate all pending mcast rerouting
> > requests instead of continuing idle queue processing after the heavy
> > sweep. Make sense?
> 
>  OK, makes sense.
>  So bottom line, when breaking the idle queue processing because of an
>  immediate sweep request, the state manager should just purge the whole
>  idle queue and then start the new heavy sweep.

Yes, that is one patch. Another expected patch, for improving OpenSM's
handling of mcast join request/node reboot storms, would recalculate mcast
routing for more than one mcast group at a time (actually I think the
requested mcast groups should be queued in a list and the mcast re-routing
requests merged, plus some trivial processor function in osm_mcast_mgr.c).
Maybe the whole idle queue mechanism can be killed as useless, in which
case this will impact the heavy sweep related patch.
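
A rough sketch of the merged-request idea, only to illustrate the
intent and not actual OpenSM code: every name below (pending_mcast_t,
pending_request, pending_process, count_reroute, MAX_PENDING) is
hypothetical, and the fixed-size array stands in for whatever list type
would really be used. Duplicate requests for the same group are merged,
all pending groups are handled in one run, and a heavy sweep request
drops whatever is still pending, since the sweep recalculates mcast
routing for all groups anyway.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_PENDING 64		/* hypothetical cap, for the sketch only */

/* Hypothetical list of mcast groups (by MLID) waiting for re-routing. */
typedef struct pending_mcast {
	uint16_t mlids[MAX_PENDING];
	size_t count;
} pending_mcast_t;

/* Hypothetical callback that re-routes a single mcast group. */
typedef void (*reroute_fn) (uint16_t mlid, void *ctx);

static void pending_init(pending_mcast_t * p)
{
	assert(p != NULL);
	p->count = 0;
}

/*
 * Queue a group for re-routing. A request for a group that is already
 * pending is merged with the existing one. Returns false only when the
 * list is full.
 */
static bool pending_request(pending_mcast_t * p, uint16_t mlid)
{
	size_t i;

	for (i = 0; i < p->count; i++)
		if (p->mlids[i] == mlid)
			return true;	/* already pending: merged */
	if (p->count == MAX_PENDING)
		return false;
	p->mlids[p->count++] = mlid;
	return true;
}

/*
 * Re-route all pending groups in one run. If a heavy sweep has been
 * requested, stop and drop the remaining entries, because the sweep
 * will recalculate routing for all groups. Returns the number of
 * groups re-routed in this run.
 */
static size_t pending_process(pending_mcast_t * p,
			      const bool * heavy_sweep_requested,
			      reroute_fn reroute, void *ctx)
{
	size_t done = 0;

	while (done < p->count && !*heavy_sweep_requested) {
		reroute(p->mlids[done], ctx);
		done++;
	}
	p->count = 0;		/* processed, or invalidated by the sweep */
	return done;
}

/* Stand-in for real re-routing: just counts how often it is called. */
static void count_reroute(uint16_t mlid, void *ctx)
{
	(void)mlid;
	++*(size_t *) ctx;
}
```

With something like this the state manager would call pending_request()
on each join/leave and pending_process() once per idle pass, so a storm
of requests for the same groups costs one re-routing per group rather
than one per request.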

Sasha