[ofa-general] [PATCH 0/4] opensm: Unicast Routing Cache
Al Chu
chu11 at llnl.gov
Mon May 5 09:32:49 PDT 2008
Hey Yevgeny,
This looks like a great idea. But is there a reason it's only supported
for LMC=0? Since the caching is handled at the ucast-mgr level (rather
than in the routing algorithm code), I don't quite see why LMC=0
matters.
Maybe it is because of the future incremental routing on your to-do
list? If that's the case, instead of only caching when LMC=0, perhaps
initial incremental routing should only work under LMC=0. Later on,
incremental routing for LMC > 0 could be added.
Al
On Sun, 2008-05-04 at 13:08 +0300, Yevgeny Kliteynik wrote:
> One thing I need to add here: ucast cache is currently supported
> for LMC=0 only.
>
> -- Yevgeny
>
> Yevgeny Kliteynik wrote:
> > Hi Sasha,
> >
> > The following series of 4 patches implements unicast routing cache
> > in OpenSM.
> >
> > None of the current routing engines is scalable when we're talking
> > about big clusters. On a ~5K-node cluster with ~1.3K switches, it takes
> > about two minutes to calculate the routing. The problem is that each
> > time the routing is calculated from scratch.
> >
> > Incremental routing (which is on my to-do list) aims to address this
> > problem when there is some "local" change in fabric (e.g. single
> > switch failure, single link failure, link added, etc).
> > In such cases we can use the routing that was already calculated in
> > the previous heavy sweep, and then we just have to modify it according
> > to the change.
> >
> > For instance, if some switch has disappeared from the fabric, we can
> > use the routing that existed with this switch, take a step back from
> > this switch and see if it is possible to route all the lids that were
> > routed through this switch some other way (which is usually the case).
> >
> > To implement incremental routing, we need to create some kind of unicast
> > routing cache, which is what these patches implement. In addition to being
> > a step toward incremental routing, the routing cache is useful by itself.
> >
> > This cache can spare us a routing calculation when a change occurs in the
> > leaf switches or in the hosts. For instance, if some node is rebooted,
> > OpenSM would start a heavy sweep with a full routing recalculation when
> > the HCA goes down, and another one when the HCA comes back up, when in
> > fact both of these routing calculations can be replaced by using the
> > unicast routing cache.
> >
> > Unicast routing cache comprises the following:
> > - Topology: a data structure with all the switches and CAs of the fabric
> > - LFTs: each switch has an LFT cached
> > - Lid matrices: each switch has lid matrices cached, which are needed for
> > multicast routing (which itself is not cached).
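To make the cached state above concrete, here is a rough C sketch of what a per-switch cache entry could look like. All names and fields here are hypothetical illustrations, not the actual structures from the patches:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-switch cache entry (illustrative only).
 * The real patches cache full switch/CA topology objects. */
typedef struct cached_switch {
	uint64_t node_guid;     /* identity used when matching topologies */
	uint16_t num_ports;     /* port count, part of the topology match */
	uint8_t *lft;           /* cached linear forwarding table         */
	uint8_t *lid_matrix;    /* cached lid/min-hop matrices, needed to
	                           recompute multicast routing, which is
	                           not cached itself                      */
	struct cached_switch *next;
} cached_switch_t;
```

Keeping the lid matrices alongside the LFTs is what lets multicast routing be rebuilt without a full unicast recalculation.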
> >
> > There is a topology matching function that compares the current topology
> > with the cached one to find out whether the cache is usable (valid) or not.
> >
> > The cache is used the following way:
> > - SM is executed - it starts first routing calculation
> > - calculated routing is stored in the cache
> > - at some point new heavy sweep is triggered
> > - unicast manager checks whether the cache can be used instead
> > of new routing calculation.
> > In any of the following cases we can use the cached routing:
> > + there is no topology change
> > + one or more CAs disappeared (they exist in the cached topology
> > model, but are missing in the newly discovered fabric)
> > + one or more leaf switches disappeared
> > In these cases the cached routing is written to the switches as is
> > (skipping any switch that no longer exists).
> > If there is any other topology change:
> > - existing cache is invalidated
> > - topology is cached
> > - routing is calculated as usual
> > - routing is cached
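The validity rules above boil down to a simple decision. Below is a minimal, self-contained C sketch of that decision over a deliberately simplified topology summary; the struct, the function name, and the count-based comparison are my own illustration (the actual patches compare full topology graphs, not counts):

```c
#include <stdbool.h>

/* Deliberately simplified topology summary (hypothetical). */
typedef struct topo_summary {
	int num_switches;
	int num_leaf_switches;
	int num_cas;
} topo_summary_t;

/* Can the cached routing be reused for the newly discovered fabric?
 * Valid cases: no change, missing CAs, or missing leaf switches.
 * Any new component, or a missing non-leaf switch, invalidates it. */
static bool ucast_cache_is_valid(const topo_summary_t *cached,
				 const topo_summary_t *current)
{
	/* Anything new in the fabric forces a full recalculation. */
	if (current->num_cas > cached->num_cas ||
	    current->num_switches > cached->num_switches ||
	    current->num_leaf_switches > cached->num_leaf_switches)
		return false;

	/* CAs may disappear freely; switches may disappear only
	 * if every missing switch is a leaf. */
	int missing_sw = cached->num_switches - current->num_switches;
	int missing_leaves =
		cached->num_leaf_switches - current->num_leaf_switches;
	return missing_sw == missing_leaves;
}
```

When this check fails, the sweep falls through to the "invalidate, re-cache, recalculate" path described above.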
> >
> > My simulations show that while the usual routing phase of a heavy
> > sweep on the topology mentioned above takes ~2 minutes, cached
> > routing reduces this time to ~6 seconds (which is nice, if you
> > ask me...).
> >
> > Of all the cases where the cache is valid, the most painful and most
> > complained-about one is a compute node reboot (which happens pretty
> > often) causing two heavy sweeps with two full routing calculations.
> > The Unicast Routing Cache aims to solve this problem (again, in
> > addition to being a step toward incremental routing).
> >
> > -- Yevgeny
> > _______________________________________________
> > general mailing list
> > general at lists.openfabrics.org
> > http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general
> >
> > To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general
> >
>
--
Albert Chu
chu11 at llnl.gov
925-422-5311
Computer Scientist
High Performance Systems Division
Lawrence Livermore National Laboratory