[openib-general] RE: OpenSM Routing Scalability Proposal

Eitan Zahavi eitan at mellanox.co.il
Wed May 25 12:49:10 PDT 2005


> 
> On Wed, 2005-05-25 at 14:45, Eitan Zahavi wrote:
> > Hi Hal,
> >
> > All your points are valid. Especially the ones regarding the
> > incremental algorithm for routing. Please see below.
> 
> One other question:
> Is there an impact of LMC > 0 on this ?

[EZ] If LMC > 0 then the proposed algorithm for calculating min hop tables
(step1) is going to be even faster then today implementation.
> 
> > >
> > > Hi Eitan,
> > >
> > > On Tue, 2005-05-17 at 10:53, Eitan Zahavi wrote:
> > > > Hi All,
> > > >
> > > > This is an updated proposal document for your comments.
> > >
> > > I finally got a chance to read this. Some comments below.
> > >
> > > > The main change is in describing the need for preserving enough
> > data
> > > > to enable incremental routing algorithm.
> > >
> > > I think incremental can help but presents some new issues.
> > [EZ] Yes very true. I do not have a full algorithm in place that will
> > cover all possible cases.
> > >
> > > > So the actual proposal is to implement the algorithm described in
> > > > section 4.3.
> > >
> > > 4.1 (min hop) and 4.2 (up/down) are already implemented, right ?
> > [EZ] No they are not. The current implementation uses MinHop tables
> > etc.
> >
> 
> I'm not sure I'm following you. Are you saying min hop is implemented
> and up/down isn't (just analyzed) ?
[EZ] No - I use the term min hop for the first stage of the routing.
Today this first stage generate a different kind of table then the proposed
and it does so using a different traversal algorithm.
> 
> > > > EZ <<OpenSM Routing.pdf>>
> > >
> > > It seems like there are 2 parts to 4.3:
> > > 1. Min hop table per leaf switch rather than per LID
> > > What are the savings for this ? Seems like in terms of memory, this
> > is
> > > something like a divisor of L times the number of LIDs per HCA port.
> > [EZ] Yes.
> > > Of course, switch port 0s on non leaf switches need to be
> > accomodated.
> > [EZ] True.
> > >
> > > 2. Incremental routing (5)
> > > a. Subcase of 5 where there is no other link between 2 adjacent
> > > switches. Is another way of stating this, examine next hop switches
> > to
> > > see if there is a path between the 2 original switches and keep
> > > expanding the depth until 1 is found ? Couldn't this be worse from a
> > > compute standpoint than rerouting everything depending on the
> > topology
> > > (the likelihood of another path between the 2 original switches) ?
> > [EZ] If one knows which ports have changed this will be faster then
> > full recalc.
> 
> Sure but if the depth keeps expanding because no path is found between
> the switches which lost a trunk link between them, then isn't the
> calculation done on an ever expanding horizon of switches ? That was the
> case I was referring to.
[EZ] But not all switches needs to be recomputed.
> 
> > > b. 5 asks "How do we support topology changes line moving an HCA
> > from
> > > one Switch to another?" Also, what about a link moving from one
> > switch
> > > to another ? It seems that link down is handled, but nothing is done
> > on
> > > a link up. Doesn't there need to be incremental defined for links
> > being
> > > added ?
> > [EZ] Yes. This is not even close to full algorithm.
> > >
> > > c. Also, with incremental routing, it's unclear to me how the paths
> > > found would compare with the ones which would be determined from the
> > > full algorithm (from scratch). Also, would there be some point at
> > which
> > > the full routing would be retriggered ?
> > [EZ] Good point.
> > >
> > > d. Clearly, there are end node responsibilities here as well
> > (whether
> > > this is done incrementally or fully or something else).
> > [EZ] Not sure what you mean.
> 
> I'm referring to path changes and their implications on connections.
[EZ] I assume the QP has already timed out.
> 
> > > 3. Persistency (6)
> > > a. Full LFT storage (6.1) This presumes that the determination of a
> > > topology change upon discovery is cheaper computationally than
> > running
> > > the routing. Has this been proven ? (I hope this is the case).
> > [EZ] Comparing two graphs is O(Links).
> 
> OK. I hope not twice the memory is needed for this.
[EZ] The memory involved with keeping the connectivity is small compared to
the routing data and various other tables (PKey SL2VL...)
> 
> > > b. Root nodes storage (6.2) Are the root nodes determined by the
> > routing
> > > or supplied to the routing ? Are they different for unicast and
> > > multicast ?
> > [EZ] Both is true. If not provided by human extracted using
> > heuristics.
> 
> By both is true, do you mean that the root nodes can either be
> determined by the routing or supplied to the routing ?
[EZ] OpenSM can take roots from file or calculate them using some
heuristics.
> 
> Are the unicast and multicast roots the same or different ?
[EZ] Multicast roots are calculated only. If you have a small group then the
roots will be the lowest level in the tree that fits all the members.
> 
> -- Hal
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.openfabrics.org/pipermail/general/attachments/20050525/f5d34362/attachment.html>


More information about the general mailing list