[openib-general] RE: OpenSM Routing Scalability Proposal

Hal Rosenstock halr at voltaire.com
Wed May 25 12:29:47 PDT 2005


On Wed, 2005-05-25 at 14:45, Eitan Zahavi wrote: 
> Hi Hal,
> 
> All your points are valid. Especially the ones regarding the
> incremental algorithm for routing. Please see below.

One other question:
Is there an impact of LMC > 0 on this ?

> Eitan Zahavi
> Design Technology Director
> Mellanox Technologies LTD
> Tel:+972-4-9097208
> Fax:+972-4-9593245
> P.O. Box 586 Yokneam 20692 ISRAEL
> 
> 
> > --Original Message--
> > From: Hal Rosenstock [mailto:halr at voltaire.com]
> > Sent: Wednesday, May 25, 2005 5:53 PM
> > To: Eitan Zahavi
> > Cc: 'openib-general at openib.org'
> > Subject: Re: [openib-general] RE: OpenSM Routing Scalability
> Proposal
> > 
> > Hi Eitan,
> > 
> > On Tue, 2005-05-17 at 10:53, Eitan Zahavi wrote:
> > > Hi All,
> > >
> > > This is an updated proposal document for your comments.
> > 
> > I finally got a chance to read this. Some comments below.
> > 
> > > The main change is in describing the need for preserving enough
> data
> > > to enable incremental routing algorithm.
> > 
> > I think incremental can help but presents some new issues.
> [EZ] Yes very true. I do not have a full algorithm in place that will
> cover all possible cases.
> > 
> > > So the actual proposal is to implement the algorithm described in
> > > section 4.3.
> > 
> > 4.1 (min hop) and 4.2 (up/down) are already implemented, right ?
> [EZ] No they are not. The current implementation uses MinHop tables
> etc.
> 

I'm not sure I'm following you. Are you saying min hop is implemented
and up/down isn't (just analyzed) ?

> > > EZ <<OpenSM Routing.pdf>>
> > 
> > It seems like there are 2 parts to 4.3:
> > 1. Min hop table per leaf switch rather than per LID
> > What are the savings for this ? Seems like in terms of memory, this
> is
> > something like a divisor of L times the number of LIDs per HCA port.
> [EZ] Yes. 
> > Of course, switch port 0s on non leaf switches need to be
> accomodated.
> [EZ] True.
> > 
> > 2. Incremental routing (5)
> > a. Subcase of 5 where there is no other link between 2 adjacent
> > switches. Is another way of stating this, examine next hop switches
> to
> > see if there is a path between the 2 original switches and keep
> > expanding the depth until 1 is found ? Couldn't this be worse from a
> > compute standpoint than rerouting everything depending on the
> topology
> > (the likelihood of another path between the 2 original switches) ?
> [EZ] If one knows which ports have changed this will be faster then
> full recalc.

Sure but if the depth keeps expanding because no path is found between
the switches which lost a trunk link between them, then isn't the
calculation done on an ever expanding horizon of switches ? That was the
case I was referring to.
 
> > b. 5 asks "How do we support topology changes line moving an HCA
> from
> > one Switch to another?" Also, what about a link moving from one
> switch
> > to another ? It seems that link down is handled, but nothing is done
> on
> > a link up. Doesn't there need to be incremental defined for links
> being
> > added ?
> [EZ] Yes. This is not even close to full algorithm.
> > 
> > c. Also, with incremental routing, it's unclear to me how the paths
> > found would compare with the ones which would be determined from the
> > full algorithm (from scratch). Also, would there be some point at
> which
> > the full routing would be retriggered ?
> [EZ] Good point.
> > 
> > d. Clearly, there are end node responsibilities here as well
> (whether
> > this is done incrementally or fully or something else).
> [EZ] Not sure what you mean.

I'm referring to path changes and their implications on connections.
 
> > 3. Persistency (6)
> > a. Full LFT storage (6.1) This presumes that the determination of a
> > topology change upon discovery is cheaper computationally than
> running
> > the routing. Has this been proven ? (I hope this is the case).
> [EZ] Comparing two graphs is O(Links).

OK. I hope not twice the memory is needed for this.

> > b. Root nodes storage (6.2) Are the root nodes determined by the
> routing
> > or supplied to the routing ? Are they different for unicast and
> > multicast ?
> [EZ] Both is true. If not provided by human extracted using
> heuristics.

By both is true, do you mean that the root nodes can either be
determined by the routing or supplied to the routing ?

Are the unicast and multicast roots the same or different ?

-- Hal




More information about the general mailing list