[openib-general] RE: OpenSM Routing Scalability Proposal

Wed May 25 07:53:29 PDT 2005

Hi Eitan,

On Tue, 2005-05-17 at 10:53, Eitan Zahavi wrote: 
> Hi All,
> 
> This is an updated proposal document for your comments.

I finally got a chance to read this. Some comments below.

> The main change is in describing the need for preserving enough data
> to enable incremental routing algorithm. 

I think incremental can help but presents some new issues.

> So the actual proposal is to implement the algorithm described in
> section 4.3.

4.1 (min hop) and 4.2 (up/down) are already implemented, right ?

> EZ <<OpenSM Routing.pdf>> 

It seems like there are 2 parts to 4.3:
1. Min hop table per leaf switch rather than per LID
What are the savings for this ? In terms of memory, it seems to be
something like a divisor of L times the number of LIDs per HCA port.
Of course, switch port 0s on non-leaf switches need to be accommodated.
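
To make the question in (1) concrete, here is a rough back-of-the-envelope
sketch. Everything in it is an assumption for illustration: the fabric sizes
are made up, "L" is read as the number of HCA ports per leaf switch, and the
table is modeled as one hop count per (switch, target, port). None of these
figures come from the proposal.

```python
def min_hop_table_entries(num_switches, num_targets, ports_per_switch):
    """Entries in a min hop table: one hop count per (switch, target, port)."""
    return num_switches * num_targets * ports_per_switch


# Hypothetical fabric; all numbers are illustration values only.
num_switches = 48
leaf_switches = 32
ports_per_switch = 24
hcas_per_leaf = 12               # "L", read here as HCA ports per leaf switch
lmc = 1
lids_per_hca_port = 2 ** lmc     # LIDs assigned to each HCA port

non_leaf_switches = num_switches - leaf_switches

# Per-LID table: every HCA LID, plus one LID for each switch port 0.
per_lid_targets = leaf_switches * hcas_per_leaf * lids_per_hca_port + num_switches

# Per-leaf-switch table: one target per leaf switch, plus the port 0
# of each non-leaf switch, which still has to be reachable.
per_leaf_targets = leaf_switches + non_leaf_switches

per_lid = min_hop_table_entries(num_switches, per_lid_targets, ports_per_switch)
per_leaf = min_hop_table_entries(num_switches, per_leaf_targets, ports_per_switch)
ratio = per_lid / per_leaf

print(f"per-LID entries:  {per_lid}")
print(f"per-leaf entries: {per_leaf}")
print(f"savings factor:   {ratio:.1f} "
      f"(L * LIDs per port = {hcas_per_leaf * lids_per_hca_port})")
```

With these made-up numbers the factor comes out at 17 rather than the ideal
24, because the switch port 0 targets are present in both tables.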

2. Incremental routing (5)
a. Subcase of 5 where there is no other link between 2 adjacent
switches. Is another way of stating this: examine the next hop switches
to see if there is a path between the 2 original switches, and keep
expanding the depth until one is found ? Depending on the topology (the
likelihood of another path between the 2 original switches), couldn't
this be worse from a compute standpoint than rerouting everything ?

b. 5 asks "How do we support topology changes like moving an HCA from
one Switch to another?" Also, what about a link moving from one switch
to another ? It seems that link down is handled, but nothing is done on
a link up. Doesn't incremental routing also need to be defined for links
being added ?

c. With incremental routing, it's unclear to me how the paths found
would compare with the ones which would be determined by running the
full algorithm from scratch. Would there be some point at which the
full routing would be retriggered ?

d. Clearly, there are end node responsibilities here as well (whether
this is done incrementally or fully or something else). 
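
As a sketch of how I read 2a (my interpretation, not necessarily what the
proposal intends): a breadth-first search from one switch toward the other,
skipping the failed direct link, is one way to "keep expanding the depth".
The adjacency map, the switch names, and the max_depth cutoff are all
hypothetical.

```python
from collections import deque


def find_detour(adjacency, src, dst, max_depth):
    """Breadth-first search for a path src -> dst that avoids the direct
    src-dst link. Returns the list of switches on the path, or None if
    no path of at most max_depth hops exists."""
    queue = deque([(src, [src])])
    visited = {src}
    while queue:
        node, path = queue.popleft()
        if len(path) - 1 >= max_depth:
            continue  # do not expand beyond the depth limit
        for neighbor in adjacency.get(node, ()):
            if node == src and neighbor == dst:
                continue  # this is the failed direct link
            if neighbor == dst:
                return path + [neighbor]
            if neighbor not in visited:
                visited.add(neighbor)
                queue.append((neighbor, path + [neighbor]))
    return None


# Hypothetical 4-switch ring; the A-B link is the one that went down.
fabric = {
    "A": ["B", "C"],
    "B": ["A", "D"],
    "C": ["A", "D"],
    "D": ["B", "C"],
}
print(find_detour(fabric, "A", "B", max_depth=3))  # ['A', 'C', 'D', 'B']
print(find_detour(fabric, "A", "B", max_depth=2))  # None
```

In the worst case (no alternate path, or only a long one) this search visits
every switch in the fabric, which is the compute concern raised in 2a.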

3. Persistency (6)
a. Full LFT storage (6.1): This presumes that the determination of a
topology change upon discovery is cheaper computationally than running
the routing. Has this been proven ? (I hope this is the case).

b. Root nodes storage (6.2): Are the root nodes determined by the routing
or supplied to the routing ? Are they different for unicast and
multicast ?
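
For 3a, one cheap way a topology change could be detected is sketched below.
This is only an assumption about how it might be done, not what the proposal
or OpenSM does: hash a canonical list of discovered links and compare it with
a fingerprint stored alongside the LFTs. The link tuple layout and function
names are hypothetical.

```python
import hashlib


def topology_fingerprint(links):
    """Fingerprint a fabric from its links.

    links: iterable of (guid_a, port_a, guid_b, port_b) tuples. Each link
    is put in a canonical direction and the list is sorted, so discovery
    order and link direction do not affect the result.
    """
    canonical = sorted(
        min((a, pa, b, pb), (b, pb, a, pa)) for a, pa, b, pb in links
    )
    digest = hashlib.sha256()
    for link in canonical:
        digest.update(repr(link).encode("utf-8"))
    return digest.hexdigest()


def can_reuse_stored_lfts(stored_fingerprint, discovered_links):
    """True if the discovered fabric matches the one the LFTs were built for."""
    return topology_fingerprint(discovered_links) == stored_fingerprint


# Hypothetical example: same fabric discovered in a different order.
before = [(0x10, 1, 0x20, 1), (0x20, 2, 0x30, 1)]
after = [(0x30, 1, 0x20, 2), (0x10, 1, 0x20, 1)]
stored = topology_fingerprint(before)
print(can_reuse_stored_lfts(stored, after))  # True
```

The cost here is a sort over the links, so whether it is cheaper than
rerouting depends on how the routing cost grows with fabric size. It also
only says that something changed, not what changed.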

-- Hal