<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2//EN">

<HTML>

<HEAD>

<META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=us-ascii">

<META NAME="Generator" CONTENT="MS Exchange Server version 5.5.2654.45">

<TITLE>RE: [openib-general] RE: OpenSM Routing Scalability Proposal</TITLE>

</HEAD>

<BODY>


<P><FONT SIZE=2>> </FONT>

<BR><FONT SIZE=2>> On Wed, 2005-05-25 at 14:45, Eitan Zahavi wrote:</FONT>

<BR><FONT SIZE=2>> > Hi Hal,</FONT>

<BR><FONT SIZE=2>> ></FONT>

<BR><FONT SIZE=2>> > All your points are valid. Especially the ones regarding the</FONT>

<BR><FONT SIZE=2>> > incremental algorithm for routing. Please see below.</FONT>

<BR><FONT SIZE=2>> </FONT>

<BR><FONT SIZE=2>> One other question:</FONT>

<BR><FONT SIZE=2>> Is there an impact of LMC > 0 on this ?</FONT>

</P>


<P><FONT SIZE=2>[EZ] If LMC > 0 then the proposed algorithm for calculating min hop tables (step1) is going to be even faster then today implementation.</FONT></P>


<P><FONT SIZE=2>> </FONT>

<BR><FONT SIZE=2>> > ></FONT>

<BR><FONT SIZE=2>> > > Hi Eitan,</FONT>

<BR><FONT SIZE=2>> > ></FONT>

<BR><FONT SIZE=2>> > > On Tue, 2005-05-17 at 10:53, Eitan Zahavi wrote:</FONT>

<BR><FONT SIZE=2>> > > > Hi All,</FONT>

<BR><FONT SIZE=2>> > > ></FONT>

<BR><FONT SIZE=2>> > > > This is an updated proposal document for your comments.</FONT>

<BR><FONT SIZE=2>> > ></FONT>

<BR><FONT SIZE=2>> > > I finally got a chance to read this. Some comments below.</FONT>

<BR><FONT SIZE=2>> > ></FONT>

<BR><FONT SIZE=2>> > > > The main change is in describing the need for preserving enough</FONT>

<BR><FONT SIZE=2>> > data</FONT>

<BR><FONT SIZE=2>> > > > to enable incremental routing algorithm.</FONT>

<BR><FONT SIZE=2>> > ></FONT>

<BR><FONT SIZE=2>> > > I think incremental can help but presents some new issues.</FONT>

<BR><FONT SIZE=2>> > [EZ] Yes very true. I do not have a full algorithm in place that will</FONT>

<BR><FONT SIZE=2>> > cover all possible cases.</FONT>

<BR><FONT SIZE=2>> > ></FONT>

<BR><FONT SIZE=2>> > > > So the actual proposal is to implement the algorithm described in</FONT>

<BR><FONT SIZE=2>> > > > section 4.3.</FONT>

<BR><FONT SIZE=2>> > ></FONT>

<BR><FONT SIZE=2>> > > 4.1 (min hop) and 4.2 (up/down) are already implemented, right ?</FONT>

<BR><FONT SIZE=2>> > [EZ] No they are not. The current implementation uses MinHop tables</FONT>

<BR><FONT SIZE=2>> > etc.</FONT>

<BR><FONT SIZE=2>> ></FONT>

<BR><FONT SIZE=2>> </FONT>

<BR><FONT SIZE=2>> I'm not sure I'm following you. Are you saying min hop is implemented</FONT>

<BR><FONT SIZE=2>> and up/down isn't (just analyzed) ?</FONT>

<BR><FONT SIZE=2>[EZ] No - I use the term min hop for the first stage of the routing.</FONT>

<BR><FONT SIZE=2>Today this first stage generate a different kind of table then the proposed and it does so using a different traversal algorithm.</FONT></P>


<P><FONT SIZE=2>> </FONT>

<BR><FONT SIZE=2>> > > > EZ <<OpenSM Routing.pdf>></FONT>

<BR><FONT SIZE=2>> > ></FONT>

<BR><FONT SIZE=2>> > > It seems like there are 2 parts to 4.3:</FONT>

<BR><FONT SIZE=2>> > > 1. Min hop table per leaf switch rather than per LID</FONT>

<BR><FONT SIZE=2>> > > What are the savings for this ? Seems like in terms of memory, this</FONT>

<BR><FONT SIZE=2>> > is</FONT>

<BR><FONT SIZE=2>> > > something like a divisor of L times the number of LIDs per HCA port.</FONT>

<BR><FONT SIZE=2>> > [EZ] Yes.</FONT>

<BR><FONT SIZE=2>> > > Of course, switch port 0s on non leaf switches need to be</FONT>

<BR><FONT SIZE=2>> > accomodated.</FONT>

<BR><FONT SIZE=2>> > [EZ] True.</FONT>

<BR><FONT SIZE=2>> > ></FONT>

<BR><FONT SIZE=2>> > > 2. Incremental routing (5)</FONT>

<BR><FONT SIZE=2>> > > a. Subcase of 5 where there is no other link between 2 adjacent</FONT>

<BR><FONT SIZE=2>> > > switches. Is another way of stating this, examine next hop switches</FONT>

<BR><FONT SIZE=2>> > to</FONT>

<BR><FONT SIZE=2>> > > see if there is a path between the 2 original switches and keep</FONT>

<BR><FONT SIZE=2>> > > expanding the depth until 1 is found ? Couldn't this be worse from a</FONT>

<BR><FONT SIZE=2>> > > compute standpoint than rerouting everything depending on the</FONT>

<BR><FONT SIZE=2>> > topology</FONT>

<BR><FONT SIZE=2>> > > (the likelihood of another path between the 2 original switches) ?</FONT>

<BR><FONT SIZE=2>> > [EZ] If one knows which ports have changed this will be faster then</FONT>

<BR><FONT SIZE=2>> > full recalc.</FONT>

<BR><FONT SIZE=2>> </FONT>

<BR><FONT SIZE=2>> Sure but if the depth keeps expanding because no path is found between</FONT>

<BR><FONT SIZE=2>> the switches which lost a trunk link between them, then isn't the</FONT>

<BR><FONT SIZE=2>> calculation done on an ever expanding horizon of switches ? That was the</FONT>

<BR><FONT SIZE=2>> case I was referring to.</FONT>

<BR><FONT SIZE=2>[EZ] But not all switches needs to be recomputed.</FONT>

<BR><FONT SIZE=2>> </FONT>

<BR><FONT SIZE=2>> > > b. 5 asks "How do we support topology changes line moving an HCA</FONT>

<BR><FONT SIZE=2>> > from</FONT>

<BR><FONT SIZE=2>> > > one Switch to another?" Also, what about a link moving from one</FONT>

<BR><FONT SIZE=2>> > switch</FONT>

<BR><FONT SIZE=2>> > > to another ? It seems that link down is handled, but nothing is done</FONT>

<BR><FONT SIZE=2>> > on</FONT>

<BR><FONT SIZE=2>> > > a link up. Doesn't there need to be incremental defined for links</FONT>

<BR><FONT SIZE=2>> > being</FONT>

<BR><FONT SIZE=2>> > > added ?</FONT>

<BR><FONT SIZE=2>> > [EZ] Yes. This is not even close to full algorithm.</FONT>

<BR><FONT SIZE=2>> > ></FONT>

<BR><FONT SIZE=2>> > > c. Also, with incremental routing, it's unclear to me how the paths</FONT>

<BR><FONT SIZE=2>> > > found would compare with the ones which would be determined from the</FONT>

<BR><FONT SIZE=2>> > > full algorithm (from scratch). Also, would there be some point at</FONT>

<BR><FONT SIZE=2>> > which</FONT>

<BR><FONT SIZE=2>> > > the full routing would be retriggered ?</FONT>

<BR><FONT SIZE=2>> > [EZ] Good point.</FONT>

<BR><FONT SIZE=2>> > ></FONT>

<BR><FONT SIZE=2>> > > d. Clearly, there are end node responsibilities here as well</FONT>

<BR><FONT SIZE=2>> > (whether</FONT>

<BR><FONT SIZE=2>> > > this is done incrementally or fully or something else).</FONT>

<BR><FONT SIZE=2>> > [EZ] Not sure what you mean.</FONT>

<BR><FONT SIZE=2>> </FONT>

<BR><FONT SIZE=2>> I'm referring to path changes and their implications on connections.</FONT>

<BR><FONT SIZE=2>[EZ] I assume the QP has already timed out.</FONT>

<BR><FONT SIZE=2>> </FONT>

<BR><FONT SIZE=2>> > > 3. Persistency (6)</FONT>

<BR><FONT SIZE=2>> > > a. Full LFT storage (6.1) This presumes that the determination of a</FONT>

<BR><FONT SIZE=2>> > > topology change upon discovery is cheaper computationally than</FONT>

<BR><FONT SIZE=2>> > running</FONT>

<BR><FONT SIZE=2>> > > the routing. Has this been proven ? (I hope this is the case).</FONT>

<BR><FONT SIZE=2>> > [EZ] Comparing two graphs is O(Links).</FONT>

<BR><FONT SIZE=2>> </FONT>

<BR><FONT SIZE=2>> OK. I hope not twice the memory is needed for this.</FONT>

<BR><FONT SIZE=2>[EZ] The memory involved with keeping the connectivity is small compared to the routing data and various other tables (PKey SL2VL...)</FONT></P>


<P><FONT SIZE=2>> </FONT>

<BR><FONT SIZE=2>> > > b. Root nodes storage (6.2) Are the root nodes determined by the</FONT>

<BR><FONT SIZE=2>> > routing</FONT>

<BR><FONT SIZE=2>> > > or supplied to the routing ? Are they different for unicast and</FONT>

<BR><FONT SIZE=2>> > > multicast ?</FONT>

<BR><FONT SIZE=2>> > [EZ] Both is true. If not provided by human extracted using</FONT>

<BR><FONT SIZE=2>> > heuristics.</FONT>

<BR><FONT SIZE=2>> </FONT>

<BR><FONT SIZE=2>> By both is true, do you mean that the root nodes can either be</FONT>

<BR><FONT SIZE=2>> determined by the routing or supplied to the routing ?</FONT>

<BR><FONT SIZE=2>[EZ] OpenSM can take roots from file or calculate them using some heuristics.</FONT>

<BR><FONT SIZE=2>> </FONT>

<BR><FONT SIZE=2>> Are the unicast and multicast roots the same or different ?</FONT>

<BR><FONT SIZE=2>[EZ] Multicast roots are calculated only. If you have a small group then the roots will be the lowest level in the tree that fits all the members.</FONT></P>


<P><FONT SIZE=2>> </FONT>

<BR><FONT SIZE=2>> -- Hal</FONT>

</P>


</BODY>

</HTML>