[openib-general] [PATCH] OpenSM/doc: Add current-routing.txt
Hal Rosenstock
halr at voltaire.com
Tue Jul 18 13:51:34 PDT 2006
OpenSM/doc: Add current-routing.txt
This document is a description of the current routing algorithms/methods
implemented by OpenSM. Written with copious help from Eitan Zahavi and
Sasha Khapyorsky
Signed-off-by: Hal Rosenstock <halr at voltaire.com>
Index: doc/current-routing.txt
===================================================================
--- doc/current-routing.txt (revision 0)
+++ doc/current-routing.txt (revision 0)
@@ -0,0 +1,147 @@
+Current OpenSM Routing
+7/18/06
+
+OpenSM offers two routing engines:
+
+1. Min Hop Algorithm - based on the minimum hops to each node where the
+path length is optimized.
+
+2. UPDN Unicast routing algorithm - also based on the minimum hops to each
+node, but it is constrained to ranking rules. This algorithm should be chosen
+if the subnet is not a pure Fat Tree, and deadlock may occur due to a
+loop in the subnet.
+
+OpenSM now also offers a file method which can load routes from a table. See
+modular-routing.txt for more information on this.
+
+The basic routing algorithm is comprised of two stages:
+1. MinHop matrix calculation
+ How many hops are required to get from each port to each LID ?
+ The algorithm to fill these tables is different if you run standard
+(min hop) or Up/Down.
+ For standard routing, a "relaxation" algorithm is used to propagate
+min hop from every destination LID through neighbor switches
+ For Up/Down routing, a BFS from every target is used. The BFS tracks link
+direction (up or down) and avoid steps that will perform up after a down
+step was used.
+
+2. Once MinHop matrices exist, each switch is visited and for each target LID a
+decision is made as to what port should be used to get to that LID.
+ This step is common to standard and Up/Down routing. Each port has a
+counter counting the number of target LIDs going through it.
+ When there are multiple alternative ports with same MinHop to a LID,
+the one with less previously assigned ports is selected.
+ If LMC > 0, more checks are added: Within each group of LIDs assigned to
+same target port,
+ a. use only ports which have same MinHop
+ b. first prefer the ones that go to different systemImageGuid (then
+the previous LID of the same LMC group)
+ c. if none - prefer those who go through other NodeGuid
+ d. fall back to the number of paths method (if all go to same node).
+
+
+Effect of Topology Changes
+
+OpenSM will preserve existing routing in any case where there is no change in
+the fabric switches unless the -r (--reassign_lids) option is specified.
+
+-r
+--reassign_lids
+ This option causes OpenSM to reassign LIDs to all
+ end nodes. Specifying -r on a running subnet
+ may disrupt subnet traffic.
+ Without -r, OpenSM attempts to preserve existing
+ LID assignments resolving multiple use of same LID.
+
+If a link is added or removed, OpenSM does not recalculate
+the routes that do not have to change. A route has to change
+if the port is no longer UP or no longer the MinHop. When routing changes
+are performed, the same algorithm for balancing the routes is invoked.
+
+In the case of using the file based routing, any topology changes are
+currently ignored The 'file' routing engine just loads the LFTs from the file
+specified, with no reaction to real topology. Obviously, this will not be able
+to recheck LIDs (by GUID) for disconnected nodes, and LFTs for non-existent
+switches will be skipped. Multicast is not affected by 'file' routing engine
+(this uses min hop tables).
+
+
+Min Hop Algorithm
+
+The Min Hop algorithm is invoked when neither UPDN or the file method are
+specified.
+
+The Min Hop algorithm is divided into two stages: computation of
+min-hop tables on every switch and LFT output port assignment. Link
+subscription is also equalized with the ability to override based on
+port GUID. The latter is supplied by:
+
+-i <equalize-ignore-guids-file>
+-ignore-guids <equalize-ignore-guids-file>
+ This option provides the means to define a set of ports
+ (by guids) that will be ignored by the link load
+ equalization algorithm.
+
+LMC awareness routes based on (remote) system or switch basis.
+
+
+Purpose of UPDN Algorithm
+
+The UPDN algorithm is designed to prevent deadlocks from occurring in loops
+of the subnet. A loop-deadlock is a situation in which it is no longer
+possible to send data between any two hosts connected through the loop. As
+such, the UPDN routing algorithm should be used if the subnet is not a pure
+Fat Tree, and one of its loops may experience a deadlock (due, for example,
+to high pressure).
+
+The UPDN algorithm is based on the following main stages:
+
+1. Auto-detect root nodes - based on the HCA hop length from any switch in
+the subnet, a statistical histogram is built for each switch (hop num vs
+number of occurrences). If the histogram reflects a specific column (higher
+than others) for a certain node, then it is marked as a root node. Since
+the algorithm is statistical, it may not find any root nodes. The list of
+the root nodes found by this auto-detect stage is used by the ranking
+process stage.
+
+ Note 1: The user can override the node list manually.
+ Note 2: If this stage cannot find any root nodes, and the user did not
+ specify a guid list file, OpenSM defaults back to the Min Hop
+ routing algorithm.
+
+2. Ranking process - All root switch nodes (found in stage 1) are assigned
+a rank of 0. Using the BFS algorithm, the rest of the switch nodes in the
+subnet are ranked incrementally. This ranking aids in the process of enforcing
+rules that ensure loop-free paths.
+
+3. Min Hop Table setting - after ranking is done, a BFS algorithm is run from
+each (HCA or switch) node in the subnet. During the BFS process, the FDB table
+of each switch node traversed by BFS is updated, in reference to the starting
+node, based on the ranking rules and guid values.
+
+At the end of the process, the updated FDB tables ensure loop-free paths
+through the subnet.
+
+
+UPDN Algorithm Usage
+
+Activation through OpenSM
+
+Use '-R updn' option (instead of old '-u') to activate the UPDN algorithm.
+Use `-a <guid_list_file>' for adding an UPDN guid file that contains the
+root nodes for ranking.
+If the `-a' option is not used, OpenSM uses its auto-detect root nodes
+algorithm.
+
+Notes on the guid list file:
+1. A valid guid file specifies one guid in each line. Lines with an invalid
+format will be discarded.
+2. The user should specify the root switch guids. However, it is also
+possible to specify HCA guids; OpenSM will use the guid of the switch (if
+it exists) that connects the HCA to the subnet as a root node.
+
+
+To learn more about deadlock-free routing, see the article
+"Deadlock Free Message Routing in Multiprocessor Interconnection Networks"
+by William J Dally and Charles L Seitz (1985).
+
More information about the general
mailing list