[openib-general] [PATCH] OpenSM/doc: Add current-routing.txt

Hal Rosenstock halr at voltaire.com
Tue Jul 18 13:51:34 PDT 2006


OpenSM/doc: Add current-routing.txt

This document is a description of the current routing algorithms/methods
implemented by OpenSM. Written with copious help from Eitan Zahavi and
Sasha Khapyorsky

Signed-off-by: Hal Rosenstock <halr at voltaire.com>

Index: doc/current-routing.txt
===================================================================
--- doc/current-routing.txt	(revision 0)
+++ doc/current-routing.txt	(revision 0)
@@ -0,0 +1,147 @@
+Current OpenSM Routing
+7/18/06
+
+OpenSM offers two routing engines:
+
+1.  Min Hop Algorithm - based on the minimum hops to each node where the 
+path length is optimized.
+
+2.  UPDN Unicast routing algorithm - also based on the minimum hops to each 
+node, but it is constrained to ranking rules. This algorithm should be chosen 
+if the subnet is not a pure Fat Tree, and deadlock may occur due to a 
+loop in the subnet.
+
+OpenSM now also offers a file method which can load routes from a table. See 
+modular-routing.txt for more information on this.
+
+The basic routing algorithm is comprised of two stages:
+1. MinHop matrix calculation
+   How many hops are required to get from each port to each LID ?
+   The algorithm to fill these tables is different if you run standard
+(min hop) or Up/Down.
+   For standard routing, a "relaxation" algorithm is used to propagate
+min hop from every destination LID through neighbor switches
+   For Up/Down routing, a BFS from every target is used. The BFS tracks link
+direction (up or down) and avoid steps that will perform up after a down
+step was used.
+
+2. Once MinHop matrices exist, each switch is visited and for each target LID a
+decision is made as to what port should be used to get to that LID.
+   This step is common to standard and Up/Down routing. Each port has a
+counter counting the number of target LIDs going through it.
+   When there are multiple alternative ports with same MinHop to a LID,
+the one with less previously assigned ports is selected. 
+   If LMC > 0, more checks are added: Within each group of LIDs assigned to 
+same target port, 
+   a. use only ports which have same MinHop 
+   b. first prefer the ones that go to different systemImageGuid (then
+the previous LID of the same LMC group)
+   c. if none - prefer those who go through other NodeGuid
+   d. fall back to the number of paths method (if all go to same node).
+
+
+Effect of Topology Changes
+
+OpenSM will preserve existing routing in any case where there is no change in
+the fabric switches unless the -r (--reassign_lids) option is specified.
+
+-r
+--reassign_lids
+          This option causes OpenSM to reassign LIDs to all
+          end nodes. Specifying -r on a running subnet
+          may disrupt subnet traffic.
+          Without -r, OpenSM attempts to preserve existing
+          LID assignments resolving multiple use of same LID.
+
+If a link is added or removed, OpenSM does not recalculate
+the routes that do not have to change. A route has to change 
+if the port is no longer UP or no longer the MinHop. When routing changes 
+are performed, the same algorithm for balancing the routes is invoked.
+
+In the case of using the file based routing, any topology changes are
+currently ignored The 'file' routing engine just loads the LFTs from the file 
+specified, with no reaction to real topology. Obviously, this will not be able 
+to recheck LIDs (by GUID) for disconnected nodes, and LFTs for non-existent 
+switches will be skipped. Multicast is not affected by 'file' routing engine 
+(this uses min hop tables).
+
+
+Min Hop Algorithm
+
+The Min Hop algorithm is invoked when neither UPDN or the file method are
+specified.
+ 
+The Min Hop algorithm is divided into two stages: computation of
+min-hop tables on every switch and LFT output port assignment. Link
+subscription is also equalized with the ability to override based on
+port GUID. The latter is supplied by:
+
+-i <equalize-ignore-guids-file>
+-ignore-guids <equalize-ignore-guids-file>
+          This option provides the means to define a set of ports
+          (by guids) that will be ignored by the link load
+          equalization algorithm.
+
+LMC awareness routes based on (remote) system or switch basis.
+
+
+Purpose of UPDN Algorithm
+
+The UPDN algorithm is designed to prevent deadlocks from occurring in loops 
+of the subnet. A loop-deadlock is a situation in which it is no longer 
+possible to send data between any two hosts connected through the loop. As 
+such, the UPDN routing algorithm should be used if the subnet is not a pure 
+Fat Tree, and one of its loops may experience a deadlock (due, for example, 
+to high pressure).
+
+The UPDN algorithm is based on the following main stages:
+
+1.  Auto-detect root nodes - based on the HCA hop length from any switch in 
+the subnet, a statistical histogram is built for each switch (hop num vs 
+number of occurrences). If the histogram reflects a specific column (higher
+than others) for a certain node, then it is marked as a root node. Since 
+the algorithm is statistical, it may not find any root nodes. The list of 
+the root nodes found by this auto-detect stage is used by the ranking 
+process stage.
+
+    Note 1: The user can override the node list manually.
+    Note 2: If this stage cannot find any root nodes, and the user did not 
+            specify a guid list file, OpenSM defaults back to the Min Hop 
+            routing algorithm.
+
+2.  Ranking process - All root switch nodes (found in stage 1) are assigned 
+a rank of 0. Using the BFS algorithm, the rest of the switch nodes in the 
+subnet are ranked incrementally. This ranking aids in the process of enforcing 
+rules that ensure loop-free paths.
+
+3.  Min Hop Table setting - after ranking is done, a BFS algorithm is run from 
+each (HCA or switch) node in the subnet. During the BFS process, the FDB table 
+of each switch node traversed by BFS is updated, in reference to the starting 
+node, based on the ranking rules and guid values.
+
+At the end of the process, the updated FDB tables ensure loop-free paths 
+through the subnet.
+
+
+UPDN Algorithm Usage
+
+Activation through OpenSM
+
+Use '-R updn' option (instead of old '-u') to activate the UPDN algorithm.
+Use `-a <guid_list_file>' for adding an UPDN guid file that contains the 
+root nodes for ranking.
+If the `-a' option is not used, OpenSM uses its auto-detect root nodes 
+algorithm.
+
+Notes on the guid list file:
+1.   A valid guid file specifies one guid in each line. Lines with an invalid 
+format will be discarded.
+2.   The user should specify the root switch guids. However, it is also 
+possible to specify HCA guids; OpenSM will use the guid of the switch (if 
+it exists) that connects the HCA to the subnet as a root node.
+
+
+To learn more about deadlock-free routing, see the article 
+"Deadlock Free Message Routing in Multiprocessor Interconnection Networks" 
+by William J Dally and Charles L Seitz (1985).
+







More information about the general mailing list