[ofa-general] [PATCH] opensm: DOR (Dimension Order Routing) routing engine

Sasha Khapyorsky sashak at voltaire.com
Mon Oct 22 03:30:00 PDT 2007


From: Dale Purdy <purdy at sgi.com>

The Dimension Order Routing algorithm is based on the Min Hop
algorithm and so uses shortest paths.  Instead of spreading traffic
out across different paths with the same shortest distance, it chooses
among the available shortest paths based on an ordering of dimensions.
Each port must be consistently cabled to represent a hypercube
dimension or a mesh dimension.  Paths are grown from a destination
back to a source using the lowest dimension (port) of available paths
at each step.  This provides the ordering necessary to avoid deadlock.
When there are multiple links between any two switches, they still
represent only one dimension and traffic is balanced across them
unless port equalization is turned off.  In the case of hypercubes,
the same port must be used throughout the fabric to represent the
hypercube dimension and match on both ends of the cable.  In the case
of meshes, the dimension should consistently use the same pair of
ports, one port on one end of the cable, and the other port on the
other end, continuing along the mesh dimension.

Signed-off-by: Sasha Khapyorsky <sashak at voltaire.com>
---
 opensm/include/opensm/osm_switch.h |    4 ++++
 opensm/man/opensm.8                |   32 ++++++++++++++++++++++++++++++--
 opensm/opensm/main.c               |    2 +-
 opensm/opensm/osm_dump.c           |   11 ++++++++++-
 opensm/opensm/osm_opensm.c         |    1 +
 opensm/opensm/osm_switch.c         |   15 +++++++++++++++
 opensm/opensm/osm_ucast_mgr.c      |   10 ++++++++--
 7 files changed, 69 insertions(+), 6 deletions(-)

diff --git a/opensm/include/opensm/osm_switch.h b/opensm/include/opensm/osm_switch.h
index e294527..e2fe86d 100644
--- a/opensm/include/opensm/osm_switch.h
+++ b/opensm/include/opensm/osm_switch.h
@@ -958,6 +958,7 @@ osm_switch_recommend_path(IN const osm_switch_t * const p_sw,
 			  IN osm_port_t * p_port,
 			  IN const uint16_t lid_ho,
 			  IN const boolean_t ignore_existing,
+			  IN const boolean_t dor,
 			  IN OUT uint64_t * remote_sys_guids,
 			  IN OUT uint16_t * p_num_used_sys,
 			  IN OUT uint64_t * remote_node_guids,
@@ -980,6 +981,9 @@ osm_switch_recommend_path(IN const osm_switch_t * const p_sw,
 *		If false, the switch will choose an existing route if one
 *		exists, otherwise will choose the optimal route.
 *
+*	dor
+*		[in] If TRUE, Dimension Order Routing will be done.
+*
 *	remote_sys_guids
 *		[in out] The array of remote system guids already used to
 *		route the other lids of the same target port (if LMC > 0).
diff --git a/opensm/man/opensm.8 b/opensm/man/opensm.8
index a0178df..2bdea8e 100644
--- a/opensm/man/opensm.8
+++ b/opensm/man/opensm.8
@@ -92,7 +92,7 @@ LID assignments resolving multiple use of same LID.
 \fB\-R\fR, \fB\-\-routing_engine\fR
 This option chooses routing engine instead of Min Hop
 algorithm (default).
-Supported engines: updn, file, ftree, lash
+Supported engines: updn, file, ftree, lash, dor
 .TP
 \fB\-z\fR, \fB\-\-connect_roots\fR
 This option enforces a routing engine (currently up/down
@@ -452,7 +452,7 @@ Examples:
 
 .SH ROUTING
 .PP
-OpenSM now offers four routing engines:
+OpenSM now offers five routing engines:
 
 1.  Min Hop Algorithm - based on the minimum hops to each node where the
 path length is optimized.
@@ -474,6 +474,12 @@ distributing the paths between layers. LASH is an alternative
 deadlock-free topology-agnostic routing algorithm to the non-minimal
 UPDN algorithm avoiding the use of a potentially congested root node.
 
+5. DOR Unicast routing algorithm - based on the Min Hop algorithm, but
+avoids port equalization except for redundant links between the same
+two switches.  This provides deadlock free routes for hypercubes when
+the fabric is cabled as a hypercube and for meshes when cabled as a
+mesh (see details below).
+
 OpenSM also supports a file method which
 can load routes from a table. See \'Modular Routing Engine\' for more
 information on this.
@@ -742,6 +748,28 @@ Note: LMC > 0 is not supported by the LASH routing. If this is
 specified, the default routing algorithm is invoked instead.
 
 
+DOR Routing Algorithm
+
+The Dimension Order Routing algorithm is based on the Min Hop
+algorithm and so uses shortest paths.  Instead of spreading traffic
+out across different paths with the same shortest distance, it chooses
+among the available shortest paths based on an ordering of dimensions.
+Each port must be consistently cabled to represent a hypercube
+dimension or a mesh dimension.  Paths are grown from a destination
+back to a source using the lowest dimension (port) of available paths
+at each step.  This provides the ordering necessary to avoid deadlock.
+When there are multiple links between any two switches, they still
+represent only one dimension and traffic is balanced across them
+unless port equalization is turned off.  In the case of hypercubes,
+the same port must be used throughout the fabric to represent the
+hypercube dimension and match on both ends of the cable.  In the case
+of meshes, the dimension should consistently use the same pair of
+ports, one port on one end of the cable, and the other port on the
+other end, continuing along the mesh dimension.
+
+Use '-R dor' option to activate the DOR algorithm.
+
+
 Routing References
 
 To learn more about deadlock-free routing, see the article
diff --git a/opensm/opensm/main.c b/opensm/opensm/main.c
index 099a8d1..5771e9e 100644
--- a/opensm/opensm/main.c
+++ b/opensm/opensm/main.c
@@ -174,7 +174,7 @@ void show_usage(void)
 	       "--routing_engine <engine name>\n"
 	       "          This option chooses routing engine instead of Min Hop\n"
 	       "          algorithm (default).\n"
-	       "          Supported engines: updn, file, ftree, lash\n\n");
+	       "          Supported engines: updn, file, ftree, lash, dor\n\n");
 	printf("-z\n"
 	       "--connect_roots\n"
 	       "          This option enforces a routing engine (currently\n"
diff --git a/opensm/opensm/osm_dump.c b/opensm/opensm/osm_dump.c
index b7d99b2..fa07f83 100644
--- a/opensm/opensm/osm_dump.c
+++ b/opensm/opensm/osm_dump.c
@@ -136,6 +136,7 @@ static void dump_ucast_routes(cl_map_item_t * p_map_item, void *cxt)
 	uint16_t max_lid_ho;
 	uint16_t lid_ho, base_lid;
 	boolean_t direct_route_exists = FALSE;
+	boolean_t dor;
 	osm_switch_t *p_sw = (osm_switch_t *) p_map_item;
 	osm_opensm_t *p_osm = ((struct dump_context *)cxt)->p_osm;
 	FILE *file = ((struct dump_context *)cxt)->file;
@@ -148,6 +149,10 @@ static void dump_ucast_routes(cl_map_item_t * p_map_item, void *cxt)
 		"Switch 0x%016" PRIx64 "\n"
 		"LID    : Port : Hops : Optimal\n",
 		cl_ntoh64(osm_node_get_node_guid(p_node)));
+
+	dor = (p_osm->routing_engine.name &&
+	       (strcmp(p_osm->routing_engine.name, "dor") == 0));
+
 	for (lid_ho = 1; lid_ho <= max_lid_ho; lid_ho++) {
 		fprintf(file, "0x%04X : ", lid_ho);
 
@@ -228,7 +233,11 @@ static void dump_ucast_routes(cl_map_item_t * p_map_item, void *cxt)
 		if (best_hops == num_hops)
 			fprintf(file, "yes");
 		else {
-			best_port = osm_switch_recommend_path(p_sw, p_port, lid_ho, TRUE, NULL, NULL, NULL, NULL);	/* No LMC Optimization */
+			/* No LMC Optimization */
+			best_port = osm_switch_recommend_path(p_sw, p_port,
+							      lid_ho, TRUE, dor,
+							      NULL, NULL,
+							      NULL, NULL);
 			fprintf(file, "No %u hop path possible via port %u!",
 				best_hops, best_port);
 		}
diff --git a/opensm/opensm/osm_opensm.c b/opensm/opensm/osm_opensm.c
index 329305e..5b45401 100644
--- a/opensm/opensm/osm_opensm.c
+++ b/opensm/opensm/osm_opensm.c
@@ -81,6 +81,7 @@ const static struct routing_engine_module routing_modules[] = {
 	{"file", osm_ucast_file_setup},
 	{"ftree", osm_ucast_ftree_setup},
 	{"lash", osm_ucast_lash_setup},
+	{"dor", osm_ucast_null_setup },
 	{NULL, NULL}
 };
 
diff --git a/opensm/opensm/osm_switch.c b/opensm/opensm/osm_switch.c
index 5a636a2..bf686ad 100644
--- a/opensm/opensm/osm_switch.c
+++ b/opensm/opensm/osm_switch.c
@@ -224,6 +224,7 @@ osm_switch_recommend_path(IN const osm_switch_t * const p_sw,
 			  IN osm_port_t * p_port,
 			  IN const uint16_t lid_ho,
 			  IN const boolean_t ignore_existing,
+			  IN const boolean_t dor,
 			  IN OUT uint64_t * remote_sys_guids,
 			  IN OUT uint16_t * p_num_used_sys,
 			  IN OUT uint64_t * remote_node_guids,
@@ -267,6 +268,7 @@ osm_switch_recommend_path(IN const osm_switch_t * const p_sw,
 	osm_physp_t *p_physp;
 	osm_physp_t *p_rem_physp;
 	osm_node_t *p_rem_node;
+	osm_node_t *p_rem_node_first = NULL;
 
 	CL_ASSERT(lid_ho > 0);
 
@@ -430,6 +432,19 @@ osm_switch_recommend_path(IN const osm_switch_t * const p_sw,
 		   the count is min but also lower then the max subscribed
 		 */
 		if (check_count < least_paths) {
+			if (dor) {
+			    /* Get the Remote Node */
+			    p_rem_physp = osm_physp_get_remote(p_physp);
+			    p_rem_node = osm_physp_get_node_ptr(p_rem_physp);
+			    /* use the first dimension, but spread
+			     * traffic out among the group of ports
+			     * representing that dimension */
+			    if (port_found) {
+			        if (p_rem_node != p_rem_node_first)
+				    continue;
+			    } else
+			        p_rem_node_first = p_rem_node;
+			}
 			port_found = TRUE;
 			best_port = port_num;
 			least_paths = check_count;
diff --git a/opensm/opensm/osm_ucast_mgr.c b/opensm/opensm/osm_ucast_mgr.c
index 2a5fe88..43c2647 100644
--- a/opensm/opensm/osm_ucast_mgr.c
+++ b/opensm/opensm/osm_ucast_mgr.c
@@ -211,6 +211,8 @@ __osm_ucast_mgr_process_port(IN osm_ucast_mgr_t * const p_mgr,
 	uint8_t port;
 	boolean_t is_ignored_by_port_prof;
 	ib_net64_t node_guid;
+	struct osm_routing_engine *p_routing_eng;
+	boolean_t dor;
 	/*
 	   The following are temporary structures that will aid
 	   in providing better routing in LMC > 0 situations
@@ -274,6 +276,9 @@ __osm_ucast_mgr_process_port(IN osm_ucast_mgr_t * const p_mgr,
 
 	node_guid = osm_node_get_node_guid(p_sw->p_node);
 
+	p_routing_eng = &p_mgr->p_subn->p_osm->routing_engine;
+	dor = p_routing_eng->name && (strcmp(p_routing_eng->name, "dor") == 0);
+
 	/*
 	   The lid matrix contains the number of hops to each
 	   lid from each port.  From this information we determine
@@ -286,6 +291,7 @@ __osm_ucast_mgr_process_port(IN osm_ucast_mgr_t * const p_mgr,
 			port = osm_switch_recommend_path(p_sw, p_port, lid_ho,
 							 p_mgr->p_subn->
 							 ignore_existing_lfts,
+							 dor,
 							 remote_sys_guids,
 							 &num_used_sys,
 							 remote_node_guids,
@@ -294,6 +300,7 @@ __osm_ucast_mgr_process_port(IN osm_ucast_mgr_t * const p_mgr,
 			port = osm_switch_recommend_path(p_sw, p_port, lid_ho,
 							 p_mgr->p_subn->
 							 ignore_existing_lfts,
+							 dor,
 							 NULL, NULL, NULL,
 							 NULL);
 
@@ -306,8 +313,7 @@ __osm_ucast_mgr_process_port(IN osm_ucast_mgr_t * const p_mgr,
 
 			/* Up/Down routing can cause unreachable routes between some
 			   switches so we do not report that as an error in that case */
-			if (!p_mgr->p_subn->p_osm->routing_engine.
-			    build_lid_matrices) {
+			if (!p_routing_eng->build_lid_matrices) {
 				osm_log(p_mgr->p_log, OSM_LOG_ERROR,
 					"__osm_ucast_mgr_process_port: ERR 3A08: "
 					"No path to get to LID 0x%X from switch 0x%"
-- 
1.5.3.4.206.g58ba4




More information about the general mailing list