[ofa-general] [PATCH] opensm: Parallelize (Stripe) LFT sets across switches

Yevgeny Kliteynik kliteyn at dev.mellanox.co.il
Tue Jul 28 07:39:19 PDT 2009


Sasha,

Hal Rosenstock wrote:
> Currently, MADs are pipelined to a single switch at a time which
> effectively serializes these requests due to processing at the SMA.
> This patch pipelines (stripes) them across the switches first before
> proceeding with successive blocks. As a result of this striping,
> multiple switches can process the set and respond concurrently
> which results in an improvement to the subnet initialization time.

I have some numbers, and they look very good.
 
I have a small cluster of 17 IS4 switches and 11 HCAs.
To artificially increase the size of the cluster, I used LMC=7,
including LMC on enhanced switch port 0.
 
With the new code, LFT configuration is more than twice as
fast as with the old code :)
The current ucast manager ran for ~250 msec on average; with the
new code it takes 110-120 msec.
 
The routing calculation phase of the ucast manager took ~1200 usec;
the rest of the time was spent sending the blocks and waiting
until no transactions were pending.
 
I also didn't see any noticeable difference between the various
max_smps_per_node values.

Here are detailed results from several runs (the number on the
left is the timer value in usec):
 
Current ucast manager (w/o the optimization):
 
000000 [LFT]: osm_ucast_mgr_process() - START
001131 [LFT]: ucast_mgr_process_tbl() - START
032251 [LFT]: ucast_mgr_process_tbl() - END
032263 [LFT]: osm_ucast_mgr_process() - END
253416 [LFT]: Done wait_for_pending_transactions()
 
New code, max_smps_per_node=0:
 
001417 [LFT]: osm_ucast_mgr_process() - START (0 max_smps_per_node)
002690 [LFT]: ucast_mgr_process_tbl() - START
032946 [LFT]: ucast_mgr_process_tbl() - END
032948 [LFT]: osm_ucast_pipeline_tbl() - START
033846 [LFT]: osm_ucast_pipeline_tbl() - END
033858 [LFT]: osm_ucast_mgr_process() - END
108203 [LFT]: Done wait_for_pending_transactions()
 
New code, max_smps_per_node=1:

007474 [LFT]: osm_ucast_mgr_process() - START (1 max_smps_per_node)
008735 [LFT]: ucast_mgr_process_tbl() - START
040071 [LFT]: ucast_mgr_process_tbl() - END
040074 [LFT]: osm_ucast_pipeline_tbl() - START
040103 [LFT]: osm_ucast_pipeline_tbl() - END
040114 [LFT]: osm_ucast_mgr_process() - END
120097 [LFT]: Done wait_for_pending_transactions()
 
New code, max_smps_per_node=4:

004137 [LFT]: osm_ucast_mgr_process() - START (4 max_smps_per_node)
005380 [LFT]: ucast_mgr_process_tbl() - START
037436 [LFT]: ucast_mgr_process_tbl() - END
037439 [LFT]: osm_ucast_pipeline_tbl() - START
037495 [LFT]: osm_ucast_pipeline_tbl() - END
037506 [LFT]: osm_ucast_mgr_process() - END
114983 [LFT]: Done wait_for_pending_transactions()

Here's the patch that shows where I added the log messages in the current code:

From d7962cc06f5526982bc51c17d53981b6d23256a8 Mon Sep 17 00:00:00 2001
From: Yevgeny Kliteynik <kliteyn at dev.mellanox.co.il>
Date: Mon, 27 Jul 2009 16:12:27 +0300
Subject: [PATCH] [LFT - master] log messages for LFT parallelization

Signed-off-by: Yevgeny Kliteynik <kliteyn at dev.mellanox.co.il>
---
 opensm/opensm/osm_state_mgr.c |    9 ++++++++-
 opensm/opensm/osm_ucast_mgr.c |    4 ++++
 2 files changed, 12 insertions(+), 1 deletions(-)

diff --git a/opensm/opensm/osm_state_mgr.c b/opensm/opensm/osm_state_mgr.c
index adc39a0..573a2a7 100644
--- a/opensm/opensm/osm_state_mgr.c
+++ b/opensm/opensm/osm_state_mgr.c
@@ -1271,11 +1271,18 @@ _repeat_discovery:
 	 */
 
 	if (!sm->ucast_mgr.cache_valid ||
-	    osm_ucast_cache_process(&sm->ucast_mgr))
+	    osm_ucast_cache_process(&sm->ucast_mgr)) {
+		OSM_LOG(sm->p_log, OSM_LOG_INFO, "[LFT]: "
+			"osm_ucast_mgr_process() - START\n");
 		osm_ucast_mgr_process(&sm->ucast_mgr);
+		OSM_LOG(sm->p_log, OSM_LOG_INFO, "[LFT]: "
+			"osm_ucast_mgr_process() - END\n");
+	}
 
 	if (wait_for_pending_transactions(&sm->p_subn->p_osm->stats))
 		return;
+	OSM_LOG(sm->p_log, OSM_LOG_INFO, "[LFT]: "
+		"Done wait_for_pending_transactions()\n");
 
 	/* cleanup switch lft buffers */
 	cl_qmap_apply_func(&sm->p_subn->sw_guid_tbl, cleanup_switch, sm->p_log);
diff --git a/opensm/opensm/osm_ucast_mgr.c b/opensm/opensm/osm_ucast_mgr.c
index 78a7031..83867a0 100644
--- a/opensm/opensm/osm_ucast_mgr.c
+++ b/opensm/opensm/osm_ucast_mgr.c
@@ -901,8 +901,12 @@ static int ucast_mgr_build_lfts(osm_ucast_mgr_t * p_mgr)
 	cl_qmap_apply_func(&p_mgr->p_subn->port_guid_tbl,
 			   add_port_to_order_list, p_mgr);
 
+	OSM_LOG(p_mgr->p_log, OSM_LOG_INFO, "[LFT]: "
+		"ucast_mgr_process_tbl() - START\n");
 	cl_qmap_apply_func(&p_mgr->p_subn->sw_guid_tbl, ucast_mgr_process_tbl,
 			   p_mgr);
+	OSM_LOG(p_mgr->p_log, OSM_LOG_INFO, "[LFT]: "
+		"ucast_mgr_process_tbl() - END\n");
 
 	cl_qlist_remove_all(&p_mgr->port_order_list);
 
-- 
1.5.1.4

 
And here's the patch that shows where I added the log messages in Hal's code: 


From 36476d885ca0f295c1fa46b4e467c9c883a2173b Mon Sep 17 00:00:00 2001
From: Yevgeny Kliteynik <kliteyn at dev.mellanox.co.il>
Date: Mon, 27 Jul 2009 15:35:16 +0300
Subject: [PATCH] [LFT] printfs for LFT parallelization in Hal's code

Signed-off-by: Yevgeny Kliteynik <kliteyn at dev.mellanox.co.il>
---
 opensm/opensm/osm_state_mgr.c |   11 ++++++++++-
 opensm/opensm/osm_ucast_mgr.c |    8 ++++++++
 2 files changed, 18 insertions(+), 1 deletions(-)

diff --git a/opensm/opensm/osm_state_mgr.c b/opensm/opensm/osm_state_mgr.c
index 847941f..1a6d1ea 100644
--- a/opensm/opensm/osm_state_mgr.c
+++ b/opensm/opensm/osm_state_mgr.c
@@ -1272,11 +1272,20 @@ _repeat_discovery:
 	 */
 
 	if (!sm->ucast_mgr.cache_valid ||
-	    osm_ucast_cache_process(&sm->ucast_mgr))
+	    osm_ucast_cache_process(&sm->ucast_mgr)) {
+		OSM_LOG(sm->p_log, OSM_LOG_INFO, "[LFT]: "
+			"osm_ucast_mgr_process() - START "
+			"(%d max_smps_per_node)\n",
+			sm->ucast_mgr.p_subn->opt.max_smps_per_node);
 		osm_ucast_mgr_process(&sm->ucast_mgr);
+		OSM_LOG(sm->p_log, OSM_LOG_INFO, "[LFT]: "
+			"osm_ucast_mgr_process() - END\n");
+	}
 
 	if (wait_for_pending_transactions(&sm->p_subn->p_osm->stats))
 		return;
+	OSM_LOG(sm->p_log, OSM_LOG_INFO, "[LFT]: "
+		"Done wait_for_pending_transactions()\n");
 
 	/* cleanup switch lft buffers */
 	cl_qmap_apply_func(&sm->p_subn->sw_guid_tbl, cleanup_switch, sm->p_log);
diff --git a/opensm/opensm/osm_ucast_mgr.c b/opensm/opensm/osm_ucast_mgr.c
index 1507bed..d05c05e 100644
--- a/opensm/opensm/osm_ucast_mgr.c
+++ b/opensm/opensm/osm_ucast_mgr.c
@@ -932,10 +932,18 @@ static int ucast_mgr_build_lfts(osm_ucast_mgr_t * p_mgr)
 	cl_qmap_apply_func(&p_mgr->p_subn->port_guid_tbl,
 			   add_port_to_order_list, p_mgr);
 
+	OSM_LOG(p_mgr->p_log, OSM_LOG_INFO, "[LFT]: "
+		"ucast_mgr_process_tbl() - START\n");
 	cl_qmap_apply_func(&p_mgr->p_subn->sw_guid_tbl, ucast_mgr_process_tbl,
 			   p_mgr);
+	OSM_LOG(p_mgr->p_log, OSM_LOG_INFO, "[LFT]: "
+		"ucast_mgr_process_tbl() - END\n");
 
+	OSM_LOG(p_mgr->p_log, OSM_LOG_INFO, "[LFT]: "
+		"osm_ucast_pipeline_tbl() - START\n");
 	osm_ucast_pipeline_tbl(p_mgr);
+	OSM_LOG(p_mgr->p_log, OSM_LOG_INFO, "[LFT]: "
+		"osm_ucast_pipeline_tbl() - END\n");
 
 	cl_qlist_remove_all(&p_mgr->port_order_list);
 
-- 
1.5.1.4
