[ofa-general] [PATCH] opensm: Parallelize (Stripe) LFT sets across switches
Yevgeny Kliteynik
kliteyn at dev.mellanox.co.il
Thu Jul 30 14:33:55 PDT 2009
I have more numbers, this time on QLogic switches, which do not
handle DR packet forwarding in HW.
Fabric of ~1100 HCAs, ~280 switches.
Current OSM configures LFTs in ~2 seconds.
The new algorithm does the same job in 1.4-1.6 seconds (a 30%-20% speedup),
depending on the max_smps_per_node value.
As in the case of IS4 switches, the shortest configuration time was obtained
with max_smps_per_node=0, which means an unlimited pipeline.
-- Yevgeny
Yevgeny Kliteynik wrote:
> Sasha,
>
> Hal Rosenstock wrote:
>> Currently, MADs are pipelined to a single switch at a time which
>> effectively serializes these requests due to processing at the SMA.
>> This patch pipelines (stripes) them across the switches first before
>> proceeding with successive blocks. As a result of this striping,
>> multiple switches can process the set and respond concurrently
>> which results in an improvement to the subnet initialization time.
>
> I have some numbers, and they look very good.
>
> I have a small cluster of 17 IS4 switches and 11 HCAs.
> To artificially increase the cluster size, I used LMC=7, including
> the ExtendedSwitchPort 0 LMC.
>
> With the new code, LFT configuration is more than twice as
> fast as with the old code :)
> The current ucast manager ran for ~250 msec on average; with the
> new code, 110-120 msec.
>
> The routing calculation phase of the ucast manager took ~1200 usec;
> the rest was spent sending the blocks and waiting until no
> transactions were pending.
>
> I also didn't see any noticeable difference between the various
> max_smps_per_node values.
> Here are some detailed results from different runs (the
> number on the left is the timer value in usec):
>
> Current ucast manager (w/o the optimization):
>
> 000000 [LFT]: osm_ucast_mgr_process() - START
> 001131 [LFT]: ucast_mgr_process_tbl() - START
> 032251 [LFT]: ucast_mgr_process_tbl() - END
> 032263 [LFT]: osm_ucast_mgr_process() - END
> 253416 [LFT]: Done wait_for_pending_transactions()
>
> New code, max_smps_per_node=0:
>
> 001417 [LFT]: osm_ucast_mgr_process() - START (0 max_smps_per_node)
> 002690 [LFT]: ucast_mgr_process_tbl() - START
> 032946 [LFT]: ucast_mgr_process_tbl() - END
> 032948 [LFT]: osm_ucast_pipeline_tbl() - START
> 033846 [LFT]: osm_ucast_pipeline_tbl() - END
> 033858 [LFT]: osm_ucast_mgr_process() - END
> 108203 [LFT]: Done wait_for_pending_transactions()
>
> New code, max_smps_per_node=1:
>
> 007474 [LFT]: osm_ucast_mgr_process() - START (1 max_smps_per_node)
> 008735 [LFT]: ucast_mgr_process_tbl() - START
> 040071 [LFT]: ucast_mgr_process_tbl() - END
> 040074 [LFT]: osm_ucast_pipeline_tbl() - START
> 040103 [LFT]: osm_ucast_pipeline_tbl() - END
> 040114 [LFT]: osm_ucast_mgr_process() - END
> 120097 [LFT]: Done wait_for_pending_transactions()
>
> New code, max_smps_per_node=4:
>
> 004137 [LFT]: osm_ucast_mgr_process() - START (4 max_smps_per_node)
> 005380 [LFT]: ucast_mgr_process_tbl() - START
> 037436 [LFT]: ucast_mgr_process_tbl() - END
> 037439 [LFT]: osm_ucast_pipeline_tbl() - START
> 037495 [LFT]: osm_ucast_pipeline_tbl() - END
> 037506 [LFT]: osm_ucast_mgr_process() - END
> 114983 [LFT]: Done wait_for_pending_transactions()
>
> Here's the patch that shows where I added the log messages in current code:
>
> From d7962cc06f5526982bc51c17d53981b6d23256a8 Mon Sep 17 00:00:00 2001
> From: Yevgeny Kliteynik <kliteyn at dev.mellanox.co.il>
> Date: Mon, 27 Jul 2009 16:12:27 +0300
> Subject: [PATCH] [LFT - master] log messages for LFT parallelization
>
> Signed-off-by: Yevgeny Kliteynik <kliteyn at dev.mellanox.co.il>
> ---
> opensm/opensm/osm_state_mgr.c | 9 ++++++++-
> opensm/opensm/osm_ucast_mgr.c | 4 ++++
> 2 files changed, 12 insertions(+), 1 deletions(-)
>
> diff --git a/opensm/opensm/osm_state_mgr.c b/opensm/opensm/osm_state_mgr.c
> index adc39a0..573a2a7 100644
> --- a/opensm/opensm/osm_state_mgr.c
> +++ b/opensm/opensm/osm_state_mgr.c
> @@ -1271,11 +1271,18 @@ _repeat_discovery:
> */
>
> if (!sm->ucast_mgr.cache_valid ||
> - osm_ucast_cache_process(&sm->ucast_mgr))
> + osm_ucast_cache_process(&sm->ucast_mgr)) {
> + OSM_LOG(sm->p_log, OSM_LOG_INFO, "[LFT]: "
> + "osm_ucast_mgr_process() - START\n");
> osm_ucast_mgr_process(&sm->ucast_mgr);
> + OSM_LOG(sm->p_log, OSM_LOG_INFO, "[LFT]: "
> + "osm_ucast_mgr_process() - END\n");
> + }
>
> if (wait_for_pending_transactions(&sm->p_subn->p_osm->stats))
> return;
> + OSM_LOG(sm->p_log, OSM_LOG_INFO, "[LFT]: "
> + "Done wait_for_pending_transactions()\n");
>
> /* cleanup switch lft buffers */
> cl_qmap_apply_func(&sm->p_subn->sw_guid_tbl, cleanup_switch,
> sm->p_log);
> diff --git a/opensm/opensm/osm_ucast_mgr.c b/opensm/opensm/osm_ucast_mgr.c
> index 78a7031..83867a0 100644
> --- a/opensm/opensm/osm_ucast_mgr.c
> +++ b/opensm/opensm/osm_ucast_mgr.c
> @@ -901,8 +901,12 @@ static int ucast_mgr_build_lfts(osm_ucast_mgr_t * p_mgr)
> cl_qmap_apply_func(&p_mgr->p_subn->port_guid_tbl,
> add_port_to_order_list, p_mgr);
>
> + OSM_LOG(p_mgr->p_log, OSM_LOG_INFO, "[LFT]: "
> + "ucast_mgr_process_tbl() - START\n");
> cl_qmap_apply_func(&p_mgr->p_subn->sw_guid_tbl, ucast_mgr_process_tbl,
> p_mgr);
> + OSM_LOG(p_mgr->p_log, OSM_LOG_INFO, "[LFT]: "
> + "ucast_mgr_process_tbl() - END\n");
>
> cl_qlist_remove_all(&p_mgr->port_order_list);
>