[ofa-general] [PATCH] opensm: Parallelize (Stripe) LFT sets across switches

Yevgeny Kliteynik kliteyn at dev.mellanox.co.il
Thu Jul 30 14:33:55 PDT 2009


I have more numbers, this time on QLogic switches, which do not
handle DR packet forwarding in HW.

Fabric of ~1100 HCAs, ~280 switches.

Current OSM configures LFTs in ~2 seconds.
The new algorithm does the same job in 1.4-1.6 seconds (30%-20% less time),
depending on the max_smps_per_node value.
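
For anyone who wants to check the percentages, here is a minimal sketch of
the arithmetic. The function names are made up for illustration and are not
part of OpenSM, and the ~2 second baseline is only approximate.

```c
#include <assert.h>

/* Hypothetical helper: percentage of time saved when a phase
 * drops from old_s seconds to new_s seconds. */
double percent_saved(double old_s, double new_s)
{
    assert(old_s > 0.0);
    return (old_s - new_s) / old_s * 100.0;
}

/* Hypothetical helper: equivalent speedup factor,
 * i.e. old time divided by new time. */
double speedup_factor(double old_s, double new_s)
{
    assert(new_s > 0.0);
    return old_s / new_s;
}
```

With the numbers above, going from 2.0 to 1.4 seconds saves about 30% of the
time (a factor of roughly 1.43), and going from 2.0 to 1.6 seconds saves
about 20% (a factor of 1.25).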

As in the case of the IS4 switches, the shortest config time was obtained
with max_smps_per_node=0, which is an unlimited pipeline.
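
For readers new to the thread, the striping idea from Hal's patch description
(quoted below) amounts to swapping the loop order in which LFT blocks are
sent. The following is only a rough sketch of that ordering, not the code
from the patch: the struct and function names are hypothetical, and it does
not model max_smps_per_node or the actual MAD sending.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical record of one LFT block sent to one switch. */
struct lft_send {
    int sw;     /* switch index */
    int block;  /* LFT block index */
};

/* Old order: send every block to one switch before moving on to the
 * next switch, so a single SMA handles a burst of requests alone.
 * Returns the number of entries written to out. */
int serial_order(struct lft_send *out, int num_switches, int num_blocks)
{
    int n = 0;

    assert(out != NULL);
    for (int sw = 0; sw < num_switches; sw++)
        for (int blk = 0; blk < num_blocks; blk++) {
            out[n].sw = sw;
            out[n].block = blk;
            n++;
        }
    return n;
}

/* Striped order: send block 0 to every switch, then block 1, and so
 * on, so several switches can process their sets concurrently.
 * Returns the number of entries written to out. */
int striped_order(struct lft_send *out, int num_switches, int num_blocks)
{
    int n = 0;

    assert(out != NULL);
    for (int blk = 0; blk < num_blocks; blk++)
        for (int sw = 0; sw < num_switches; sw++) {
            out[n].sw = sw;
            out[n].block = blk;
            n++;
        }
    return n;
}
```

For 3 switches with 2 blocks each, the serial order starts sw0/blk0, sw0/blk1,
sw1/blk0, while the striped order starts sw0/blk0, sw1/blk0, sw2/blk0. The
caller is assumed to provide an array of at least num_switches * num_blocks
entries.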

-- Yevgeny

Yevgeny Kliteynik wrote:
> Sasha,
> 
> Hal Rosenstock wrote:
>> Currently, MADs are pipelined to a single switch at a time which
>> effectively serializes these requests due to processing at the SMA.
>> This patch pipelines (stripes) them across the switches first before
>> proceeding with successive blocks. As a result of this striping,
>> multiple switches can process the set and respond concurrently
>> which results in an improvement to the subnet initialization time.
> 
> I have some numbers, and they look very good.
> 
> I have a small cluster of 17 IS4 switches and 11 HCAs.
> To artificially increase the cluster I used LMC=7, including
> ExtendedSwitchPort 0 LMC.
> 
> With the new code, LFT configuration is more than twice as
> fast as with the old code :)
> The current ucast manager ran on average for ~250 msec; with the
> new code it took 110-120 msec.
> 
> Routing calculation phase of the ucast manager took ~1200 usec,
> the rest was sending the blocks and waiting for no more pending
> transactions.
> 
> I also didn't see any noticeable difference between various
> max_smps_per_node values.
> Here are some detailed results of different executions (the
> number on the left is timer value in usec):
> 
> Current ucast manager (w/o the optimization):
> 
> 000000 [LFT]: osm_ucast_mgr_process() - START
> 001131 [LFT]: ucast_mgr_process_tbl() - START
> 032251 [LFT]: ucast_mgr_process_tbl() - END
> 032263 [LFT]: osm_ucast_mgr_process() - END
> 253416 [LFT]: Done wait_for_pending_transactions()
> 
> New code, max_smps_per_node=0:
> 
> 001417 [LFT]: osm_ucast_mgr_process() - START (0 max_smps_per_node)
> 002690 [LFT]: ucast_mgr_process_tbl() - START
> 032946 [LFT]: ucast_mgr_process_tbl() - END
> 032948 [LFT]: osm_ucast_pipeline_tbl() - START
> 033846 [LFT]: osm_ucast_pipeline_tbl() - END
> 033858 [LFT]: osm_ucast_mgr_process() - END
> 108203 [LFT]: Done wait_for_pending_transactions()
> 
> New code, max_smps_per_node=1:
> 
> 007474 [LFT]: osm_ucast_mgr_process() - START (1 max_smps_per_node)
> 008735 [LFT]: ucast_mgr_process_tbl() - START
> 040071 [LFT]: ucast_mgr_process_tbl() - END
> 040074 [LFT]: osm_ucast_pipeline_tbl() - START
> 040103 [LFT]: osm_ucast_pipeline_tbl() - END
> 040114 [LFT]: osm_ucast_mgr_process() - END
> 120097 [LFT]: Done wait_for_pending_transactions()
> 
> New code, max_smps_per_node=4:
> 
> 004137 [LFT]: osm_ucast_mgr_process() - START (4 max_smps_per_node)
> 005380 [LFT]: ucast_mgr_process_tbl() - START
> 037436 [LFT]: ucast_mgr_process_tbl() - END
> 037439 [LFT]: osm_ucast_pipeline_tbl() - START
> 037495 [LFT]: osm_ucast_pipeline_tbl() - END
> 037506 [LFT]: osm_ucast_mgr_process() - END
> 114983 [LFT]: Done wait_for_pending_transactions()
> 
> Here's the patch that shows where I added the log messages in current code:
> 
> From d7962cc06f5526982bc51c17d53981b6d23256a8 Mon Sep 17 00:00:00 2001
> From: Yevgeny Kliteynik <kliteyn at dev.mellanox.co.il>
> Date: Mon, 27 Jul 2009 16:12:27 +0300
> Subject: [PATCH] [LFT - master] log messages for LFT parallelization
> 
> Signed-off-by: Yevgeny Kliteynik <kliteyn at dev.mellanox.co.il>
> ---
> opensm/opensm/osm_state_mgr.c |    9 ++++++++-
> opensm/opensm/osm_ucast_mgr.c |    4 ++++
> 2 files changed, 12 insertions(+), 1 deletions(-)
> 
> diff --git a/opensm/opensm/osm_state_mgr.c b/opensm/opensm/osm_state_mgr.c
> index adc39a0..573a2a7 100644
> --- a/opensm/opensm/osm_state_mgr.c
> +++ b/opensm/opensm/osm_state_mgr.c
> @@ -1271,11 +1271,18 @@ _repeat_discovery:
>      */
> 
>     if (!sm->ucast_mgr.cache_valid ||
> -        osm_ucast_cache_process(&sm->ucast_mgr))
> +        osm_ucast_cache_process(&sm->ucast_mgr)) {
> +        OSM_LOG(sm->p_log, OSM_LOG_INFO, "[LFT]: "
> +            "osm_ucast_mgr_process() - START\n");
>         osm_ucast_mgr_process(&sm->ucast_mgr);
> +        OSM_LOG(sm->p_log, OSM_LOG_INFO, "[LFT]: "
> +            "osm_ucast_mgr_process() - END\n");
> +    }
> 
>     if (wait_for_pending_transactions(&sm->p_subn->p_osm->stats))
>         return;
> +    OSM_LOG(sm->p_log, OSM_LOG_INFO, "[LFT]: "
> +        "Done wait_for_pending_transactions()\n");
> 
>     /* cleanup switch lft buffers */
>     cl_qmap_apply_func(&sm->p_subn->sw_guid_tbl, cleanup_switch, sm->p_log);
> diff --git a/opensm/opensm/osm_ucast_mgr.c b/opensm/opensm/osm_ucast_mgr.c
> index 78a7031..83867a0 100644
> --- a/opensm/opensm/osm_ucast_mgr.c
> +++ b/opensm/opensm/osm_ucast_mgr.c
> @@ -901,8 +901,12 @@ static int ucast_mgr_build_lfts(osm_ucast_mgr_t * p_mgr)
>     cl_qmap_apply_func(&p_mgr->p_subn->port_guid_tbl,
>                add_port_to_order_list, p_mgr);
> 
> +    OSM_LOG(p_mgr->p_log, OSM_LOG_INFO, "[LFT]: "
> +        "ucast_mgr_process_tbl() - START\n");
>     cl_qmap_apply_func(&p_mgr->p_subn->sw_guid_tbl, ucast_mgr_process_tbl,
>                p_mgr);
> +    OSM_LOG(p_mgr->p_log, OSM_LOG_INFO, "[LFT]: "
> +        "ucast_mgr_process_tbl() - END\n");
> 
>     cl_qlist_remove_all(&p_mgr->port_order_list);
> 



