[ofa-general] Toward next OFED release (1.3)

Yevgeny Kliteynik kliteyn at mellanox.co.il
Wed Jul 11 12:56:22 PDT 2007


Hal Rosenstock wrote:
> On Tue, 2007-07-10 at 15:12, Hal Rosenstock wrote:
>   
>> On Tue, 2007-07-10 at 15:11, Peter Kjellstrom wrote:
>>     
>>> On Tuesday 10 July 2007, Hal Rosenstock wrote:
>>> ...
>>>       
>>>>> Management:
>>>>>       * Multiple partitions
>>>>>       * OpenSM
>>>>>               * More routing performance improvements
>>>>>               * Even more speedups
>>>>>               * Better packaging/installation
>>>>>               * “Native” daemon mode
>>>>>               * Performance management
>>>>>               * Quality of Service manager: Based on IBTA annex
>>>>>           
>>>> enhancements for fat tree routing (non pure tree support)
>>>> more console commands and telnet access to console
>>>>         
>>> Pardon my ignorance, but could you elaborate on what a "non-pure tree" is and 
>>> in which way OFED-1.2 opensm performs badly for these?
>>>       
>
> The following patch contains some of the answers to the above:
>   
Hi guys. Sorry for the delay.

The patch does answer the question, but I'll add my two cents
anyway.

The fat-tree algorithm optimizes routing for "shift" communication pattern.
Before the latest change, the topology that the fat-tree routing engine
could handle had to be a pure fat-tree, and by "pure" I mean a
completely symmetrical tree that complies with the following rules:
  - Switches of the same rank should have the same number
    of UP-going port groups*, unless they are root switches,
    in which case they shouldn't have UP-going ports at all.
  - Switches of the same rank should have the same number
    of DOWN-going port groups, unless they are leaf switches.
  - Switches of the same rank should have the same number
    of ports in each UP-going port group.
  - Switches of the same rank should have the same number
    of ports in each DOWN-going port group.
  - *All* the CAs have to be at the same tree level (rank),
    regardless of whether they are compute nodes or management nodes.

Any other topology will cause fat-tree routing to fail, and OpenSM
will fall back to default routing. Note that this also means that
in a symmetrical fat-tree any link failure (except for the links
between CAs and leaf switches) will break the fabric symmetry and
the routing will fall back to default.
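
To make the rules above concrete, here is a rough Python sketch of such
a symmetry check. It is not the OpenSM code: the Switch class, its
fields, the is_leaf flag and the rank numbering (0 = root) are all
made up for illustration, and "same number of ports in each port group"
is read as "all port groups of a rank have the same size".

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Switch:
    # Hypothetical model for this sketch, not an OpenSM structure.
    name: str
    rank: int  # 0 = root, growing toward the leaves
    up_groups: list = field(default_factory=list)    # ports per UP-going port group
    down_groups: list = field(default_factory=list)  # ports per DOWN-going port group
    is_leaf: bool = False


def pure_fat_tree_violations(switches, ca_ranks):
    """Return a list of rule violations (empty if the tree is pure)."""
    problems = []
    by_rank = defaultdict(list)
    for sw in switches:
        by_rank[sw.rank].append(sw)

    for rank, group in sorted(by_rank.items()):
        # Root switches must not have UP-going ports at all.
        if rank == 0 and any(sw.up_groups for sw in group):
            problems.append("rank 0: root switch has UP-going ports")
        # Same number of UP-going port groups within a rank.
        if len({len(sw.up_groups) for sw in group}) > 1:
            problems.append(f"rank {rank}: number of UP-going port groups differs")
        # Same number of DOWN-going port groups, leaf switches exempt.
        non_leaf = [sw for sw in group if not sw.is_leaf]
        if len({len(sw.down_groups) for sw in non_leaf}) > 1:
            problems.append(f"rank {rank}: number of DOWN-going port groups differs")
        # Same number of ports in each UP-going / DOWN-going port group.
        if len({n for sw in group for n in sw.up_groups}) > 1:
            problems.append(f"rank {rank}: UP-going port group sizes differ")
        if len({n for sw in group for n in sw.down_groups}) > 1:
            problems.append(f"rank {rank}: DOWN-going port group sizes differ")

    # All CAs must sit at the same rank.
    if len(set(ca_ranks)) > 1:
        problems.append("CAs are not all at the same rank")
    return problems


# Two roots, two leaf switches, two links between every root/leaf pair.
roots = [Switch(f"root{i}", 0, down_groups=[2, 2]) for i in range(2)]
leaves = [
    Switch("leaf0", 1, up_groups=[2, 2], down_groups=[1, 1, 1], is_leaf=True),
    Switch("leaf1", 1, up_groups=[2, 2], down_groups=[1, 1], is_leaf=True),
]
print(pure_fat_tree_violations(roots + leaves, ca_ranks=[2] * 5))

# Lose one root1 <-> leaf1 link: the symmetry is gone.
roots[1].down_groups = [2, 1]
leaves[1].up_groups = [2, 1]
print(pure_fat_tree_violations(roots + leaves, ca_ranks=[2] * 5))
```

The first call reports no violations even though the two leaf switches
carry a different number of CAs. The second call reports a port group
size mismatch at both ranks, which is the single-link-failure case
described above.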

With the recent changes, the user can supply lists of root GUIDs and
compute node GUIDs. Fat-tree routing is then able to handle trees
that are not symmetrical, and the topology only has to comply
with this (very) reduced set of constraints:
  - All the Compute Nodes have to be at the same tree level (rank).
    Note that non-compute node CAs are allowed here to be at different
    tree ranks.

But of course, the less symmetrical the tree is, the worse the routing
results will be.
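
The reduced constraint can be sketched in the same spirit. Again, this
is only an illustration and not the OpenSM implementation: the function
name, the dictionary of CA ranks and the GUID strings are hypothetical.
The default (no compute node list means every CA counts as a compute
node) follows the '-u' description in the patch below.

```python
def compute_nodes_same_rank(ca_ranks, cn_guids=None):
    """ca_ranks maps a CA GUID to its tree rank (hypothetical input).

    cn_guids is the optional set of compute node GUIDs. When it is not
    given, every CA is treated as a compute node.
    """
    if cn_guids is None:
        cn_guids = ca_ranks.keys()
    # Only the compute nodes have to share a rank; other CAs are ignored.
    return len({ca_ranks[guid] for guid in cn_guids}) <= 1


# Made-up fabric: two compute nodes at rank 3, a management node at rank 2.
ca_ranks = {"cn-a": 3, "cn-b": 3, "mgmt": 2}

print(compute_nodes_same_rank(ca_ranks))                    # no CN list
print(compute_nodes_same_rank(ca_ranks, {"cn-a", "cn-b"}))  # CN list given
```

Without a compute node list the management node breaks the rule, so
the check fails. Once the two compute nodes are listed explicitly, the
management node's rank no longer matters and the check passes.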

-- Yevgeny

> -----Forwarded Message-----
>
> From: Yevgeny Kliteynik <kliteyn at dev.mellanox.co.il>
> To: Hal Rosenstock <halr at voltaire.com>
> Cc: OpenIB <general at lists.openfabrics.org>
> Subject: [PATCH 2/2] osm: updating doc with root and compute nodes options for fat-tree
> Date: 09 Jul 2007 11:32:49 +0300
>
> Hi Hal.
>
> Updating doc and osm manpage with the 
> recent enhancement of fat-tree routing.
>
> Signed-off-by: Yevgeny Kliteynik <kliteyn at dev.mellanox.co.il>
> ---
>  opensm/doc/current-routing.txt |   28 ++++++++++++++++++++++------
>  opensm/man/opensm.8            |   33 ++++++++++++++++++++++++++-------
>  2 files changed, 48 insertions(+), 13 deletions(-)
>
> diff --git a/opensm/doc/current-routing.txt b/opensm/doc/current-routing.txt
> index 9852ef0..76f91ba 100644
> --- a/opensm/doc/current-routing.txt
> +++ b/opensm/doc/current-routing.txt
> @@ -174,11 +174,14 @@ Fat-tree Routing Algorithm
>  Purpose:
>  
>  The fat-tree algorithm optimizes routing for "shift" communication pattern.
> -It should be chosen if a subnet is a symmetrical fat-tree of various types.
> +It should be chosen if a subnet is a symmetrical or almost symmetrical
> +fat-tree of various types.
>  It supports not just K-ary-N-Trees, by handling for non-constant K,
>  cases where not all leafs (CAs) are present, any CBB ratio.
>  As in UPDN, fat-tree also prevents credit-loop-deadlocks.
> -Fat-tree algorithm supports topologies that comply with the following rules:
> +
> +If the root guid file is not provided ('-a' or '--root_guid_file' options),
> +the topology has to be pure fat-tree that complies with the following rules:
>    - Tree rank should be between two and eight (inclusively)
>    - Switches of the same rank should have the same number
>      of UP-going port groups*, unless they are root switches,
> @@ -189,18 +192,31 @@ Fat-tree algorithm supports topologies that comply with the following rules:
>      of ports in each UP-going port group.
>    - Switches of the same rank should have the same number
>      of ports in each DOWN-going port group.
> -*ports that are connected to the same remote switch are referenced as
> +  - All the CAs have to be at the same tree level (rank).
> +
> +If the root guid file is provided, the topology doesn't have to be pure
> +fat-tree, and it should only comply with the following rules:
> +  - Tree rank should be between two and eight (inclusively)
> +  - All the Compute Nodes** have to be at the same tree level (rank).
> +    Note that non-compute node CAs are allowed here to be at different
> +    tree ranks.
> +
> +* ports that are connected to the same remote switch are referenced as
>  'port group'.
> +** list of compute nodes (CNs) can be specified by '-u' or '--cn_guid_file'
> +OpenSM options.
>  
>  Note that although fat-tree algorithm supports trees with non-integer CBB
>  ratio, the routing will not be as balanced as in case of integer CBB ratio.
>  In addition to this, although the algorithm allows leaf switches to have any
>  number of CAs, the closer the tree is to be fully populated, the more effective
>  the "shift" communication pattern will be.
> +In general, even if the root list is provided, the closer the topology to a
> +pure and symmetrical fat-tree, the more optimal the routing will be.
>  
> -The algorithm also dumps CA ordering file (opensm-ftree-ca-order.dump) in the
> -same directory where the OpenSM log resides. This ordering file provides the
> -CA order that may be used to create efficient communication pattern, that
> +The algorithm also dumps compute node ordering file (opensm-ftree-ca-order.dump)
> +in the same directory where the OpenSM log resides. This ordering file provides
> +the CN order that may be used to create efficient communication pattern, that
>  will match the routing tables.
>  
>
> diff --git a/opensm/man/opensm.8 b/opensm/man/opensm.8
> index 5f34cd1..5472faf 100644
> --- a/opensm/man/opensm.8
> +++ b/opensm/man/opensm.8
> @@ -603,7 +603,7 @@ UPDN Algorithm Usage
>  Activation through OpenSM
>  
>  Use '-R updn' option (instead of old '-u') to activate the UPDN algorithm.
> -Use '-a <guid_list_file>' for adding an UPDN guid file that contains the
> +Use '-a <root_guid_file>' for adding an UPDN guid file that contains the
>  root nodes for ranking.
>  If the `-a' option is not used, OpenSM uses its auto-detect root nodes
>  algorithm.
> @@ -621,12 +621,14 @@ it exists) that connects the CA to the subnet as a root node.
>  Fat-tree Routing Algorithm
>  
>  The fat-tree algorithm optimizes routing for "shift" communication pattern.
> -It should be chosen if a subnet is a symmetrical fat-tree of various types.
> +It should be chosen if a subnet is a symmetrical or almost symmetrical
> +fat-tree of various types.
>  It supports not just K-ary-N-Trees, by handling for non-constant K,
>  cases where not all leafs (CAs) are present, any CBB ratio.
>  As in UPDN, fat-tree also prevents credit-loop-deadlocks.
>  
> -The Fat-tree algorithm supports topologies that comply with the following rules:
> +If the root guid file is not provided ('-a' or '--root_guid_file' options),
> +the topology has to be pure fat-tree that complies with the following rules:
>    - Tree rank should be between two and eight (inclusively)
>    - Switches of the same rank should have the same number
>      of UP-going port groups*, unless they are root switches,
> @@ -637,10 +639,21 @@ The Fat-tree algorithm supports topologies that comply with the following rules:
>      of ports in each UP-going port group.
>    - Switches of the same rank should have the same number
>      of ports in each DOWN-going port group.
> +  - All the CAs have to be at the same tree level (rank).
>  
> -Note: ports that are connected to the same remote switch are referenced as
> +If the root guid file is provided, the topology doesn't have to be pure
> +fat-tree, and it should only comply with the following rules:
> +  - Tree rank should be between two and eight (inclusively)
> +  - All the Compute Nodes** have to be at the same tree level (rank).
> +    Note that non-compute node CAs are allowed here to be at different
> +    tree ranks.
> +
> +* ports that are connected to the same remote switch are referenced as
>  \'port group\'.
>  
> +** list of compute nodes (CNs) can be specified by \'-u\' or \'--cn_guid_file\'
> +OpenSM options.
> +
>  Topologies that do not comply cause a fallback to min hop routing.
>  Note that this can also occur on link failures which cause the topology
>  to no longer be "pure" fat-tree.
> @@ -650,15 +663,21 @@ ratio, the routing will not be as balanced as in case of integer CBB ratio.
>  In addition to this, although the algorithm allows leaf switches to have any
>  number of CAs, the closer the tree is to be fully populated, the more
>  effective the "shift" communication pattern will be.
> +In general, even if the root list is provided, the closer the topology to a
> +pure and symmetrical fat-tree, the more optimal the routing will be.
>  
> -The algorithm also dumps CA ordering file (opensm-ftree-ca-order.dump) in the
> -same directory where the OpenSM log resides. This ordering file provides the
> -CA order that may be used to create efficient communication pattern, that
> +The algorithm also dumps compute node ordering file (opensm-ftree-ca-order.dump)
> +in the same directory where the OpenSM log resides. This ordering file provides
> +the CN order that may be used to create efficient communication pattern, that
>  will match the routing tables.
>  
>  Activation through OpenSM
>  
>  Use '-R ftree' option to activate the fat-tree algorithm.
> +Use '-a <root_guid_file>' to provide root nodes for ranking. If the `-a' option
> +is not used, routing algorithm will detect roots automatically.
> +Use '-u <root_cn_file>' to provide the list of compute nodes. If the `-u' option
> +is not used, all the CAs are considered as compute nodes.
>  
>  Note: LMC > 0 is not supported by fat-tree routing. If this is
>  specified, the default routing algorithm is invoked instead.
>   


