[ofa-general] Re: [PATCH v2] osm_ucast_ftree.c Allow horizontal links between switches of max rank

Line.Holen at Sun.COM Line.Holen at Sun.COM
Fri Jun 26 00:24:02 PDT 2009


On 06/23/09 03:53 PM, Nicolas Morey-Chaisemartin wrote:
> Hi,
> 
> As I discussed with Line, this patch also presents a problem with topologies using reverse hops and has a negative impact on routing quality.

OK, if that's still the case then you need to supply an example
showing this. Otherwise I have no way to verify and fix the
issues you are facing.
Without the reverse hops the results of the routing before and
after the horizontal support are identical.


> Moreover, I gave it some thought and I think there is a flaw in this approach.
> For example
> 
> 
>  L1-1    L1-2
>   |       | \
>   |       |  ----
>   |       |      \
>  _|___|___|_      L2
>  | QNEM    |     |\\
>  -----------     |\\\
>   ////  \\\\     |\\\\
> N[1-12] N[13-24] M[1-18]
> 
> In this example the horizontal link is at the last level of the tree so it is allowed by Line's patch.
> As horizontal ports are treated as downlinks from both sides, they can only be used when creating upwards routes (algorithm going down).
> So in this example nodes N 13 to 24 will have full connectivity to nodes M 1 to 18 and nodes N 1 to 12 will have a route to nodes M 1 to 18.
> However nodes M 1 to 18 won't have any routes to nodes N 1 to 12 because the algorithm starting from node N1, for example,
> cannot reach switch L1-2 as the horizontal link is seen as a down link. With reverse hops it should be possible to use one horizontal link and go on, but reverse hops are not well balanced and should not be used for compute nodes.
> 

Nicolas, you are right. In this example you would not have
full connectivity, N1-12 would not reach M1-18 because that
would represent a down-up-down path.
Your example represents a highly degraded system with no 
common core switch for L2 and the leftmost switch in the QNEM.
In this case, full connectivity could be reached with alternative
routing algorithms but that should be an administrative decision.
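
To make the down-up-down restriction concrete, here is a minimal
sketch in Python. It is not the osm_ucast_ftree.c logic. The switch
names QNEM-left and QNEM-right, the rank numbers and the wiring
(QNEM-right attached to L1-2) are assumptions read off the diagram.
A horizontal hop is counted as DOWN, and a path is rejected as soon
as it goes UP after a DOWN hop.

```python
# Hypothetical ranks: 0 is the top level, larger numbers are closer to the leaves.
RANK = {"L1-1": 0, "L1-2": 0, "QNEM-left": 1, "QNEM-right": 1, "L2": 1}


def hop_direction(src, dst):
    """Classify one switch-to-switch hop; a horizontal hop counts as DOWN."""
    if RANK[dst] < RANK[src]:
        return "UP"
    return "DOWN"  # covers both a real down hop and an equal-rank (horizontal) hop


def is_up_down(path):
    """Return True if the path never goes UP after it has gone DOWN."""
    gone_down = False
    for src, dst in zip(path, path[1:]):
        if hop_direction(src, dst) == "DOWN":
            gone_down = True
        elif gone_down:
            return False  # an UP hop after a DOWN hop
    return True


# Switch-level path for N1-12 towards M1-18 (horizontal, up, down).
n_low_to_m = ["QNEM-left", "QNEM-right", "L1-2", "L2"]
# Switch-level path for N13-24 towards M1-18 (up, down).
n_high_to_m = ["QNEM-right", "L1-2", "L2"]

print(is_up_down(n_low_to_m))   # False
print(is_up_down(n_high_to_m))  # True
```

The first path starts with the horizontal hop, which already counts
as DOWN, so the following hop up to L1-2 is refused.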


> What I think should be done is have a more generalized approach. The Ftree algorithm already manages an enum with direction. We could add a 3rd direction (horizontal) and treat such links as both upward and downward. With the patch I've pushed (the one conflicting), we already ensure minimum hop path is used and secondary routes are more balanced so it should work quite nicely.
> 

Handling the horizontal links as both up and down would give
full connectivity in your example. But there are other examples
that would introduce deadlock situations with your approach.


Level 1    x-1          y-1          
            | \        /  |
            |  \      /   |
Level 2    x-2 x-3  y-2  y-3
            |    \  /     |
            |     \/      |
            |     /\      |
            |    /  \     |
leafs       A -- B   C -- D

Clarifications:
 x1-3 is a subset of switches in a core ftree
 y1-3 is a subset of switches in another core ftree
 A+B are two interconnected leafs (or a QNEM if you like)
 C+D are also two interconnected leafs
 B is connected to y-2, and C to x-3
 A, B, C and D all have CAs connected

If you are allowed to use the horizontal link both on your way up
and on your way down you could get the following path scenario
all with valid shortest path:

  A to D : A --> x-2 --> x-1 --> x-3 --> C   --> D   (horizontal = DOWN)
  C to B : C --> D   --> y-3 --> y-1 --> y-2 --> B   (horizontal = UP)
  D to A : D --> y-3 --> y-1 --> y-2 --> B   --> A   (horizontal = DOWN)
  B to C : B --> A   --> x-2 --> x-1 --> x-3 --> C   (horizontal = UP)

Voila - you have potential deadlock for CN to CN communication.
The deadlock is removed if you only allow the horizontal link to be
used in one direction, either up or down.
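
The cycle can be checked mechanically. The following is a minimal
sketch in Python, not part of OpenSM: every directed switch-to-switch
link is treated as a channel, a dependency is recorded from each
channel to the next one on the same path, and a depth-first search
looks for a cycle in the resulting graph. The function and variable
names are made up for this illustration, and a single VL is assumed.

```python
def channels(path):
    """Directed links (channels) used by a path, in order."""
    return list(zip(path, path[1:]))


def dependency_graph(paths):
    """Map each channel to the channels a packet may request while holding it."""
    graph = {}
    for path in paths:
        chans = channels(path)
        for ch in chans:
            graph.setdefault(ch, set())
        for held, wanted in zip(chans, chans[1:]):
            graph[held].add(wanted)
    return graph


def has_cycle(graph):
    """Depth-first search; reaching a node that is still on the stack means a cycle."""
    WHITE, GREY, BLACK = 0, 1, 2
    color = {node: WHITE for node in graph}

    def visit(node):
        color[node] = GREY
        for nxt in graph[node]:
            if color[nxt] == GREY:
                return True
            if color[nxt] == WHITE and visit(nxt):
                return True
        color[node] = BLACK
        return False

    return any(color[n] == WHITE and visit(n) for n in graph)


A_TO_D = ["A", "x-2", "x-1", "x-3", "C", "D"]  # horizontal = DOWN
C_TO_B = ["C", "D", "y-3", "y-1", "y-2", "B"]  # horizontal = UP
D_TO_A = ["D", "y-3", "y-1", "y-2", "B", "A"]  # horizontal = DOWN
B_TO_C = ["B", "A", "x-2", "x-1", "x-3", "C"]  # horizontal = UP

both_ways = dependency_graph([A_TO_D, C_TO_B, D_TO_A, B_TO_C])
down_only = dependency_graph([A_TO_D, D_TO_A])
up_only = dependency_graph([C_TO_B, B_TO_C])

print(has_cycle(both_ways))  # True
print(has_cycle(down_only))  # False
print(has_cycle(up_only))    # False
```

With all four paths, the dependencies close a loop through C --> D
and B --> A. With only the DOWN pair or only the UP pair the loop is
broken, which matches the statement above.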


So there is a conflict here between avoiding deadlock and being able
to have full connectivity in a degraded "ftree topology". And maybe
also a conflict with system quality if reverse hops are used, I can't
tell. The latter can be fixed by introducing a configuration parameter
to control whether horizontal links should be accepted or not.


> I also discussed this with Line but I'd like to have other opinions. Why should this be restricted to max rank switches? Voltaire for example is selling 648-port switches (2x 324 with horizontal links) which could use such an update in the Fat-Tree algorithm.
> 

Support for cross-links (as well as down-up type paths) will in general require
more than one VL for deadlock avoidance. Restrictions on where (and under what
conditions) cross-links can be introduced without causing deadlock potential will
allow selected cross-link scenarios to be supported with only one VL.
Such support could be introduced in several steps.


> Regards
> 
> Nicolas
> 

Line


