[ofa-general] Re: [PATCH v2] osm_ucast_ftree.c Allow horizontal links between switches of max rank

Nicolas Morey-Chaisemartin nicolas.morey-chaisemartin at ext.bull.net
Fri Jun 26 01:27:37 PDT 2009


Hi,

On 26/06/2009 09:24, Line.Holen at Sun.COM wrote:
>> Moreover, I gave it some thought and I think there is a flaw in this approach.
>> For example
>>
>>
>>  L1-1    L1-2
>>   |       | \
>>   |       |  ----
>>   |       |      \
>>  _|___|___|_      L2
>>  | QNEM    |     |\\
>>  -----------     |\\\
>>   ////  \\\\     |\\\\
>> N[1-12] N[13-24] M[1-18]
>>
> 
> Nicolas, you are right. In this example you would not have
> full connectivity, N1-12 would not reach M1-18 because that
> would represent a down-up-down path.
> Your example represents a highly degraded system with no 
> common core switch for L2 and the leftmost switch in the QNEM.
> In this case, full connectivity could be reached with alternative
> routing algorithms but that should be an administrative decision.
> 

It's not necessarily degraded. I used switches and many nodes as an example, but you could just as easily put a service node (admin?) directly under the L1-2 switch and have exactly the same problem.
Connecting a whole cluster like this would be stupid, but adding I/O or service nodes this way makes sense.

> Handling the horizontal links as both up and down would give
> full connectivity in your example. But there are other examples
> that would introduce deadlock situations with your approach.
> 
> 
> Level 1    x-1          y-1          
>             | \        /  |
>             |  \      /   |
> Level 2    x-2 x-3  y-2  y-3
>             |    \  /     |
>             |     \/      |
>             |     /\      |
>             |    /  \     |
> leafs       A -- B   C -- D
> 
> Clarifications:
>  x1-3 is a subset of switches in a core ftree
>  y1-3 is a subset of switches in another core ftree
>  A+B are two interconnected leafs (or a QNEM if you like)
>  C+D are also two interconnected leafs
>  B is connected to y-2, and C to x-3
>  A, B, C and D all have CAs connected
> 
> If you are allowed to use the horizontal link both on your way up
> and on your way down you could get the following path scenario
> all with valid shortest path:
> 
>   A to D : A --> x-2 --> x-1 --> x-3 --> C   --> D   (horizontal = DOWN)
>   C to B : C --> D   --> y-3 --> y-1 --> y-2 --> B   (horizontal = UP)
>   D to A : D --> y-3 --> y-1 --> y-2 --> B   --> A   (horizontal = DOWN)
>   B to C : B --> A   --> x-2 --> x-1 --> x-3 --> C   (horizontal = UP)
> 
> Voila - you have potential deadlock for CN to CN communication.
> The deadlock is removed if you only allow to use the horizontal
> link in one direction, either up or down.
> 
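
The cycle in the four quoted paths can be checked mechanically. Below is a minimal sketch in Python, assuming that each directed link is one channel and that a packet holds a channel while it waits for the next one on its path (a channel dependency graph). The helper names are hypothetical and are not part of OpenSM.

```python
from collections import defaultdict

# The four paths quoted above, written as node sequences.
PATHS = [
    ["A", "x-2", "x-1", "x-3", "C", "D"],   # A to D
    ["C", "D", "y-3", "y-1", "y-2", "B"],   # C to B
    ["D", "y-3", "y-1", "y-2", "B", "A"],   # D to A
    ["B", "A", "x-2", "x-1", "x-3", "C"],   # B to C
]

def channel_dependencies(paths):
    """Map each channel (directed link) to the channels requested right after it."""
    deps = defaultdict(set)
    for path in paths:
        channels = list(zip(path, path[1:]))
        for held, wanted in zip(channels, channels[1:]):
            deps[held].add(wanted)
    return deps

def has_cycle(deps):
    """Depth-first search for a cycle in the dependency graph."""
    WHITE, GREY, BLACK = 0, 1, 2
    state = defaultdict(int)  # every channel starts WHITE (unvisited)

    def visit(channel):
        state[channel] = GREY
        for nxt in deps.get(channel, ()):
            if state[nxt] == GREY:
                return True  # back edge: the dependencies form a loop
            if state[nxt] == WHITE and visit(nxt):
                return True
        state[channel] = BLACK
        return False

    return any(state[ch] == WHITE and visit(ch) for ch in list(deps))

print(has_cycle(channel_dependencies(PATHS)))
```

The loop closes through the two horizontal channels C --> D and B --> A, each of which is used as the last hop of one path and as the first hop of another.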

It won't route this way if you optimize things.
You have to consider that the horizontal links can be taken both on the way up and on the way down, but they are not part of either the up port groups or the down port groups.
By exploring in the right order (up before horizontal and down before horizontal) you can avoid that.
Basically, if possible, you'll always use the horizontal link last (in the algorithm, which means first on the route).

So here we would have:
A to D: A --> B --> y-2 --> y-1 --> y-3 --> D
B to C: B --> A --> x-2 --> x-1 --> x-3 --> C
C to B: C --> D --> y-3 --> y-1 --> y-2 --> B
D to A: D --> C --> x-3 --> x-1 --> x-2 --> A

Which is basically what you obtain with your patch.
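
As an illustration, the four routes above can be checked for the "horizontal first on the route" shape. This is only a sketch in Python: the rank numbers are read off the quoted diagram (1 for the top level, 3 for the leaf switches) and are an assumption of the example, and the function names are hypothetical, not OpenSM code.

```python
# Ranks taken from the quoted diagram; the numbering is an assumption.
RANK = {
    "x-1": 1, "y-1": 1,
    "x-2": 2, "x-3": 2, "y-2": 2, "y-3": 2,
    "A": 3, "B": 3, "C": 3, "D": 3,
}

# The four routes listed above.
ROUTES = {
    ("A", "D"): ["A", "B", "y-2", "y-1", "y-3", "D"],
    ("B", "C"): ["B", "A", "x-2", "x-1", "x-3", "C"],
    ("C", "B"): ["C", "D", "y-3", "y-1", "y-2", "B"],
    ("D", "A"): ["D", "C", "x-3", "x-1", "x-2", "A"],
}

def hop_kinds(route):
    """Classify every hop as 'horizontal', 'up' or 'down' from the ranks."""
    kinds = []
    for src, dst in zip(route, route[1:]):
        if RANK[src] == RANK[dst]:
            kinds.append("horizontal")
        elif RANK[dst] < RANK[src]:
            kinds.append("up")
        else:
            kinds.append("down")
    return kinds

def horizontal_only_first(route):
    """True if the hops read: horizontal hops, then up hops, then down hops."""
    kinds = hop_kinds(route)
    i = 0
    for phase in ("horizontal", "up", "down"):
        while i < len(kinds) and kinds[i] == phase:
            i += 1
    return i == len(kinds)

for pair, route in ROUTES.items():
    print(pair, hop_kinds(route), horizontal_only_first(route))
```

With this shape no route ends on a horizontal hop, so no horizontal link is held while waiting for another route's first hop.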


Nicolas


