[ofa-general] [PATCH] opensm/osm_ucast_ftree.c : Fixed bug on index port order incrementation
Nicolas Morey Chaisemartin
nicolas.morey-chaisemartin at ext.bull.net
Thu Jan 29 08:40:56 PST 2009
Hello,
While doing some routing analysis on a fat tree using ibsim, we found a "bug" in the fat-tree routing algorithm.
The problem happens with a 4-level fat tree like the one below:
                      L3 L3
    ___________________|__|____________________
   /            /              \             \       <= All the L2 are connected to 2 L3 switches
 L2-1         L2-2            L2-1          L2-2
  /            /                \             \      <== The Nth L1 of a set leads only to the Nth L2 (L2-N). With some pruning.
  L1           L1               L1            L1
 /|\          /|\              /|\           /|\
 ==Fully mixed to L1==         ==Fully mixed to L1==   <=== We have multiple sets. In each set, all L0 lead to all L1 of their set.
  L0           L0               L0            L0
 / \          / \              / \           / \
CN CN  ..  CN CN     ....     CN CN  ..  CN CN
In detail:
We have a number of sets. Each set contains compute nodes, L0 switches and L1 switches.
On top of the sets there is a common layer of L2 and L3 switches.
In each set, there are groups of compute nodes. Each group is connected to a single L0 switch.
In a given set, all L0 switches are connected to all L1 switches.
The Nth L1 of a set is connected to the Nth L2 and only to that one (so through an L2, the Nth L1 can only see the Nth L1 of the other sets).
All the L2 switches are connected to a pair of L3 switches.
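The cabling rules above can be written down as small predicates. This is only a sketch for reading the description: the sizes and every `toy_` name are hypothetical (the mail gives no switch counts), and it assumes there are exactly as many L2 switches as L1 switches per set.

```c
#include <stdbool.h>

/* Hypothetical sizes for illustration; the description does not give any.
 * Assumption: one L2 switch per L1 position in a set. */
enum { TOY_L1_PER_SET = 2, TOY_NUM_L2 = TOY_L1_PER_SET };

/* Inside a set, every L0 is cabled to every L1 of the same set. */
static bool toy_l0_l1_linked(int l0_set, int l1_set)
{
    return l0_set == l1_set;
}

/* The Nth L1 of any set is cabled to the Nth L2 and to no other L2. */
static bool toy_l1_l2_linked(int l1_pos, int l2_pos)
{
    return l1_pos == l2_pos;
}

/* Count the L2 switches directly above the L1 at position l1_pos. */
static int toy_l2_above_l1(int l1_pos)
{
    int count = 0;

    for (int l2 = 0; l2 < TOY_NUM_L2; l2++)
        if (toy_l1_l2_linked(l1_pos, l2))
            count++;
    return count;
}
```

Under these rules `toy_l2_above_l1()` is 1 for every L1 position, which is the property that matters for the rest of the report: a route climbing from an L1 only ever reaches a single L2.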
If we leave out the L3 switches, we have a perfectly balanced fat tree and well-balanced routes.
But when we add the L3 switches, it introduces a huge difference. As they are not necessary, no route goes through the L3 switches (which is fine).
However, 1/4 of the L2->L1 routes are not used at all, 1/2 are half used, and 1/4 are twice overused (compared to the balanced state).
This comes from down_port_groups_idx, which is incremented each time the algorithm goes down through a node, whether or not it creates routes to an HCA (port != switch).
As a route coming up from an L1 reaches only one L2, the algorithm goes through all the other L2 switches while going down, incrementing their index.
Our case here is a bit specific, but the problem may appear in any fabric where the L1 switches do not have full connectivity to all the L2 switches and there is another switch rank above.
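To see why an index that advances on every descent skews the load, here is a toy model. It is not the OpenSM code: `toy_simulate`, the port-group count and the strict alternation of productive and unproductive visits are all assumptions made for illustration, and the numbers it produces are not the 1/4, 1/2, 1/4 split measured above.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical sizes for illustration only. */
enum { TOY_PORT_GROUPS = 4, TOY_VISITS = 8 };

/*
 * Simulate TOY_VISITS descents through one switch. Even-numbered visits are
 * "productive" (a route to a leaf switch is created), odd-numbered visits
 * are not. Each productive visit is charged to the port group selected by
 * the round-robin index. load[] receives the productive visits per group.
 */
static void toy_simulate(bool advance_on_every_visit,
                         unsigned load[TOY_PORT_GROUPS])
{
    unsigned idx = 0; /* stands in for down_port_groups_idx */

    memset(load, 0, TOY_PORT_GROUPS * sizeof load[0]);
    for (unsigned visit = 0; visit < TOY_VISITS; visit++) {
        bool productive = (visit % 2 == 0);

        if (productive)
            load[idx % TOY_PORT_GROUPS]++;
        /* The behaviour under discussion: advance even when nothing
         * was routed through this switch. */
        if (productive || advance_on_every_visit)
            idx++;
    }
}
```

With `advance_on_every_visit` set, the four productive visits land on port groups 0 and 2 only (load 2, 0, 2, 0). With it cleared, they spread evenly (load 1, 1, 1, 1).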
To avoid this problem, I've changed the __osm_ftree_fabric_route_upgoing_by_going_down function so that it returns a value indicating whether routes to an HCA (in fact, to a leaf switch) were created.
With this information, we only increase the index when the algorithm has created routes to an HCA.
After applying this patch and measuring the link usage, we are at perfect balance (L2<->L3 links are still not used, but that is to be expected).
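The shape of the change, reduced to a toy recursion, looks roughly like this. It is a sketch of the idea only, not the patched function: `struct toy_switch`, `toy_route_going_down` and the `has_route` flag are hypothetical, and the real __osm_ftree_fabric_route_upgoing_by_going_down has a different signature and does much more work.

```c
#include <stdbool.h>
#include <stddef.h>

enum { TOY_MAX_DOWN = 4 }; /* hypothetical fan-out limit */

struct toy_switch {
    struct toy_switch *down[TOY_MAX_DOWN]; /* switches one rank below */
    size_t num_down;                       /* 0 means leaf switch */
    bool has_route;    /* leaf only: route to the current target installed */
    unsigned down_idx; /* stands in for down_port_groups_idx */
};

/*
 * Descend from sw and install a route on every leaf that lacks one.
 * Returns true if at least one route was created below sw. The index of a
 * switch is advanced only in that case.
 */
static bool toy_route_going_down(struct toy_switch *sw)
{
    bool created = false;

    if (sw->num_down == 0) {
        if (sw->has_route)
            return false; /* already reachable: nothing to do */
        sw->has_route = true;
        return true;
    }

    for (size_t i = 0; i < sw->num_down; i++)
        if (toy_route_going_down(sw->down[i]))
            created = true;

    if (created)
        sw->down_idx++; /* skip the increment on unproductive descents */
    return created;
}
```

A second descent through the same subtree finds every leaf already routed, returns false, and leaves `down_idx` untouched, which is the property the patch relies on.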
Signed-off-by: Nicolas Morey-Chaisemartin <nicolas.morey-chaisemartin at ext.bull.net>
---
opensm/opensm/osm_ucast_ftree.c | 23 ++++++++++++++---------
1 files changed, 14 insertions(+), 9 deletions(-)
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 503dc4d590f8545d1846b2daa4d39c80fc5ac018.diff
Type: text/x-patch
Size: 2849 bytes
Desc: not available
URL: <http://lists.openfabrics.org/pipermail/general/attachments/20090129/415e8d57/attachment.bin>