[ofa-general] Re: [PATCH v3] opensm/osm_ucast_ftree.c: Fixed bug on index port incrementation
Sasha Khapyorsky
sashak at voltaire.com
Mon Feb 9 11:44:19 PST 2009
On 16:55 Mon 09 Feb , Nicolas Morey Chaisemartin wrote:
> This patch fixes a bug in the port index incrementation of the fat-tree
> algorithm.
> The problem occurs (at least) with a 4-level fat tree like the one below:
>
>
>                         L3    L3
>      ___________________|_____|___________________
>     /          /                     \           \      <= All the L2 are connected to 2 L3 switches
>   L2-1       L2-2                  L2-1        L2-2
>    /          /                       \           \     <= The Nth L1 of a set leads only to the Nth L2 (L2-N), with some pruning
>   L1         L1                      L1          L1
>  /|\        /|\                     /|\         /|\
>  ==Fully mixed to L1==            ==Fully mixed to L1==  <= Multiple sets; in each set, all L0 lead to all L1 of that set
>
>   L0   L0                          L0   L0
>  / \  / \                         / \  / \
> CN CN .. CN CN       ....        CN CN .. CN CN
>
>
> In detail:
> We have a number of sets. Each set contains compute nodes, L0 switches
> and L1 switches. On top of the sets sits a common layer of L2 and L3
> switches.
>
> In each set, the compute nodes are split into groups, and each group is
> connected to a single L0 switch. Within a given set, every L0 is
> connected to every L1.
>
> The Nth L1 of a set is connected to the Nth L2 and only to that one (so,
> through an L2, the Nth L1 can only reach the Nth L1 of the other sets).
> All the L2 switches are connected to a pair of L3 switches.
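A minimal C sketch of the wiring rule just described, assuming hypothetical sizes (NUM_SETS, L1_PER_SET) and hypothetical helper names; it is not OpenSM code. It only encodes "the Nth L1 of every set is cabled to the Nth L2" and checks which L1 pairs can see each other through a single L2.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical sizes for illustration; the real fabric is larger. */
#define NUM_SETS   2
#define L1_PER_SET 2
#define NUM_L2     L1_PER_SET   /* one L2 per L1 position */

/* Wiring rule: the Nth L1 of any set is cabled only to the Nth L2. */
static bool l1_connected_to_l2(int set, int l1, int l2)
{
    assert(set >= 0 && set < NUM_SETS);
    assert(l1 >= 0 && l1 < L1_PER_SET);
    assert(l2 >= 0 && l2 < NUM_L2);
    (void)set; /* the rule is identical in every set */
    return l1 == l2;
}

/* Two L1 switches see each other through one L2 iff some L2 is wired to both. */
static bool l1_reachable_via_l2(int set_a, int l1_a, int set_b, int l1_b)
{
    for (int l2 = 0; l2 < NUM_L2; l2++)
        if (l1_connected_to_l2(set_a, l1_a, l2) &&
            l1_connected_to_l2(set_b, l1_b, l2))
            return true;
    return false;
}
```

For example, l1_reachable_via_l2(0, 0, 1, 1) is false: L1 #0 of set 0 and L1 #1 of set 1 share no L2, which is the partial L1-to-L2 connectivity that matters for the bug.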
>
>
> Without the L3 switches, we have a perfectly balanced fat tree with
> well-balanced routes.
> When we add the L3 switches, a large imbalance appears. Since they are
> not needed, no route goes through them (which is fine).
> However, 1/4 of the L2->L1 routes are not used at all, 1/2 are half used,
> and 1/4 are used twice as much as in the balanced state.
>
> This comes from down_port_groups_idx, which is incremented each time the
> algorithm goes down through a node, whether or not it creates routes to
> an HCA (port != switch).
> Since a route coming up from an L1 reaches only one L2, the algorithm
> passes through all the other L2 switches while going down, incrementing
> their index.
> Our case is somewhat specific, but the problem may appear whenever the L1
> switches do not have full connectivity to all the L2 switches and there
> is another switch rank above them.
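To make the mechanism concrete, here is a toy C model (not the OpenSM code) of one switch choosing its down port round-robin. The counter idx stands in for down_port_groups_idx; route_visits_buggy, NUM_DOWN_PORTS and the visit pattern are hypothetical. Like the pre-patch behaviour described above, it advances the counter on every downward pass, including passes that create no route.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NUM_DOWN_PORTS 2 /* hypothetical: two down ports on this switch */

/* Toy model of the pre-patch behaviour. creates_route[i] is true when the
 * i-th downward pass through this switch creates routes toward an HCA and
 * false when the pass merely goes through. */
static void route_visits_buggy(const bool *creates_route, size_t n,
                               unsigned routes_per_port[NUM_DOWN_PORTS])
{
    unsigned idx = 0; /* stands in for down_port_groups_idx */

    assert(creates_route != NULL || n == 0);
    for (size_t p = 0; p < NUM_DOWN_PORTS; p++)
        routes_per_port[p] = 0;

    for (size_t i = 0; i < n; i++) {
        if (creates_route[i])
            routes_per_port[idx % NUM_DOWN_PORTS]++;
        idx++; /* bug: advances on every pass, routed or not */
    }
}
```

With an alternating pattern (routed, pass-through, routed, ...), every routed pass sees an even idx, so port 0 receives all the routes and port 1 none. This only illustrates how pass-through visits can skew a round-robin; it does not reproduce the exact 1/4, 1/2, 1/4 split reported above.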
>
> To avoid this problem, the __osm_ftree_fabric_route_upgoing_by_going_down()
> function has been changed to return a value indicating whether routes to
> an HCA (in fact, to a leaf switch) were created.
> With this information, the index is incremented only when the algorithm
> has actually created routes to an HCA.
> After applying this patch and measuring the link usage, the routes are
> perfectly balanced (the L2<->L3 links are still unused, but that is
> expected).
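The shape of the fix can be sketched in a toy C model, assuming a hypothetical route_going_down() that plays the role of the patched __osm_ftree_fabric_route_upgoing_by_going_down(): it reports whether it created any route, and the caller advances the index only when it did. The struct, the helper names and NUM_DOWN_PORTS are illustrative assumptions; the real function's signature and logic are not shown in this message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NUM_DOWN_PORTS 2 /* hypothetical: two down ports on this switch */

/* Hypothetical per-switch state; idx stands in for down_port_groups_idx. */
struct sw_state {
    unsigned idx;
    unsigned routes_per_port[NUM_DOWN_PORTS];
};

/* Hypothetical stand-in for the patched routine: it routes through the
 * current down port only if this pass reaches a leaf switch, and returns
 * whether any route was created. */
static bool route_going_down(struct sw_state *sw, bool reaches_leaf)
{
    assert(sw != NULL);
    if (!reaches_leaf)
        return false; /* pass-through: nothing routed */
    sw->routes_per_port[sw->idx % NUM_DOWN_PORTS]++;
    return true;
}

/* Caller side of the fix: advance the index only when routes were created. */
static void visit(struct sw_state *sw, bool reaches_leaf)
{
    if (route_going_down(sw, reaches_leaf))
        sw->idx++;
}
```

With alternating passes (routed, pass-through, routed, ...), the routed passes now alternate between port 0 and port 1, so both ports get the same share of routes.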
>
> Signed-off-by: Nicolas Morey-Chaisemartin <nicolas.morey-chaisemartin at ext.bull.net>
Applied. Thanks.
Sasha