[ofa-general] Re: [OPENSM PATCH 0/5]: New "guid-routing-order" option for updn routing

Al Chu chu11 at llnl.gov
Mon Jun 16 08:52:32 PDT 2008


Hey Sasha,

On Mon, 2008-06-16 at 11:24 +0300, Sasha Khapyorsky wrote:
> Hi Al,
> 
> On 15:48 Fri 13 Jun     , Al Chu wrote:
> > 
> > This is a conceptually simple option I've developed for updn routing.
> > 
> > Currently in updn routing, nodes/guids are routed on switches in a
> > seemingly-random order, which I believe is due to internal data
> > structure organization (i.e. cl_qmap_apply_func is called on
> > port_guid_tbl) as well as how the fabric is scanned (it is logically
> > scanned from a port perspective, but it may not be logical from a node
> > perspective).  I had a hypothesis that this was leading to increased
> > contention in the network for MPI.
> > 
> > For example, suppose we have 12 uplinks from a leaf switch to a spine
> > switch.  If we want to send data from this leaf switch to node[13-24],
> > the up links we will send on are pretty random.
> 
> Yeah, the issue is known. And the idea is good and useful.
> 
> Actually we discussed this issue with Yiftah some time ago and his idea
> was to have an option to order routing generation by ports connected to
> leaf switches with a higher number of active links.
>
> Something like:
> 
>   foreach switch reverse (higher is first) sorted by number of active links
>     foreach port connected to the switch
>       do_routing()

What do you mean by "foreach switch reverse"?
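
If it means sorting the switches in descending order of active link count,
my reading is roughly the sketch below. This is only an illustration of that
interpretation, not OpenSM code: the dictionary layout and the name
routing_order are hypothetical, and the "active_links" counts are made up.

```python
# Hypothetical sketch of "foreach switch reverse sorted by number of
# active links". The data layout and names are assumptions, not OpenSM's.

def routing_order(switches):
    """Return end-port GUIDs in the order they would be routed.

    switches maps a switch name to a dict holding its number of active
    links and the GUIDs of the end ports attached to it.
    """
    # Switches with the most active links come first ("higher is first").
    ordered = sorted(
        switches.items(),
        key=lambda item: item[1]["active_links"],
        reverse=True,
    )
    order = []
    for _name, sw in ordered:
        # Inner loop: every port connected to this switch.
        for guid in sw["ports"]:
            order.append(guid)
    return order


example = {
    "leaf-a": {"active_links": 8, "ports": [0x11, 0x12]},
    "leaf-b": {"active_links": 12, "ports": [0x21, 0x22]},
    "leaf-c": {"active_links": 4, "ports": [0x31]},
}

# leaf-b (12 links) is routed first, then leaf-a (8), then leaf-c (4).
print([hex(g) for g in routing_order(example)])
```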

> Which is good for most cases, but also I thought that in addition we will
> need something more configurable - just like your patch series :).

Cool.

> Now comment: why do you think it is useful only for Up/Down?

At first, I imagined this patch would only be useful for networks with
tree-like configurations (and thus for people using updn).  But now that
you mention it, it is probably useful for many other configurations, and
for people who use min-hop.
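
To make the contention argument concrete, here is a toy model of the
12-uplink example. It is a sketch under loud assumptions: plain round-robin
stands in for the real port balancing, the 48 node names are invented, and
assign_uplinks and busiest_uplink are hypothetical helpers, not anything in
OpenSM. It only shows why routing order matters independently of the
routing engine.

```python
import random

# Toy model: round-robin is an assumed stand-in for port balancing.

def assign_uplinks(routing_order, num_uplinks):
    """Give the i-th routed destination uplink number i % num_uplinks."""
    return {dest: i % num_uplinks for i, dest in enumerate(routing_order)}


def busiest_uplink(assignment, targets):
    """Return how many of the targets share the most heavily used uplink."""
    counts = {}
    for dest in targets:
        uplink = assignment[dest]
        counts[uplink] = counts.get(uplink, 0) + 1
    return max(counts.values())


nodes = [f"node{i}" for i in range(1, 49)]     # invented fabric of 48 nodes
targets = [f"node{i}" for i in range(13, 25)]  # node[13-24] from the example

# Routing in node order: node13..node24 land on 12 distinct uplinks.
ordered = assign_uplinks(nodes, 12)

# Routing in an arbitrary order: some of the targets may share an uplink.
shuffled_nodes = nodes[:]
random.Random(0).shuffle(shuffled_nodes)
shuffled = assign_uplinks(shuffled_nodes, 12)

print("ordered: ", busiest_uplink(ordered, targets))
print("shuffled:", busiest_uplink(shuffled, targets))
```

With the ordered list, no two of the twelve targets share an uplink; with a
shuffled list, that is no longer guaranteed.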

> IMO min hops algo could benefit from this feature just as well. If so this
> simplifies implementation - patches 2,3,4 are not needed, code from patch
> 5 is going to osm_ucast_mgr.c. What do you think?

Sounds like a plan.  I will make this change when I rework the series for
the preserve-base-lids patch too.

Al

> Sasha
-- 
Albert Chu
chu11 at llnl.gov
925-422-5311
Computer Scientist
High Performance Systems Division
Lawrence Livermore National Laboratory



