[ofa-general] opensm routing

Al Chu chu11 at llnl.gov
Mon Jun 16 09:16:43 PDT 2008


On Mon, 2008-06-16 at 17:32 +0300, Yevgeny Kliteynik wrote:
> Jeff,
> 
> Jeff Becker wrote:
> > Hi Al
> > 
> > Al Chu wrote:
> >> Hey Jeff,
> >>
> >>  
> >>> That works. The compute nodes need to talk to other compute nodes for 
> >>> MPI over one set of links, and they need to talk to the Lustre nodes 
> >>> for I/O, but over a different (disjoint) set of links. Thanks.
> >>>     
> >>
> >> Is there a strong belief that a different/disjoint set of links would be
> >> beneficial?  Sometime ago, Sasha and I iterated on a patch in which I
> >> found out sometimes not all switch ports would be used.  In this
> >> particular case, a chunk of leaf switches were sometimes using only 11
> >> out of 12 uplinks.  After the fix, mpigraph showed about 20% improvement
> >> in MPI bandwidth.
> >>   
> > Basically, we want to avoid situations where I/O and MPI contend for the 
> > same links, and get in each other's way.
> 
> What about using different VLs for MPI and I/O?

Adam Moody ran this idea by me some time ago too, and it is something I
thought of looking into later.  (We are analyzing/dealing w/ routing
first :-)

I have no idea if different service levels can be configured in MPI
implementations.  I asked the Lustre people in my hallway, and it isn't
currently configurable for Lustre.  This isn't to say it's not doable,
but it would take some effort.

Al

> It won't buy more bandwidth, but it might prevent MPI and I/O from
> congesting each other - they will share the wire according to the
> priority that you will define.
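
To make the "share the wire according to priority" point concrete, here
is a toy sketch of weighted arbitration between two lanes.  It is only an
illustration under stated assumptions: the lane names ("mpi", "io"), the
3:1 weights, and the arbitrate() helper are all hypothetical, and this is
a plain weighted round-robin, not the VL arbitration tables from the
InfiniBand spec or anything opensm does.

```python
def arbitrate(queues, weights, slots):
    """Toy weighted round-robin over virtual lanes (illustration only).

    queues:  dict lane -> packets waiting (hypothetical backlog)
    weights: dict lane -> packets the lane may send per round (hypothetical)
    slots:   total packets the shared link can carry
    Returns a dict lane -> packets actually sent.
    """
    pending = dict(queues)
    sent = {lane: 0 for lane in queues}
    while slots > 0:
        progress = 0
        for lane, weight in weights.items():
            # A lane sends at most its weight per round, limited by its
            # own backlog and by what is left of the link capacity.
            burst = min(weight, pending[lane], slots)
            pending[lane] -= burst
            sent[lane] += burst
            slots -= burst
            progress += burst
        if progress == 0:
            # Nothing left to send (or all weights are zero): stop.
            break
    return sent


# Both lanes saturated: the link is split according to the weights.
busy = arbitrate({"mpi": 1000, "io": 1000}, {"mpi": 3, "io": 1}, 400)

# I/O idle: MPI takes the whole link, but never more than the link has.
idle = arbitrate({"mpi": 1000, "io": 0}, {"mpi": 3, "io": 1}, 400)
```

With both lanes busy the split follows the weights, and with one lane
idle the other gets the whole link.  In neither case does the total
exceed the link capacity, which matches the point that this does not buy
more bandwidth.
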
> 
> -- Yevgeny
> 
> > -jeff
-- 
Albert Chu
chu11 at llnl.gov
925-422-5311
Computer Scientist
High Performance Systems Division
Lawrence Livermore National Laboratory



