[ofa-general] opensm routing
Al Chu
chu11 at llnl.gov
Mon Jun 16 09:16:43 PDT 2008
On Mon, 2008-06-16 at 17:32 +0300, Yevgeny Kliteynik wrote:
> Jeff,
>
> Jeff Becker wrote:
> > Hi Al
> >
> > Al Chu wrote:
> >> Hey Jeff,
> >>
> >>
> >>> That works. The compute nodes need to talk to other compute nodes for
> >>> MPI over one set of links, and they need to talk to the Lustre nodes
> >>> for I/O, but over a different (disjoint) set of links. Thanks.
> >>>
> >>
> >> Is there a strong belief that a different/disjoint set of links would be
> >> beneficial? Some time ago, Sasha and I iterated on a patch in which I
> >> found out that sometimes not all switch ports would be used. In this
> >> particular case, a chunk of leaf switches were sometimes using only 11
> >> out of 12 uplinks. After the fix, mpigraph showed about 20% improvement
> >> in MPI bandwidth.
> >>
> > Basically, we want to avoid situations where I/O and MPI contend for the
> > same links, and get in each other's way.
>
> What about using different VLs for MPI and I/O?
Adam Moody ran this idea by me some time ago too, and it was something I
thought of looking into later. (We are analyzing/dealing with routing
first :-).
I have no idea if different service levels can be configured into MPI
implementations. I asked the Lustre people in my hallway, and it isn't
currently configurable for Lustre. This isn't to say it's not doable,
but it would take some effort.
Al
> It won't buy more bandwidth, but it might prevent MPI and I/O from
> congesting each other - they will share the wire according to the
> priority that you will define.
>
> -- Yevgeny
>
> > -jeff
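
Yevgeny's point about sharing the wire by priority can be illustrated with a
toy model. This is only a sketch, not opensm or switch code: it assumes a
simplified weighted round-robin arbiter over VLs that always have traffic
queued, and the name share_link and the weights 3 and 1 are made up for
illustration.

```python
def share_link(weights, slots):
    """Toy weighted round-robin arbiter (an assumption, not real IB VL arbitration).

    weights: dict mapping a VL name to a positive integer weight.
    slots:   total number of equal-sized packets the link can send.
    Every VL is assumed to always have traffic queued.
    Returns a dict of packets sent per VL.
    """
    if not weights or any(w <= 0 for w in weights.values()):
        raise ValueError("weights must be a non-empty dict of positive integers")

    sent = {vl: 0 for vl in weights}
    remaining = slots
    while remaining > 0:
        # One arbitration cycle: each VL may send up to `weight` packets.
        for vl, weight in weights.items():
            burst = min(weight, remaining)
            sent[vl] += burst
            remaining -= burst
            if remaining == 0:
                break
    return sent


# Hypothetical weights: MPI gets 3 slots for every 1 slot of I/O.
result = share_link({"mpi": 3, "io": 1}, slots=1200)
print(result)  # {'mpi': 900, 'io': 300}
```

The total stays at 1200 packets whatever the weights are, which matches the
"it won't buy more bandwidth" remark; only the split between MPI and I/O
changes.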
--
Albert Chu
chu11 at llnl.gov
925-422-5311
Computer Scientist
High Performance Systems Division
Lawrence Livermore National Laboratory