[ofa-general] [OpenSM] [PATCH 0/3] New "port-offsetting" option to updn/minhop routing
Al Chu
chu11 at llnl.gov
Wed May 28 17:14:59 PDT 2008
Hey Sasha,
Attached are some numbers from a recent run I did with my port
offsetting patches. I ran with mvapich 0.9.9 and OpenMPI 1.2.6 on 120
nodes, using either 1 task per node or 8 tasks per node (the nodes
have 8 processors each), trying LMC=0, LMC=1, and LMC=2 with the
original 'updn', then LMC=1 and LMC=2 with my port-offsetting patch
(labeled "PO"). Next to these columns is the percentage by which each
number is worse than the LMC=0 baseline. My understanding is that
mvapich 0.9.9 does not know how to take advantage of multiple lids,
while OpenMPI 1.2.6 does.
I think the key numbers to notice are that without port-offsetting,
performance relative to LMC=0 is quite bad when the MPI implementation
does not know how to take advantage of multiple lids (mvapich 0.9.9).
LMC=1 shows ~30% performance degradation and LMC=2 shows ~90%
degradation on this cluster. With port-offsetting turned on, the
degradation falls to 0-6%, in a few cases even coming out faster. We
consider this within "noise" levels.
For MPIs that do know how to take advantage of multiple lids, the
port-offsetting patch doesn't seem to affect performance much (see the
OpenMPI 1.2.6 sections).
PLMK what you think. Thanks.
Al
On Thu, 2008-04-10 at 14:10 -0700, Al Chu wrote:
> Hey Sasha,
>
> I was going to submit this after I had a chance to test on one of our
> big clusters to see if it worked 100% right. But my final testing has
> been delayed (for a month now!). Ira said some folks from Sonoma were
> interested in this, so I'll go ahead and post it.
>
> This is a patch for something I call "port_offsetting" (the
> name/description of the option is open to suggestion). Basically, we
> want to move to using LMC > 0 on our clusters because some of the
> newer MPI implementations take advantage of multiple lids and have
> shown faster performance when LMC > 0.
>
> The problem is that users who do not use the newer MPI
> implementations, or do not run their code in a way that can take
> advantage of multiple lids, suffer severe performance degradation.
> We determined that the primary issue is what we started calling
> "base lid alignment". Here's a simple example.
>
> Assume LMC = 2, so each port is assigned 2^2 = 4 consecutive lids,
> and we are trying to route the lids of 4 end ports (A, B, C, D).
> Those lids are:
>
> port A - 1,2,3,4
> port B - 5,6,7,8
> port C - 9,10,11,12
> port D - 13,14,15,16
>
> Suppose forwarding of these lids goes through 4 switch ports. If we
> cycle through the ports like updn/minhop currently do, we would see
> something like this.
>
> switch port 1: 1, 5, 9, 13
> switch port 2: 2, 6, 10, 14
> switch port 3: 3, 7, 11, 15
> switch port 4: 4, 8, 12, 16
>
> Note that the base lid of each port (lids 1, 5, 9, 13) goes through
> only 1 port of the switch. Thus a user that uses only the base lid is
> using only 1 of the 4 switch ports they could be using, leading to
> terrible performance.
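>
> To make the mechanics concrete, here's a small standalone sketch
> (not the actual OpenSM code; the names are made up for illustration)
> that reproduces the table above. Each end port owns 2^LMC
> consecutive lids starting at its base lid, and its paths are
> assigned to switch ports in round-robin order. Because the cycle
> restarts at the same index for every end port, every base lid lands
> on switch port 1:
>
>   #include <stdio.h>
>
>   int main(void)
>   {
>       const int lmc = 2;               /* 2^2 = 4 lids per end port */
>       const int lids_per_port = 1 << lmc;
>       const int num_end_ports = 4;     /* end ports A, B, C, D */
>       const int num_switch_ports = 4;
>       int e, l;
>
>       for (e = 0; e < num_end_ports; e++) {
>           int base_lid = e * lids_per_port + 1;
>           for (l = 0; l < lids_per_port; l++) {
>               int lid = base_lid + l;
>               /* naive: the cycle restarts at 0 for every end port,
>                  so the lid offset alone picks the switch port */
>               int sw_port = (l % num_switch_ports) + 1;
>               printf("lid %2d -> switch port %d\n", lid, sw_port);
>           }
>       }
>       return 0;
>   }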
>
> We want to get this instead.
>
> switch port 1: 1, 8, 11, 14
> switch port 2: 2, 5, 12, 15
> switch port 3: 3, 6, 9, 16
> switch port 4: 4, 7, 10, 13
>
> where the base lids are distributed more evenly across the switch
> ports.
>
> In order to do this, we (effectively) iterate through all ports as
> before, but we start the iteration at a different index depending on
> the number of paths we have routed thus far.
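>
> In terms of the sketch above, the only thing that changes is where
> the cycle starts. Something like the following (again just an
> illustration; in the actual patch the starting index comes from the
> number of paths routed so far rather than a plain loop counter):
>
>       for (e = 0; e < num_end_ports; e++) {
>           int base_lid = e * lids_per_port + 1;
>           for (l = 0; l < lids_per_port; l++) {
>               int lid = base_lid + l;
>               /* offset the start of the cycle by the number of end
>                  ports routed so far, so consecutive base lids land
>                  on different switch ports */
>               int sw_port = ((l + e) % num_switch_ports) + 1;
>               printf("lid %2d -> switch port %d\n", lid, sw_port);
>           }
>       }
>
> With LMC = 2 and 4 switch ports, this yields exactly the desired
> table: base lids 1, 5, 9, and 13 now go out switch ports 1, 2, 3,
> and 4 respectively.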
>
> On one of our clusters, testing has shown that when we run with
> LMC=1 and 1 task per node, mpibench (AlltoAll tests) ranges from
> 10-30% worse than when LMC=0 is used. With LMC=2, mpibench tends to
> be 50-70% worse in performance than with LMC=0.
>
> With the port offsetting option, the performance degradation ranges
> from 1-5% worse than LMC=0. I am currently at a loss as to why I
> cannot get it even with LMC=0, but 1-5% is small enough to not make
> users mad :-)
>
> The part I haven't been able to test yet is whether the newer MPIs
> that do take advantage of LMC > 0 perform equally well with my
> port_offsetting turned off and on.
>
> Thanks, look forward to your comments,
>
> Al
>
>
--
Albert Chu
chu11 at llnl.gov
925-422-5311
Computer Scientist
High Performance Systems Division
Lawrence Livermore National Laboratory
-------------- next part --------------
A non-text attachment was scrubbed...
Name: mpi_port_offsetting.xls
Type: application/vnd.ms-excel
Size: 17408 bytes
Desc: not available
URL: <http://lists.openfabrics.org/pipermail/general/attachments/20080528/82c4a362/attachment.xls>