[ofa-general] Performance penalty of OFED 1.1 versus IBGD 1.8.2
Roland Fehrenbacher
rf at q-leap.de
Wed Feb 28 09:14:37 PST 2007
>>>>> "Sayantan" == Sayantan Sur <surs at cse.ohio-state.edu> writes:
Roland> I'm migrating from IBGD 1.8.2 (kernel 2.6.15.7) to OFED
Roland> 1.1, and saw some unpleasant performance drops when using
Roland> OFED 1.1 (kernel 2.6.20.1 with included IB drivers). The
Roland> main drop is in throughput as measured by the OSU MPI
Roland> bandwidth benchmark. However, the latency for large packet
Roland> sizes is also worse (see results below). I tried with and
Roland> without "options ib_mthca msi_x=1" (using IBGD, disabling
Roland> msi_x makes a significant performance difference of
Roland> approx. 10%). The IB card is a Mellanox MHGS18-XT
Roland> (PCIe/DDR Firmware 1.2.0) running on an Opteron with
Roland> nForce4 2200 Professional chipset.
Roland>
Roland> Does anybody have an explanation or even better a solution
Roland> to this issue?
Pavel> Please try adding the following mvapich parameter:
Pavel> VIADEV_DEFAULT_MTU=MTU2048
Roland> Thanks for the suggestion. Unfortunately, it didn't
Roland> improve the simple bandwidth results. Bi-directional
Roland> bandwidth increased by 3% though. Any more ideas?
Pavel> 3% is a good start :-) Please also try adding this one:
Pavel> VIADEV_MAX_RDMA_SIZE=4194304
Roland> This brought another 2% in bi-directional bandwidth, but
Roland> still nothing in uni-directional bandwidth.
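[Editor's note: the two knobs suggested above are mvapich environment variables. Assuming mvapich's mpirun_rsh launcher, which accepts ENV=VALUE pairs before the executable, a typical way to pass them when rerunning the OSU test might look like the sketch below; hostnames and the benchmark path are placeholders.]

```shell
# Sketch only: node01/node02 and ./osu_bw are placeholders for the
# actual hosts and OSU benchmark binary. mpirun_rsh forwards the
# ENV=VALUE pairs to the MPI processes on each node.
mpirun_rsh -np 2 node01 node02 \
    VIADEV_DEFAULT_MTU=MTU2048 \
    VIADEV_MAX_RDMA_SIZE=4194304 \
    ./osu_bw
```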
Roland> mvapich version is 0.9.8
Pavel> 0.9.8 was not distributed (and tested) with OFED 1.1 :-(
Pavel> Please try the mvapich package distributed with the OFED
Pavel> 1.1 release.
Sayantan> MVAPICH-0.9.8 was tested by the MVAPICH team on OFED
Sayantan> 1.1. It is being used on several production clusters
Sayantan> with OFED 1.1.
Sayantan> I ran the bandwidth test on our Opteron nodes, AMD
Sayantan> Processor 254 (2.8 GHz), with Mellanox dual-port DDR
Sayantan> cards. I see a peak bandwidth of 1402
Sayantan> MillionBytes/sec as reported by the OSU bandwidth test. On
Sayantan> the same machines, I ran ib_rdma_bw (in the perftest
Sayantan> module of OFED-1.1), which reports the lower-level Gen2
Sayantan> performance numbers. The peak bw reported by ib_rdma_bw
Sayantan> is 1338.09 MegaBytes/sec (1338.09 * 1.048 ≈ 1402
Sayantan> MillionBytes/sec). So, the lower-level numbers match up
Sayantan> with what is reported by MPI.
Sayantan> I'm wondering what your lower-level ib_rdma_bw numbers
Sayantan> look like.
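[Editor's note: ib_rdma_bw is run pairwise between two nodes. A typical invocation with the OFED 1.1 perftest tools might look like the sketch below; the hostname is a placeholder, and flags can differ slightly across perftest versions.]

```shell
# On the server node, start ib_rdma_bw listening with defaults:
ib_rdma_bw
# On the client node, point it at the server ("server-node" is a
# placeholder hostname); both sides then print the measured bandwidth:
ib_rdma_bw server-node
```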
I get:
3802: Bandwidth peak (#0 to #989): 1288.55 MB/sec
3802: Bandwidth average: 1288.54 MB/sec
3802: Service Demand peak (#0 to #989): 1818 cycles/KB
3802: Service Demand Avg : 1818 cycles/KB
So, 1288.55 MB/sec * 1.048 ≈ 1350 MillionBytes/sec, which also matches
up exactly with the MPI results (see results below).
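[Editor's note: the unit bookkeeping in this thread can be checked in a few lines. The tools print binary megabytes (MiB, 2^20 bytes) per second, while "MillionBytes/sec" means decimal 10^6 bytes per second; the 1.048 factor used above is the rounded value of 2^20/10^6 = 1.048576.]

```python
# Convert a bandwidth printed in MiB/s (2**20 bytes, as ib_rdma_bw
# reports) into decimal MillionBytes/sec (10**6 bytes), the unit the
# OSU MPI bandwidth test uses.
MIB_PER_DECIMAL_MB = 2**20 / 10**6  # = 1.048576, rounded to 1.048 in the thread

def mib_to_million_bytes(bw_mib_per_s: float) -> float:
    return bw_mib_per_s * MIB_PER_DECIMAL_MB

# The two ib_rdma_bw peaks quoted in the thread (Sayantan's and Roland's);
# with the exact factor these come out at ~1403 and ~1351, matching the
# thread's rounded 1402 and 1350 figures.
print(round(mib_to_million_bytes(1338.09)))
print(round(mib_to_million_bytes(1288.55)))
```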
Sayantan> Are they matching up with what OSU BW test reports? If
Sayantan> they are, then it is likely some other issue than MPI.
Looks like it's not MPI then. What else could be wrong? Why is IBGD so
much better in my case?
Sayantan> We also have a MVAPICH-0.9.9 beta version out. You could
Sayantan> give that a try too, if you want. We will be making the
Sayantan> full release soon.
Probably won't help in this case.
Roland
> ------------------------------------------------------------------------
>
> IBGD
> --------
>
> # OSU MPI Bandwidth Test (Version 2.1)
> # Size Bandwidth (MB/s)
> 1 0.830306
> 2 1.642710
> 4 3.307494
> 8 6.546477
> 16 13.161954
> 32 26.395154
> 64 52.913060
> 128 101.890547
> 256 172.227478
> 512 383.296292
> 1024 611.172247
> 2048 830.147571
> 4096 1068.057366
> 8192 1221.262520
> 16384 1271.771983
> 32768 1369.702828
> 65536 1426.124683
> 131072 1453.781151
> 262144 1457.297992
> 524288 1464.625860
> 1048576 1468.953875
> 2097152 1470.614903
> 4194304 1471.607758
>
> # OSU MPI Latency Test (Version 2.1)
> # Size Latency (us)
> 0 3.03
> 1 3.03
> 2 3.04
> 4 3.03
> 8 3.03
> 16 3.04
> 32 3.11
> 64 3.23
> 128 3.49
> 256 3.83
> 512 4.88
> 1024 6.31
> 2048 8.60
> 4096 11.02
> 8192 15.78
> 16384 28.85
> 32768 39.82
> 65536 60.30
> 131072 106.65
> 262144 196.47
> 524288 374.62
> 1048576 730.79
> 2097152 1442.32
> 4194304 2864.80
>
> OFED 1.1
> ---------
>
> # OSU MPI Bandwidth Test (Version 2.2)
> # Size Bandwidth (MB/s)
> 1 0.698614
> 2 1.463192
> 4 2.941852
> 8 5.859464
> 16 11.697510
> 32 23.339031
> 64 46.403081
> 128 92.013928
> 256 182.918388
> 512 315.076923
> 1024 500.083937
> 2048 765.294564
> 4096 1003.652513
> 8192 1147.640312
> 16384 1115.803139
> 32768 1221.120298
> 65536 1282.328447
> 131072 1315.715608
> 262144 1331.456393
> 524288 1340.691793
> 1048576 1345.650404
> 2097152 1349.279211
> 4194304 1350.489883
>
> # OSU MPI Latency Test (Version 2.2)
> # Size Latency (us)
> 0 2.99
> 1 3.03
> 2 3.06
> 4 3.03
> 8 3.03
> 16 3.04
> 32 3.12
> 64 3.27
> 128 3.96
> 256 4.29
> 512 4.99
> 1024 6.53
> 2048 9.08
> 4096 11.92
> 8192 17.39
> 16384 31.05
> 32768 43.47
> 65536 67.17
> 131072 115.30
> 262144 212.33
> 524288 405.20
> 1048576 790.45
> 2097152 1558.88
> 4194304 3095.17
>
>
> ------------------------------------------------------------------------