[ofa-general] Performance penalty of OFED 1.1 versus IBGD 1.8.2

Roland Fehrenbacher rf at q-leap.de
Wed Feb 28 09:14:37 PST 2007


>>>>> "Sayantan" == Sayantan Sur <surs at cse.ohio-state.edu> writes:

    Roland> I'm migrating from IBGD 1.8.2 (kernel 2.6.15.7) to OFED
    Roland> 1.1, and saw some unpleasant performance drops when using
    Roland> OFED 1.1 (kernel 2.6.20.1 with included IB drivers). The
    Roland> main drop is in throughput as measured by the OSU MPI
    Roland> bandwidth benchmark. However, the latency for large packet
    Roland> sizes is also worse (see results below). I tried with and
    Roland> without "options ib_mthca msi_x=1" (using IBGD, disabling
    Roland> msi_x makes a siginficant performance difference of
    Roland> approx. 10%). The IB card is a Mellanox MHGS18-XT
    Roland> (PCIe/DDR Firmware 1.2.0) running on an Opteron with
    Roland> nForce4 2200 Professional chipset.
    Roland> 
    Roland> Does anybody have an explanation or even better a solution
    Roland> to this issue?
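
For reference, msi_x is a module option of the mthca driver. A
minimal sketch of how it is toggled, assuming the module options
live in /etc/modprobe.conf (the path varies by distribution):

    # /etc/modprobe.conf -- enable (1) or disable (0) MSI-X
    # interrupts for Mellanox HCAs driven by ib_mthca
    options ib_mthca msi_x=1

The setting takes effect the next time the ib_mthca module is
loaded.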

    Pavel> Please try adding the following mvapich parameter:
    Pavel> VIADEV_DEFAULT_MTU=MTU2048
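
In mvapich 0.9.x, VIADEV_* parameters are plain environment
variables; with mpirun_rsh they are given on the command line just
before the executable. A sketch (node names and benchmark path are
placeholders):

    mpirun_rsh -np 2 node01 node02 \
        VIADEV_DEFAULT_MTU=MTU2048 ./osu_bw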

    Roland> Thanks for the suggestion. Unfortunately, it didn't
    Roland> improve the simple bandwidth results. Bi-directional
    Roland> bandwidth increased by 3% though. Any more ideas?

    Pavel> 3% is a good start :-) Please also try adding this one:
    Pavel> VIADEV_MAX_RDMA_SIZE=4194304
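
Both parameters can be combined in a single run, along the same
lines as above (node names and path again placeholders):

    mpirun_rsh -np 2 node01 node02 \
        VIADEV_DEFAULT_MTU=MTU2048 \
        VIADEV_MAX_RDMA_SIZE=4194304 ./osu_bw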

    Roland> This brought another 2% in bi-directional bandwidth, but
    Roland> still nothing in uni-directional bandwidth.

    Roland> mvapich version is 0.9.8

    Pavel> 0.9.8 was not distributed (or tested) with OFED 1.1 :-(
    Pavel> Please try the mvapich package distributed with the
    Pavel> OFED 1.1 release.

    Sayantan> MVAPICH-0.9.8 was tested by the MVAPICH team on OFED
    Sayantan> 1.1. It is being used on several production clusters
    Sayantan> with OFED 1.1.

    Sayantan> I ran the bandwidth test on our Opteron nodes, AMD
    Sayantan> Processor 254 (2.8 GHz), with Mellanox dual-port DDR
    Sayantan> cards. I see a peak bandwidth of 1402 MillionBytes/sec
    Sayantan> as reported by the OSU bandwidth test. On the same
    Sayantan> machines, I ran ib_rdma_bw (in the perftest module of
    Sayantan> OFED-1.1), which reports lower-level Gen2 performance
    Sayantan> numbers. The peak bandwidth reported by ib_rdma_bw is
    Sayantan> 1338.09 MegaBytes/sec (1338.09 * 1.048 = 1402
    Sayantan> MillionBytes/sec, where 1.048 is approx. 2^20/10^6,
    Sayantan> converting MegaBytes to MillionBytes). So the
    Sayantan> lower-level numbers match what is reported by MPI.
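
For anyone reproducing this comparison: ib_rdma_bw runs as a
server/client pair, the server side started without arguments and
the client side pointed at the server. A sketch with default
options (the hostname is a placeholder):

    # on the server node
    ib_rdma_bw

    # on the client node
    ib_rdma_bw node01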

    Sayantan> I'm wondering what your lower-level ib_rdma_bw numbers
    Sayantan> look like.

I get:

3802: Bandwidth peak (#0 to #989): 1288.55 MB/sec
3802: Bandwidth average: 1288.54 MB/sec
3802: Service Demand peak (#0 to #989): 1818 cycles/KB
3802: Service Demand Avg  : 1818 cycles/KB

so 1288.55 MB/sec * 1.048 = 1350 MillionBytes/sec, which also
matches the MPI results exactly (see results below).

    Sayantan> Are they matching up with what the OSU BW test reports?
    Sayantan> If they are, then the problem likely lies somewhere
    Sayantan> other than MPI.

Looks like it's not MPI then. What else could be wrong? Why is IBGD so
much better in my case?

    Sayantan> We also have a MVAPICH-0.9.9 beta version out. You could
    Sayantan> give that a try too, if you want. We will be making the
    Sayantan> full release soon.

Probably won't help in this case.

Roland

> ------------------------------------------------------------------------
> 
> IBGD
> --------
> 
> # OSU MPI Bandwidth Test (Version 2.1)
> # Size          Bandwidth (MB/s)
> 1               0.830306
> 2               1.642710
> 4               3.307494
> 8               6.546477
> 16              13.161954
> 32              26.395154
> 64              52.913060
> 128             101.890547
> 256             172.227478
> 512             383.296292
> 1024            611.172247
> 2048            830.147571
> 4096            1068.057366
> 8192            1221.262520
> 16384           1271.771983
> 32768           1369.702828
> 65536           1426.124683
> 131072          1453.781151
> 262144          1457.297992
> 524288          1464.625860
> 1048576         1468.953875
> 2097152         1470.614903
> 4194304         1471.607758
> 
> # OSU MPI Latency Test (Version 2.1)
> # Size          Latency (us)
> 0               3.03
> 1               3.03
> 2               3.04
> 4               3.03
> 8               3.03
> 16              3.04
> 32              3.11
> 64              3.23
> 128             3.49
> 256             3.83
> 512             4.88
> 1024            6.31
> 2048            8.60
> 4096            11.02
> 8192            15.78
> 16384           28.85
> 32768           39.82
> 65536           60.30
> 131072          106.65
> 262144          196.47
> 524288          374.62
> 1048576         730.79
> 2097152         1442.32
> 4194304         2864.80
> 
> OFED 1.1
> ---------
> 
> # OSU MPI Bandwidth Test (Version 2.2)
> # Size          Bandwidth (MB/s)
> 1               0.698614
> 2               1.463192
> 4               2.941852
> 8               5.859464
> 16              11.697510
> 32              23.339031
> 64              46.403081
> 128             92.013928
> 256             182.918388
> 512             315.076923
> 1024            500.083937
> 2048            765.294564
> 4096            1003.652513
> 8192            1147.640312
> 16384           1115.803139
> 32768           1221.120298
> 65536           1282.328447
> 131072          1315.715608
> 262144          1331.456393
> 524288          1340.691793
> 1048576         1345.650404
> 2097152         1349.279211
> 4194304         1350.489883
> 
> # OSU MPI Latency Test (Version 2.2)
> # Size          Latency (us)
> 0               2.99
> 1               3.03
> 2               3.06
> 4               3.03
> 8               3.03
> 16              3.04
> 32              3.12
> 64              3.27
> 128             3.96
> 256             4.29
> 512             4.99
> 1024            6.53
> 2048            9.08
> 4096            11.92
> 8192            17.39
> 16384           31.05
> 32768           43.47
> 65536           67.17
> 131072          115.30
> 262144          212.33
> 524288          405.20
> 1048576         790.45
> 2097152         1558.88
> 4194304         3095.17
> 
> 
> ------------------------------------------------------------------------



