[ofa-general] Performance penalty of OFED 1.1 versus IBGD 1.8.2

Sayantan Sur surs at cse.ohio-state.edu
Wed Feb 28 08:40:40 PST 2007


Hi Roland,

* On Feb 2, Pavel Shamis (Pasha) <pasha at dev.mellanox.co.il> wrote:
> Roland Fehrenbacher wrote:
> > >>>>> "Pavel" == Pavel Shamis (Pasha) <pasha at dev.mellanox.co.il> writes:
> >
> >    >> I'm migrating from IBGD 1.8.2 (kernel 2.6.15.7) to OFED 1.1,
> >    >> and saw some unpleasant performance drops when using OFED 1.1
> >    >> (kernel 2.6.20.1 with included IB drivers). The main drop is in
> >    >> throughput as measured by the OSU MPI bandwidth benchmark.
> >    >> However, the latency for large packet sizes is also worse (see
> >    >> results below). I tried with and without "options ib_mthca
> >    >> msi_x=1" (using IBGD, disabling msi_x makes a significant
> >    >> performance difference of approx. 10%). The IB card is a
> >    >> Mellanox MHGS18-XT (PCIe/DDR, firmware 1.2.0) running on an
> >    >> Opteron with an nForce4 2200 Professional chipset.
> >    >>
> >    >> Does anybody have an explanation, or even better a solution,
> >    >> to this issue?
> >
> >    Pavel> Please try to add the following mvapich parameter:
> >    Pavel> VIADEV_DEFAULT_MTU=MTU2048
> >
> >    >> Thanks for the suggestion. Unfortunately, it didn't improve the
> >    >> simple bandwidth results. Bi-directional bandwidth increased by
> >    >> 3% though. Any more ideas?
> >
> >    Pavel> 3% is a good start :-) Please also try to add this one:
> >    Pavel> VIADEV_MAX_RDMA_SIZE=4194304
> >
> > This brought another 2% in bi-directional bandwidth, but still nothing
> > in uni-directional bandwidth.
> >
> > mvapich version is 0.9.8
> 0.9.8 was not distributed (and tested) with OFED 1.1 :-(
> Please try to use the package distributed with OFED 1.1.

MVAPICH-0.9.8 was tested by the MVAPICH team on OFED 1.1. It is being
used on several production clusters with OFED 1.1.

I ran the bandwidth test on our Opteron nodes (AMD Opteron 254, 2.8
GHz) with Mellanox dual-port DDR cards. I see a peak bandwidth of
1402 MillionBytes/sec as reported by the OSU bandwidth test. On the same
machines, I ran ib_rdma_bw (from the perftest package in OFED-1.1),
which reports the lower-level (Gen2 verbs) performance numbers. The peak
bandwidth reported by ib_rdma_bw is 1338.09 MegaBytes/sec, i.e.
1338.09 * 1.048 = 1402 MillionBytes/sec. So the lower-level numbers
match what is reported by MPI.
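
In case it helps to see where that peak number comes from: OSU-style
bandwidth tests keep a window of non-blocking sends in flight and divide
the bytes moved by the elapsed time. Below is a minimal sketch of that
pattern in plain MPI C; it is not the OSU code itself, and the 4 MB
message size, window of 64 and iteration count are just illustrative
values.

#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define MSG_SIZE   (4 * 1024 * 1024)   /* illustrative message size */
#define WINDOW     64                  /* non-blocking sends in flight */
#define ITERATIONS 20                  /* illustrative repeat count */

int main(int argc, char **argv)
{
    int rank, nprocs;
    char ack = 0;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

    if (nprocs != 2) {
        if (rank == 0) fprintf(stderr, "run with exactly 2 ranks\n");
        MPI_Abort(MPI_COMM_WORLD, 1);
    }

    char *buf = malloc(MSG_SIZE);
    memset(buf, 0, MSG_SIZE);

    MPI_Request req[WINDOW];
    double t_start = 0.0, t_end = 0.0;

    for (int iter = 0; iter < ITERATIONS; iter++) {
        MPI_Barrier(MPI_COMM_WORLD);
        if (rank == 0) {
            if (iter == 0) t_start = MPI_Wtime();
            /* keep a window of sends in flight, then wait for all of them */
            for (int w = 0; w < WINDOW; w++)
                MPI_Isend(buf, MSG_SIZE, MPI_CHAR, 1, 0, MPI_COMM_WORLD, &req[w]);
            MPI_Waitall(WINDOW, req, MPI_STATUSES_IGNORE);
            /* zero-byte ack so the timing covers actual delivery */
            MPI_Recv(&ack, 0, MPI_CHAR, 1, 1, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            if (iter == ITERATIONS - 1) t_end = MPI_Wtime();
        } else {
            for (int w = 0; w < WINDOW; w++)
                MPI_Irecv(buf, MSG_SIZE, MPI_CHAR, 0, 0, MPI_COMM_WORLD, &req[w]);
            MPI_Waitall(WINDOW, req, MPI_STATUSES_IGNORE);
            MPI_Send(&ack, 0, MPI_CHAR, 0, 1, MPI_COMM_WORLD);
        }
    }

    if (rank == 0) {
        double bytes = (double)MSG_SIZE * WINDOW * ITERATIONS;
        /* report in MillionBytes/sec (10^6 bytes), like the OSU output */
        printf("%.2f MillionBytes/sec\n", bytes / (t_end - t_start) / 1.0e6);
    }

    free(buf);
    MPI_Finalize();
    return 0;
}

Compile with mpicc and run it with one rank on each of the two nodes
under test.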

I'm wondering what your lower-level ib_rdma_bw numbers look like. Do
they match what the OSU bandwidth test reports? If they do, then the
problem is likely somewhere other than MPI.
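
When lining the two sets of numbers up, keep in mind the unit difference
the 1.048 factor above assumes: ib_rdma_bw's MegaBytes are 2^20 bytes,
while the OSU test counts 10^6 bytes. A purely illustrative helper like
the one below (not part of perftest; the 1338.09 default is just the
figure quoted earlier) converts one into the other before comparing.

/* Convert an ib_rdma_bw figure in MiB/s (2^20 bytes) into
 * MillionBytes/sec (10^6 bytes) for direct comparison with OSU. */
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    /* default to the value quoted above; pass your own as argv[1] */
    double mib_per_sec = (argc > 1) ? atof(argv[1]) : 1338.09;
    double mb_per_sec  = mib_per_sec * 1048576.0 / 1.0e6;
    printf("%.2f MiB/s = %.2f MillionBytes/sec\n", mib_per_sec, mb_per_sec);
    return 0;
}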

We also have an MVAPICH-0.9.9 beta version out. You could give that a
try too, if you want. We will be making the full release soon.

Thanks,
Sayantan.

-- 
http://www.cse.ohio-state.edu/~surs


