[ofa-general] Infiniband bandwidth
Ramiro Alba Queipo
raq at cttc.upc.edu
Wed Oct 1 09:08:46 PDT 2008
Hi all,
We have an InfiniBand cluster of 22 nodes with 20 Gbps Mellanox
MHGS18-XTC cards, and I ran some network performance tests, both to check
the hardware and to clarify concepts.
Starting from the theoretical peak of the InfiniBand card (in my
case 4X DDR => 20 Gbit/s => 2.5 Gbytes/s), we have some limits:
1) Bus type: PCIe x8 => 250 Mbytes/s per lane => 250 * 8 = 2 Gbytes/s
2) According to a thread on the Open MPI users mailing list (???):
The 16 Gbit/s number is the theoretical peak: IB is coded 8b/10b, so
out of the 20 Gbit/s, 16 is what you get. On SDR this number is
(of course) 8 Gbit/s (~1000 MB/s), which is achievable; I've seen
well above 900 MB/s on MPI (this on x8 PCIe, 2x margin).
Is this true?
3) According to another comment in the same thread:
The data throughput limit for 8x PCIe is ~12 Gb/s. The theoretical
limit is 16 Gb/s, but each PCIe packet has a whopping 20 byte
overhead. If the adapter uses 64 byte packets, then you see 1/3 of
the throughput go to overhead.
Could someone explain that to me?
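To check my own understanding of points 1) to 3), here is the arithmetic
as I read it, as a short Python sketch; the split of 64-byte payloads plus
~20 bytes of packet overhead is only my reading of the quote above, not
something I have measured:

    # Back-of-the-envelope limits, as I understand them
    signal_rate_gbps = 20.0                        # 4X DDR signalling rate
    data_rate_gbps = signal_rate_gbps * 8 / 10     # 8b/10b encoding -> 16 Gbit/s
    print(data_rate_gbps / 8)                      # -> 2.0 GBytes/s of data on the wire

    pcie_lane_mbs = 250.0                          # PCIe 1.x, per lane, per direction
    pcie_raw_mbs = pcie_lane_mbs * 8               # x8 slot -> 2000 MBytes/s (16 Gbit/s)
    # assuming each 64-byte payload carries ~20 bytes of packet overhead:
    pcie_eff_mbs = pcie_raw_mbs * 64 / (64 + 20)   # ~1524 MBytes/s (~12.2 Gbit/s)
    print(pcie_eff_mbs)

That would roughly explain the "~12 Gb/s" figure, if my assumption about
the packet sizes is right.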
Then I got another comment about the matter:
The best uni-directional performance I have heard of for PCIe 8x IB
DDR is ~1,400 MB/s (11.2 Gb/s) with Lustre, which is about 55% of the
theoretical 20 Gb/s advertised speed.
---------------------------------------------------------------------
Now, I did some tests (the MPI used is Open MPI) with the following results:
a) Using "Performance tests" from OFED 1.3.1
ib_write_bw -a server -> 1347 MB/s
b) Using hpcc (2 cores on different nodes) -> 1157 MB/s (--mca
mpi_leave_pinned 1)
c) Using "OSU Micro-Benchmarks" in "MPItests" from OFED 1.3.1
1) 2 cores from different nodes
- mpirun -np 2 --hostfile pool osu_bibw -> 2001.29 MB/s
(bidirectional)
- mpirun -np 2 --hostfile pool osu_bw -> 1311.31 MB/s
2) 2 cores from the same node
- mpirun -np 2 osu_bibw -> 2232 MB/s (bidirectional)
- mpirun -np 2 osu_bw -> 2058 MB/s
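To put my unidirectional numbers in context, this is how I would compute
their efficiency against the ceilings from the first sketch (the 2000 and
~1524 MB/s values are my own assumptions from there):

    # measured unidirectional bandwidths (MB/s) vs. the assumed PCIe ceilings
    results = {"ib_write_bw": 1347, "hpcc": 1157, "osu_bw": 1311.31}
    pcie_raw_mbs = 2000.0    # PCIe x8 theoretical
    pcie_eff_mbs = 1524.0    # with the assumed packet overhead
    for name, bw in results.items():
        print(name, round(bw / pcie_raw_mbs, 2), round(bw / pcie_eff_mbs, 2))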
The questions are:
- Are those results consistent with what they should be?
- Why are the tests with the two cores on the same node better?
- Shouldn't the bidirectional test be a bit higher?
- Why is the hpcc result so low?
Thanks in advance
Regards