[ofa-general] Infiniband performance
Jan Ruffing
ruffing at motama.com
Thu Dec 11 01:55:38 PST 2008
Hello,
I'm new to Infiniband and still trying to get a grasp on what performance it can realistically deliver.
The two directly connected test machines have Mellanox Infinihost III Lx DDR HCA cards installed and run OpenSuse 11 with a 2.6.25.16 Kernel.
1) Maximum Bandwidth?
Infiniband (Double Data Rate, 4x lanes) is advertised with a bandwidth of 20 Gbit/s. If my understanding is correct, this is only the signal rate, which would translate to a 16 Gbit/s data rate due to the 8b/10b encoding? The maximum speed I measured so far was 12 Gbit/s with the low-level protocols:
tamara /home/ruffing> ibv_rc_pingpong -m 2048 -s 1048576 -n 10000
local address: LID 0x0001, QPN 0x3b0405, PSN 0x13a302
remote address: LID 0x0002, QPN 0x380405, PSN 0x9a46ba
20971520000 bytes in 13.63 seconds = 12313.27 Mbit/sec
10000 iters in 13.63 seconds = 1362.53 usec/iter
melissa Dokumente/Infiniband> ibv_rc_pingpong 192.168.2.1 -m 2048 -s 1048576 -n 10000
local address: LID 0x0002, QPN 0x380405, PSN 0x9a46ba
remote address: LID 0x0001, QPN 0x3b0405, PSN 0x13a302
20971520000 bytes in 13.63 seconds = 12313.38 Mbit/sec
10000 iters in 13.63 seconds = 1362.52 usec/iter
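As a hypothetical sanity check (my own arithmetic, not part of the tool output): the throughput ibv_rc_pingpong reports can be recomputed from the raw numbers in the transcript above — 10000 iterations of 1 MiB, counted in both directions, in 13.63 seconds.

```python
# Recompute the ibv_rc_pingpong numbers from the transcript (sketch).
iters = 10000
msg_size = 1048576                   # -s 1048576 (1 MiB per message)
total_bytes = iters * msg_size * 2   # pingpong counts send + receive
seconds = 13.63                      # as printed (rounded by the tool)

mbit_per_sec = total_bytes * 8 / seconds / 1e6
usec_per_iter = seconds * 1e6 / iters

print(f"{total_bytes} bytes in {seconds} s = {mbit_per_sec:.2f} Mbit/sec")
print(f"{usec_per_iter:.2f} usec/iter")
```

The small difference from the reported 12313.27 Mbit/sec comes from the tool using unrounded timing internally.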
The maximum user-level bandwidth I reached was about 11.85 GBit/s using RDMA:
ruffing at melissa:~/Dokumente/Infiniband/NetPIPE-3.7.1> ./NPibv -m 2048 -t rdma_write -c local_poll -h 192.168.2.1 -n 100
Using RDMA Write communications
Using local polling completion
Preposting asynchronous receives (required for Infiniband)
Now starting the main loop
[...]
121: 8388605 bytes 100 times --> 11851.72 Mbps in 5400.06 usec
122: 8388608 bytes 100 times --> 11851.66 Mbps in 5400.09 usec
123: 8388611 bytes 100 times --> 11850.62 Mbps in 5400.57 usec
That's actually 4 GBit/s short of what I was hoping for. Yet I couldn't find any test results on the net that yielded more than 12 GBit/s on 4x DDR HCAs. Where does this performance loss come from? At first glance, 4 GBit/s (25% of the data rate) looks like a lot for protocol overhead alone...
Is 12 GBit/s the current maximum bandwidth, or is it possible for Infiniband users to improve performance beyond that?
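My back-of-the-envelope math (assumption on my part, not a vendor figure) for the gap between the advertised rate and what I measure:

```python
# 4x DDR signals at 20 Gbit/s; 8b/10b line coding leaves 16 Gbit/s of data.
signal_rate_gbit = 20.0
data_rate_gbit = signal_rate_gbit * 8 / 10   # 8b/10b encoding overhead
measured_gbit = 12.31                        # best ibv_rc_pingpong result above

print(f"theoretical data rate: {data_rate_gbit:.1f} Gbit/s")
print(f"achieved fraction:     {measured_gbit / data_rate_gbit:.0%}")
```

So I'm seeing roughly three quarters of the post-encoding data rate.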
2) TCP (over IPoIB) vs. RDMA/SDP/uverbs?
On my first Infiniband installation, using the packages from the OpenSuse 11 distribution, I got a TCP bandwidth of 10 GBit/s. (Which actually isn't bad compared to the measured maximum bandwidth of 12 GBit/s.) That installation supported neither RDMA nor SDP, though.
tamara iperf-2.0.4/src> ./iperf -c 192.168.2.2 -l 3M
------------------------------------------------------------
Client connecting to 192.168.2.2, TCP port 5001
TCP window size: 515 KByte (default)
------------------------------------------------------------
[ 3] local 192.168.2.1 port 47730 connected with 192.168.2.2 port 5001
[ ID] Interval Transfer Bandwidth
[ 3] 0.0-10.0 sec 11.6 GBytes 10.0 Gbits/sec
After I installed the OFED 1.4 beta to be able to use SDP, RDMA and uverbs, I could reach 12 GBit/s with them. Yet the TCP rate dropped by 2-3 GBit/s, to 7-8 GBit/s.
ruffing at tamara:~/Dokumente/Infiniband/iperf-2.0.4/src> ./iperf -c 192.168.2.2 -l 10M
------------------------------------------------------------
Client connecting to 192.168.2.2, TCP port 5001
TCP window size: 193 KByte (default)
------------------------------------------------------------
[ 3] local 192.168.2.1 port 51988 connected with 192.168.2.2 port 5001
[ ID] Interval Transfer Bandwidth
[ 3] 0.0-10.0 sec 8.16 GBytes 7.00 Gbits/sec
What could have caused this loss of bandwidth? Is there a way to avoid it? Obviously, this could be a show stopper (for me) as far as native Infiniband protocols are concerned: gaining 2 GBit/s under special circumstances probably won't outweigh losing 3 GBit/s during normal use.
3) SDP performance
The SDP performance (preloading libsdp.so) measured only 6.2 GBit/s, underperforming even TCP:
ruffing at tamara:~/Dokumente/Infiniband/iperf-2.0.4/src> LD_PRELOAD=/usr/lib/libsdp.so LIBSDP_CONFIG_FILE=/etc/libsdp.conf ./iperf -c 192.168.2.2 -l 10M
------------------------------------------------------------
Client connecting to 192.168.2.2, TCP port 5001
TCP window size: 16.0 MByte (default)
------------------------------------------------------------
[ 4] local 192.168.2.1 port 36832 connected with 192.168.2.2 port 5001
[ ID] Interval Transfer Bandwidth
[ 4] 0.0-10.0 sec 7.22 GBytes 6.20 Gbits/sec
/etc/libsdp.conf consists of the following two lines:
use both server * *:*
use both client * *:*
I have a hard time believing that's the maximum rate of SDP. (Even if Cisco measured a similar 6.6 GBit/s: https://www.cisco.com/en/US/docs/server_nw_virtual/commercial_host_driver/host_driver_linux/user/guide/sdp.html#wp948100)
Did I mess up my Infiniband installation, or is SDP really slower than TCP over IPoIB?
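For reference, this is the kind of probe I'm running (a minimal sketch of my own, not from any of the tools above): a plain socket sender/receiver that measures only TCP over loopback as written, but — since libsdp intercepts the libc socket calls — should exercise SDP instead when launched with LD_PRELOAD=/usr/lib/libsdp.so and an address/port matched by a libsdp.conf rule, with no code changes.

```python
# Minimal socket throughput probe (sketch; loopback here, IPoIB/SDP address
# in the real test). The same unmodified code path runs under libsdp preload.
import socket
import threading
import time

CHUNK = 1 << 20     # 1 MiB per send, like iperf -l 1M
ROUNDS = 64         # keep the loopback demo short

srv = socket.socket()
srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
srv.bind(("127.0.0.1", 0))          # port 0: let the kernel pick a free port
srv.listen(1)
port = srv.getsockname()[1]

received = 0

def server():
    """Accept one connection and drain it, counting the bytes."""
    global received
    conn, _ = srv.accept()
    while True:
        data = conn.recv(CHUNK)
        if not data:
            break
        received += len(data)
    conn.close()

t = threading.Thread(target=server)
t.start()

cli = socket.create_connection(("127.0.0.1", port))
buf = b"\0" * CHUNK
start = time.time()
for _ in range(ROUNDS):
    cli.sendall(buf)
cli.close()
t.join()
elapsed = time.time() - start
srv.close()

print(f"transferred {received} bytes in {elapsed:.3f} s "
      f"({received * 8 / elapsed / 1e9:.2f} Gbit/s over loopback)")
```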
Sorry if this mail sounds somewhat negative, but I'm still trying to get past the marketing buzz and figure out what to realistically expect of Infiniband. Currently, I'm still hoping that I messed up my installation somewhere and that a few pointers in the right direction will resolve most of the issues... :)
Thanks in advance,
Jan Ruffing
Devices:
tamara /dev/infiniband> ls -la
total 0
drwxr-xr-x 2 root root 140 2008-12-02 16:20 .
drwxr-xr-x 13 root root 4580 2008-12-09 14:59 ..
crw-rw---- 1 root root 231, 64 2008-12-02 16:20 issm0
crw-rw-rw- 1 root users 10, 59 2008-11-27 10:24 rdma_cm
crw-rw---- 1 root root 231, 0 2008-12-02 16:20 umad0
crw-rw-rw- 1 root users 231, 192 2008-11-27 10:15 uverbs0
crw-rw---- 1 root users 231, 193 2008-11-27 10:15 uverbs1
Installed Packages:
Build ofa_kernel RPM
Install kernel-ib RPM:
Build ofed-scripts RPM
Install ofed-scripts RPM:
Install libibverbs RPM:
Install libibverbs-devel RPM:
Install libibverbs-devel-static RPM:
Install libibverbs-utils RPM:
Install libmthca RPM:
Install libmthca-devel-static RPM:
Install libmlx4 RPM:
Install libmlx4-devel RPM:
Install libcxgb3 RPM:
Install libcxgb3-devel RPM:
Install libnes RPM:
Install libnes-devel-static RPM:
Install libibcm RPM:
Install libibcm-devel RPM:
Install libibcommon RPM:
Install libibcommon-devel RPM:
Install libibcommon-static RPM:
Install libibumad RPM:
Install libibumad-devel RPM:
Install libibumad-static RPM:
Build libibmad RPM
Install libibmad RPM:
Install libibmad-devel RPM:
Install libibmad-static RPM:
Install ibsim RPM:
Install librdmacm RPM:
Install librdmacm-utils RPM:
Install librdmacm-devel RPM:
Install libsdp RPM:
Install libsdp-devel RPM:
Install opensm-libs RPM:
Install opensm RPM:
Install opensm-devel RPM:
Install opensm-static RPM:
Install compat-dapl RPM:
Install compat-dapl-devel RPM:
Install dapl RPM:
Install dapl-devel RPM:
Install dapl-devel-static RPM:
Install dapl-utils RPM:
Install perftest RPM:
Install mstflint RPM:
Install sdpnetstat RPM:
Install srptools RPM:
Install rds-tools RPM:
(installed ibutils manually)
Loaded Modules:
(libsdp currently unloaded)
Directory: /home/ruffing
tamara /home/ruffing> lsmod | grep ib
ib_addr 24580 1 rdma_cm
ib_ipoib 97576 0
ib_cm 53584 2 rdma_cm,ib_ipoib
ib_sa 55944 3 rdma_cm,ib_ipoib,ib_cm
ib_uverbs 56884 1 rdma_ucm
ib_umad 32016 4
mlx4_ib 79884 0
mlx4_core 114924 1 mlx4_ib
ib_mthca 148924 0
ib_mad 53400 5 ib_cm,ib_sa,ib_umad,mlx4_ib,ib_mthca
ib_core 81152 12 rdma_ucm,rdma_cm,iw_cm,ib_ipoib,ib_cm,ib_sa,ib_uverbs,ib_umad,iw_cxgb3,mlx4_ib,ib_mthca,ib_mad
ipv6 281064 23 ib_ipoib
rtc_lib 19328 1 rtc_core
libata 176604 2 ata_piix,pata_it8213
scsi_mod 168436 4 sr_mod,sg,sd_mod,libata
dock 27536 1 libata
tamara /home/ruffing> lsmod | grep rdma
rdma_ucm 30248 0
rdma_cm 49544 1 rdma_ucm
iw_cm 25988 1 rdma_cm
ib_addr 24580 1 rdma_cm
ib_cm 53584 2 rdma_cm,ib_ipoib
ib_sa 55944 3 rdma_cm,ib_ipoib,ib_cm
ib_uverbs 56884 1 rdma_ucm
ib_core 81152 12 rdma_ucm,rdma_cm,iw_cm,ib_ipoib,ib_cm,ib_sa,ib_uverbs,ib_umad,iw_cxgb3,mlx4_ib,ib_mthca,ib_mad
--
Jan Ruffing
Software Developer
Motama GmbH
Lortzingstraße 10 · 66111 Saarbrücken · Germany
tel +49 681 940 85 50 · fax +49 681 940 85 49
ruffing at motama.com · www.motama.com
Companies register · district council Saarbrücken · HRB 15249
CEOs · Dr.-Ing. Marco Lohse, Michael Repplinger
This e-mail may contain confidential and/or privileged information.
If you are not the intended recipient (or have received this e-mail
in error) please notify the sender immediately and destroy this
e-mail. Any unauthorized copying, disclosure or distribution of the
material in this e-mail is strictly forbidden.