[ewg] Infiniband and NFS speed tuning - any ideas?

Ross Smith myxiplx at googlemail.com
Tue Dec 15 02:11:58 PST 2009


Hi folks,

Can anybody give me advice on how to tune NFS to improve performance
on this system?  I've got a pair of 40Gb/s QDR ConnectX cards attached
to a 20Gb/s DDR switch.  Infiniband diagnostics show that I can
consistently achieve a bandwidth of 1866MB/s, but the best I've gotten
out of NFS in testing is 440MB/s and in actual use I'm hitting nearer
290MB/s.

To test performance I'm creating a ramdisk, mounting it over NFS, and
doing a simple write of a 100MB file.

The full setup is:

NFS Server:  192.168.2.5
NFS Client:  192.168.2.2

On server:
# mkdir ramdisk
# mount -t ramfs -o size=512m ramfs ./ramdisk
# chmod 777 ramdisk
# /usr/sbin/exportfs -o rw,insecure,async,fsid=0 :/home/ross/ramdisk
# /sbin/service nfs start
# ./fw.stop
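
(For reference, I assume the equivalent /etc/exports entry would look
something like the below, with a wildcard client spec; I've just been
using the exportfs one-liner above instead:

# cat /etc/exports
/home/ross/ramdisk    *(rw,insecure,async,fsid=0)
# /usr/sbin/exportfs -ra
)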

On client:
# opensm -B
# mkdir remote
# ./fw.stop
# mount 192.168.2.5:/home/ross/ramdisk ./remote
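
One thing I'm planning to try next is being explicit about the transport
and transfer sizes at mount time, along these lines (the 32k values are
just a starting point, not something I've confirmed helps here):

# mount -o rw,tcp,rsize=32768,wsize=32768 192.168.2.5:/home/ross/ramdisk ./remote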

The script I'm using to temporarily disable the firewall is:
# cat fw.stop
echo "stopping firewall"
iptables -F
iptables -X
iptables -t nat -F
iptables -t nat -X
iptables -t mangle -F
iptables -t mangle -X
iptables -P INPUT ACCEPT
iptables -P FORWARD ACCEPT
iptables -P OUTPUT ACCEPT


The bandwidth test:
# ib_send_bw 192.168.2.5
------------------------------------------------------------------
                   Send BW Test
Connection type : RC
Inline data is used up to 1 bytes message
 local address:  LID 0x01, QPN 0x10004b, PSN 0x5812a
 remote address: LID 0x04, QPN 0x80049, PSN 0x7e71c2
Mtu : 2048
------------------------------------------------------------------
 #bytes #iterations    BW peak[MB/sec]    BW average[MB/sec]
 65536        1000            1866.57               1866.22
------------------------------------------------------------------
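
I still need to double-check the IPoIB side of things as well.  My
understanding is that connected mode with a large MTU is what I want
here, so something like the following is on my to-do list (assuming the
interface is ib0 and that this OFED stack supports switching the mode):

# cat /sys/class/net/ib0/mode
# echo connected > /sys/class/net/ib0/mode
# /sbin/ifconfig ib0 mtu 65520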


Ramdisk speed test results (before exporting the folder):
# dd if=/dev/zero of=./100mb bs=1024k count=100
100+0 records in
100+0 records out
104857600 bytes (105 MB) copied, 0.049329 seconds, 2.1 GB/s

And the ramdisk results after exporting the folder (I'm not sure why this
should be so much slower, but it appears to be consistently reproducible):
# dd if=/dev/zero of=./100mb bs=1024k count=200
200+0 records in
200+0 records out
209715200 bytes (210 MB) copied, 0.235899 seconds, 889 MB/s


I've checked that the client can cope with the speeds too, creating a
ramdisk there for testing:
# dd if=/dev/zero of=./100mb bs=1024k count=100
100+0 records in
100+0 records out
104857600 bytes (105 MB) copied, 0.060491 seconds, 1.7 GB/s


So I have an interconnect that can push 1.8GB/s, a server that can do
2.1GB/s, and a client that can cope with 1.7GB/s.  I'm aiming for
900MB/s+ over NFS, and in theory I have the infrastructure to cope
with that.

However, NFS speed test results are about a third of the level I'm
after, no matter how I try to tweak the settings:

# dd if=/dev/zero of=./100mb bs=1024k count=100
100+0 records in
100+0 records out
104857600 bytes (105 MB) copied, 0.313448 seconds, 335 MB/s
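
I haven't yet confirmed what rsize/wsize the client actually negotiated
for that run; checking that is next on the list, e.g.:

# nfsstat -m
# grep 192.168.2.5 /proc/mounts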

Sync NFS results are truly horrible (even though this is to a ramdisk):
# dd if=/dev/zero of=./100mb bs=1024k count=100
100+0 records in
100+0 records out
104857600 bytes (105 MB) copied, 4.84575 seconds, 21.6 MB/s

# dd if=/dev/zero of=./100mb2 bs=1024k count=100
100+0 records in
100+0 records out
104857600 bytes (105 MB) copied, 1.38643 seconds, 75.6 MB/s
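
(The sync runs above are simply with sync swapped in for async, either by
re-exporting along these lines, or with the equivalent sync mount option
on the client:

# /usr/sbin/exportfs -o rw,insecure,sync,fsid=0 :/home/ross/ramdisk
)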

Going back to async and tweaking the block size helps:
# dd if=/dev/zero of=./100mb3 bs=32k count=3200
3200+0 records in
3200+0 records out
104857600 bytes (105 MB) copied, 0.358189 seconds, 293 MB/s

[root@xenserver1 remote]# dd if=/dev/zero of=./100mb3 bs=32k count=3200
3200+0 records in
3200+0 records out
104857600 bytes (105 MB) copied, 0.461682 seconds, 227 MB/s

[root@xenserver1 remote]# dd if=/dev/zero of=./100mb3 bs=64k count=3200
3200+0 records in
3200+0 records out
209715200 bytes (210 MB) copied, 3.23562 seconds, 64.8 MB/s

[root@xenserver1 remote]# dd if=/dev/zero of=./100mb3 bs=16k count=3200
3200+0 records in
3200+0 records out
52428800 bytes (52 MB) copied, 0.119123 seconds, 440 MB/s

[root@xenserver1 remote]# dd if=/dev/zero of=./100mb3 bs=8k count=3200
3200+0 records in
3200+0 records out
26214400 bytes (26 MB) copied, 0.069093 seconds, 379 MB/s

It seems I'm getting the best performance from 16k blocks, but I
actually want to tune this for 32k blocks, and neither is really at a
level I'm happy with.
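
In case it's useful, something like the loop below should keep the total
written constant at 100MB regardless of block size (the count=3200 runs
above obviously wrote different totals, which muddies the comparison a
little), so I'll use that for the next round of tests:

# for bs in 8 16 32 64 128 1024; do
>   dd if=/dev/zero of=./bstest bs=${bs}k count=$((102400 / bs)) 2>&1 | tail -1
>   rm -f ./bstest
> done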

Can anybody offer any suggestions on how to tune NFS and IPoIB to
improve these figures?
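
For what it's worth, I'm already planning to experiment with more nfsd
threads on the server and larger TCP buffer limits on both ends, along
these lines (the numbers are guesses rather than recommendations):

On server:
# /usr/sbin/rpc.nfsd 32

On both:
# /sbin/sysctl -w net.core.rmem_max=4194304
# /sbin/sysctl -w net.core.wmem_max=4194304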

thanks,

Ross

PS.  I should mention that I've seen 900MB/s over straight NFS before,
although that was on a smaller test network with just a couple of
10Gb/s SDR Infiniband cards and an 8 port SDR switch.


