[ewg] Infiniband and NFS speed tuning - any ideas?

Jon Mason jon at opengridcomputing.com
Tue Dec 15 08:21:23 PST 2009


Are you running NFS RDMA or NFS TCP?  Have you tweaked the read/write
size mount properties?
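
For example, on the client you can set the transfer sizes explicitly at
mount time.  This is only a sketch (the 32k values and the mount point
are illustrative; the best rsize/wsize depends on your kernel and
workload):

# mount -t nfs -o rw,async,tcp,rsize=32768,wsize=32768 \
    192.168.2.5:/home/ross/ramdisk ./remote

If you want to try NFS RDMA instead of NFS over IPoIB/TCP, the usual
recipe (again, just a sketch) is to load xprtrdma on the client and
svcrdma on the server, tell nfsd to listen on the RDMA port, and mount
with the rdma option:

On server:
# modprobe svcrdma
# echo rdma 20049 > /proc/fs/nfsd/portlist

On client:
# modprobe xprtrdma
# mount -t nfs -o rdma,port=20049,rsize=32768,wsize=32768 \
    192.168.2.5:/home/ross/ramdisk ./remote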

It's a little stale, but you might want to read:
http://nfs.sourceforge.net/nfs-howto/ar01s05.html
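
If you end up staying on NFS over IPoIB/TCP, it's also worth checking
whether IPoIB is running in datagram or connected mode.  A sketch,
assuming the IPoIB interface is ib0; connected mode allows a much
larger interface MTU, which usually helps bulk NFS traffic over IPoIB:

# cat /sys/class/net/ib0/mode
# echo connected > /sys/class/net/ib0/mode
# ifconfig ib0 mtu 65520

(Do this on both ends, then re-run the dd tests.)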

On Tue, Dec 15, 2009 at 10:11:58AM +0000, Ross Smith wrote:
> Hi folks,
> 
> Can anybody give me advice on how to tune NFS to improve performance
> on this system?  I've got a pair of 40Gb/s QDR ConnectX cards attached
> to a 20Gb/s DDR switch.  Infiniband diagnostics show that I can
> consistently achieve a bandwidth of 1866MB/s, but the best I've gotten
> out of NFS in testing is 440MB/s and in actual use I'm hitting nearer
> 290MB/s.
> 
> To test performance I'm creating a ramdisk, mounting it over NFS and
> doing a simple write of a 100MB file:
> 
> The full setup is:
> 
> NFS Server:  192.168.2.5
> NFS Client:  192.168.2.2
> 
> On server:
> # mkdir ramdisk
> # mount -t ramfs -o size=512m ramfs ./ramdisk
> # chmod 777 ramdisk
> # /usr/sbin/exportfs -o rw,insecure,async,fsid=0 :/home/ross/ramdisk
> # /sbin/service nfs start
> # ./fw.stop
> 
> On client:
> # opensm -B
> # mkdir remote
> # ./fw.stop
> # mount 192.168.2.5:/home/ross/ramdisk ./remote
> 
> The script I'm using to temporarily disable the firewall is:
> # cat fw.stop
> echo "stopping firewall"
> iptables -F
> iptables -X
> iptables -t nat -F
> iptables -t nat -X
> iptables -t mangle -F
> iptables -t mangle -X
> iptables -P INPUT ACCEPT
> iptables -P FORWARD ACCEPT
> iptables -P OUTPUT ACCEPT
> 
> 
> The bandwidth test:
> # ib_send_bw 192.168.2.5
> ------------------------------------------------------------------
>                    Send BW Test
> Connection type : RC
> Inline data is used up to 1 bytes message
>  local address:  LID 0x01, QPN 0x10004b, PSN 0x5812a
>  remote address: LID 0x04, QPN 0x80049, PSN 0x7e71c2
> Mtu : 2048
> ------------------------------------------------------------------
>  #bytes #iterations    BW peak[MB/sec]    BW average[MB/sec]
>  65536        1000            1866.57               1866.22
> ------------------------------------------------------------------
> 
> 
> Ramdisk speed test results (before exporting the folder):
> dd if=/dev/zero of=./100mb bs=1024k count=100
> 100+0 records in
> 100+0 records out
> 104857600 bytes (105 MB) copied, 0.049329 seconds, 2.1 GB/s
> 
> And ramdisk results after exporting the folder (I'm not sure why this
> should be so much slower, but it appears consistently reproducible):
> # dd if=/dev/zero of=./100mb bs=1024k count=200
> 200+0 records in
> 200+0 records out
> 209715200 bytes (210 MB) copied, 0.235899 seconds, 889 MB/s
> 
> 
> I've checked that the client can cope with the speeds too, creating a
> ramdisk there for testing:
> # dd if=/dev/zero of=./100mb bs=1024k count=100
> 100+0 records in
> 100+0 records out
> 104857600 bytes (105 MB) copied, 0.060491 seconds, 1.7 GB/s
> 
> 
> So I have an interconnect that can push 1.8GB/s, a server that can do
> 2.1GB/s, and a client that can sustain 1.7GB/s.  I'm aiming for
> 900MB/s+ over NFS, and in theory I have the infrastructure to cope
> with that.
> 
> However, NFS speed test results are about a third of the level I'm
> after, no matter how I try to tweak the settings:
> 
> dd if=/dev/zero of=./100mb bs=1024k count=100
> 100+0 records in
> 100+0 records out
> 104857600 bytes (105 MB) copied, 0.313448 seconds, 335 MB/s
> 
> Sync NFS results are truly horrible (even though this is to a ramdisk):
> # dd if=/dev/zero of=./100mb bs=1024k count=100
> 100+0 records in
> 100+0 records out
> 104857600 bytes (105 MB) copied, 4.84575 seconds, 21.6 MB/s
> 
> # dd if=/dev/zero of=./100mb2 bs=1024k count=100
> 100+0 records in
> 100+0 records out
> 104857600 bytes (105 MB) copied, 1.38643 seconds, 75.6 MB/s
> 
> Going back to async and tweaking the block size helps:
> # dd if=/dev/zero of=./100mb3 bs=32k count=3200
> 3200+0 records in
> 3200+0 records out
> 104857600 bytes (105 MB) copied, 0.358189 seconds, 293 MB/s
> 
> [root@xenserver1 remote]# dd if=/dev/zero of=./100mb3 bs=32k count=3200
> 3200+0 records in
> 3200+0 records out
> 104857600 bytes (105 MB) copied, 0.461682 seconds, 227 MB/s
> 
> [root@xenserver1 remote]# dd if=/dev/zero of=./100mb3 bs=64k count=3200
> 3200+0 records in
> 3200+0 records out
> 209715200 bytes (210 MB) copied, 3.23562 seconds, 64.8 MB/s
> 
> [root@xenserver1 remote]# dd if=/dev/zero of=./100mb3 bs=16k count=3200
> 3200+0 records in
> 3200+0 records out
> 52428800 bytes (52 MB) copied, 0.119123 seconds, 440 MB/s
> 
> [root@xenserver1 remote]# dd if=/dev/zero of=./100mb3 bs=8k count=3200
> 3200+0 records in
> 3200+0 records out
> 26214400 bytes (26 MB) copied, 0.069093 seconds, 379 MB/s
> 
> It seems I'm getting the best performance from 16k blocks, but I
> actually want to tune this for 32k blocks, and neither is really at a
> level I'm happy with.
> 
> Can anybody offer any suggestions on how to tune NFS and IPoIB to
> improve these figures?
> 
> thanks,
> 
> Ross
> 
> PS.  I should mention that I've seen 900MB/s over straight NFS before,
> although that was on a smaller test network with just a couple of
> 10Gb/s SDR Infiniband cards and an 8 port SDR switch.
> _______________________________________________
> ewg mailing list
> ewg at lists.openfabrics.org
> http://lists.openfabrics.org/cgi-bin/mailman/listinfo/ewg
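
On the general tuning question, two more things that are cheap to check
(names and paths below are distro-dependent, so treat this as a
sketch): what rsize/wsize the client actually negotiated, and how many
nfsd threads the server is running.

On client:
# nfsstat -m
(or: grep nfs /proc/mounts)

On server:
# cat /proc/fs/nfsd/threads
# rpc.nfsd 32

The traditional default of 8 nfsd threads is often too low for a fast
interconnect; on RHEL-style systems you can make a higher count
persistent via RPCNFSDCOUNT in /etc/sysconfig/nfs.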


