[ewg] Infiniband and NFS speed tuning - any ideas?
Jon Mason
jon at opengridcomputing.com
Tue Dec 15 08:21:23 PST 2009
Are you running NFS RDMA or NFS over TCP? Have you tweaked the
read/write size (rsize/wsize) mount options?
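Something along these lines is where I'd start on the client side (untested,
and assuming NFSv3 over TCP/IPoIB here; the kernel will silently clamp
rsize/wsize to whatever it actually supports, so check /proc/mounts afterwards):

# umount ./remote
# mount -t nfs -o nfsvers=3,tcp,rsize=65536,wsize=65536,noatime \
    192.168.2.5:/home/ross/ramdisk ./remote
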
It's a little stale, but you might want to read:
http://nfs.sourceforge.net/nfs-howto/ar01s05.html
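If you want to try NFS/RDMA rather than NFS over TCP/IPoIB, the rough recipe
(assuming your kernel and OFED build ship the svcrdma/xprtrdma modules) is:

On the server:
# modprobe svcrdma
# /sbin/service nfs start
# echo rdma 20049 > /proc/fs/nfsd/portlist

On the client:
# modprobe xprtrdma
# mount -o rdma,port=20049 192.168.2.5:/home/ross/ramdisk ./remote
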
On Tue, Dec 15, 2009 at 10:11:58AM +0000, Ross Smith wrote:
> Hi folks,
>
> Can anybody give me advice on how to tune NFS to improve performance
> on this system? I've got a pair of 40Gb/s QDR ConnectX cards attached
> to a 20Gb/s DDR switch. Infiniband diagnostics show that I can
> consistently achieve a bandwidth of 1866MB/s, but the best I've gotten
> out of NFS in testing is 440MB/s and in actual use I'm hitting nearer
> 290MB/s.
>
> To test performance I'm creating a ramdisk, mounting it over NFS and
> doing a simple write of a 100MB file:
>
> The full setup is:
>
> NFS Server: 192.168.2.5
> NFS Client: 192.168.2.2
>
> On server:
> # mkdir ramdisk
> # mount -t ramfs -o size=512m ramfs ./ramdisk
> # chmod 777 ramdisk
> # /usr/sbin/exportfs -o rw,insecure,async,fsid=0 :/home/ross/ramdisk
> # /sbin/service nfs start
> # ./fw.stop
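>
> (For reference, the persistent equivalent of that exportfs call, exporting
> to any host via /etc/exports, would be roughly the following; above I'm
> just running it by hand instead:)
> # cat /etc/exports
> /home/ross/ramdisk *(rw,insecure,async,fsid=0)
> # /usr/sbin/exportfs -ra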
>
> On client:
> # opensm -B
> # mkdir remote
> # ./fw.stop
> # mount 192.168.2.5:/home/ross/ramdisk ./remote
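>
> (To see what read/write sizes the mount actually negotiated, I can check
> on the client with:)
> # nfsstat -m
> # grep 192.168.2.5 /proc/mounts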
>
> The script I'm using to temporarily disable the firewall is:
> # cat fw.stop
> echo "stopping firewall"
> iptables -F
> iptables -X
> iptables -t nat -F
> iptables -t nat -X
> iptables -t mangle -F
> iptables -t mangle -X
> iptables -P INPUT ACCEPT
> iptables -P FORWARD ACCEPT
> iptables -P OUTPUT ACCEPT
>
>
> The bandwidth test:
> # ib_send_bw 192.168.2.5
> ------------------------------------------------------------------
> Send BW Test
> Connection type : RC
> Inline data is used up to 1 bytes message
> local address: LID 0x01, QPN 0x10004b, PSN 0x5812a
> remote address: LID 0x04, QPN 0x80049, PSN 0x7e71c2
> Mtu : 2048
> ------------------------------------------------------------------
> #bytes #iterations BW peak[MB/sec] BW average[MB/sec]
> 65536 1000 1866.57 1866.22
> ------------------------------------------------------------------
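>
> (If the IPoIB mode and MTU matter here, this is roughly how I'd check and
> change them on each node, assuming the interface is ib0 and the driver
> supports connected mode, which allows an MTU of up to 65520:)
> # cat /sys/class/net/ib0/mode
> # cat /sys/class/net/ib0/mtu
> # echo connected > /sys/class/net/ib0/mode
> # /sbin/ifconfig ib0 mtu 65520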
>
>
> Ramdisk speed test results (before exporting the folder):
> dd if=/dev/zero of=./100mb bs=1024k count=100
> 100+0 records in
> 100+0 records out
> 104857600 bytes (105 MB) copied, 0.049329 seconds, 2.1 GB/s
>
> And ramdisk results after exporting the folder (I'm not sure why this
> should be so much slower, but it appears consistently reproducible):
> # dd if=/dev/zero of=./100mb bs=1024k count=200
> 200+0 records in
> 200+0 records out
> 209715200 bytes (210 MB) copied, 0.235899 seconds, 889 MB/s
>
>
> I've checked that the client can cope with the speeds too, creating a
> ramdisk there for testing:
> # dd if=/dev/zero of=./100mb bs=1024k count=100
> 100+0 records in
> 100+0 records out
> 104857600 bytes (105 MB) copied, 0.060491 seconds, 1.7 GB/s
>
>
> So I have an interconnect that can push 1.8GB/s, a server that can do
> 2.1GB/s, and a client that can cope with 1.7GB/s. I'm aiming for
> 900MB/s+ over NFS, and in theory I have the infrastructure to cope
> with that.
>
> However, NFS speed test results are about a third of the level I'm
> after, no matter how I try to tweak the settings:
>
> dd if=/dev/zero of=./100mb bs=1024k count=100
> 100+0 records in
> 100+0 records out
> 104857600 bytes (105 MB) copied, 0.313448 seconds, 335 MB/s
>
> Sync NFS results are truly horrible (even though this is to a ramdisk):
> # dd if=/dev/zero of=./100mb bs=1024k count=100
> 100+0 records in
> 100+0 records out
> 104857600 bytes (105 MB) copied, 4.84575 seconds, 21.6 MB/s
>
> # dd if=/dev/zero of=./100mb2 bs=1024k count=100
> 100+0 records in
> 100+0 records out
> 104857600 bytes (105 MB) copied, 1.38643 seconds, 75.6 MB/s
>
> Going back to async and tweaking the block size helps:
> # dd if=/dev/zero of=./100mb3 bs=32k count=3200
> 3200+0 records in
> 3200+0 records out
> 104857600 bytes (105 MB) copied, 0.358189 seconds, 293 MB/s
>
> [root@xenserver1 remote]# dd if=/dev/zero of=./100mb3 bs=32k count=3200
> 3200+0 records in
> 3200+0 records out
> 104857600 bytes (105 MB) copied, 0.461682 seconds, 227 MB/s
>
> [root@xenserver1 remote]# dd if=/dev/zero of=./100mb3 bs=64k count=3200
> 3200+0 records in
> 3200+0 records out
> 209715200 bytes (210 MB) copied, 3.23562 seconds, 64.8 MB/s
>
> [root@xenserver1 remote]# dd if=/dev/zero of=./100mb3 bs=16k count=3200
> 3200+0 records in
> 3200+0 records out
> 52428800 bytes (52 MB) copied, 0.119123 seconds, 440 MB/s
>
> [root@xenserver1 remote]# dd if=/dev/zero of=./100mb3 bs=8k count=3200
> 3200+0 records in
> 3200+0 records out
> 26214400 bytes (26 MB) copied, 0.069093 seconds, 379 MB/s
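>
> (That block-size sweep was done by hand; a one-liner to repeat it would be
> something like:)
> # for bs in 8k 16k 32k 64k 128k; do dd if=/dev/zero of=./sweep-$bs bs=$bs count=3200 2>&1 | tail -n1; done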
>
> It seems I'm getting the best performance from 16k blocks, but I
> actually want to tune this for 32k blocks, and neither is really at a
> level I'm happy with.
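>
> (For completeness, other knobs that presumably matter are the server's nfsd
> thread count and the TCP buffer limits; checking and raising them would look
> roughly like this, with the buffer values just a guess on my part:)
> # grep ^th /proc/net/rpc/nfsd
> # /usr/sbin/rpc.nfsd 16
> # sysctl -w net.core.rmem_max=4194304
> # sysctl -w net.core.wmem_max=4194304
> # sysctl -w net.ipv4.tcp_rmem="4096 1048576 4194304"
> # sysctl -w net.ipv4.tcp_wmem="4096 1048576 4194304"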
>
> Can anybody offer any suggestions on how to tune NFS and IPoIB to
> improve these figures?
>
> thanks,
>
> Ross
>
> PS. I should mention that I've seen 900MB/s over straight NFS before,
> although that was on a smaller test network with just a couple of
> 10Gb/s SDR Infiniband cards and an 8 port SDR switch.