FW: [ofa-general] NFS-RDMA (OFED1.4) with standard distributions ?

Ciesielski, Frederic (EMEA HPC&OSLO CC) frederic.ciesielski at hp.com
Tue Nov 11 10:06:21 PST 2008


Well, I did not plan to test every possible kernel version; improvements are surely on their way, which just confirms the assumption that this 'technology' is not mature yet.

With IPoIB, an NFS server can easily serve up to 1.2 GB/s, for instance (at least this is what I measure), when the data is in the page cache. No problem up to that point at least.
I clearly understand the theoretical benefits of RDMA, and for MPI it is a clear improvement over TCP. However, the drastic change for MPI is mostly on the latency side, although peak message bandwidth also improves, as one might expect for NFS too.
Registration/deregistration issues are also well known to MPI developers, and all this is certainly not that easy to manage in other areas.

Still, NFS-RDMA remains NFS. If the bottleneck is not in the transport, nothing will be improved by RDMA from the performance point of view.
Even worse, what I saw with the 2.6.27 kernel + OFED1.4-rc3 is that NFS-RDMA cannot match the performance of NFS-TCP for some IOzone patterns, with a filesystem that can itself sustain several hundred MB/s (using exactly the same hardware and software in both cases). We are far from a pure IB bandwidth issue here; we are probably just facing an issue in how the requests are handled, perhaps when paging occurs, I can't tell.
I could not find any tuning that solves the most obvious problem, i.e. the low read bandwidth, except mounting with '-o rsize=4096', which is probably not what people expect, as it will have other effects. In any case, this improves only the sequential read bandwidth.
But of course I will repeat my tests with the latest release of everything when I have time, still making sure I compare apples to apples...
Again, I'm sure improvements are on their way!
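
For anyone who wants to repeat this kind of apples-to-apples comparison, here is a minimal sketch of a sequential read timer in Python. It is not the IOzone workload used above, only an illustration; the function name and the default block size are assumptions made for the example. It reads one file front to back and reports MB/s, so the same file can be read through an NFS-TCP mount and then through an NFS-RDMA mount of the same export. Note that the block size here is only the size of each read() call from the application; it does not change the rsize negotiated at mount time. The file should also be larger than client memory (or the caches should be dropped first), otherwise the page cache is measured rather than the transport.

```python
import time


def sequential_read_mb_per_s(path, block_size=1 << 20):
    """Read `path` front to back; return (bytes_read, throughput in MB/s).

    Illustrative helper only: the name and the default block size are
    assumptions for this sketch, not part of any NFS or OFED tool.
    """
    total = 0
    start = time.perf_counter()
    # buffering=0 opens a raw file object, so each read() below issues
    # a single read system call of up to block_size bytes.
    with open(path, "rb", buffering=0) as f:
        while True:
            chunk = f.read(block_size)
            if not chunk:  # empty bytes object means end of file
                break
            total += len(chunk)
    elapsed = time.perf_counter() - start
    # Guard against a zero interval on very small files.
    rate = total / elapsed / 1e6 if elapsed > 0 else float("inf")
    return total, rate
```

Calling it once per mount point on the same server file, with the same client and the same block size, gives two numbers that can be compared directly.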

Fred.


-----Original Message-----
From: Talpey, Thomas [mailto:Thomas.Talpey at netapp.com]
Sent: Tuesday, 11 November, 2008 17:02
To: Ciesielski, Frederic (EMEA HPC&OSLO CC)
Cc: Jeff Becker; general at lists.openfabrics.org
Subject: RE: [ofa-general] NFS-RDMA (OFED1.4) with standard distributions ?

At 11:27 AM 11/10/2008, Ciesielski, Frederic (EMEA HPC&OSLO CC) wrote:
>That's great, thanks.
>
>I ran some tests with the 2.6.27 kernel as server and client, and
>basically it works fine.
>
>I could not yet find any situation where NFS-RDMA would outperform
>NFS/IPoIB, at least when you compare apples to apples (same clients,
>same server, same protocol, and not just write to/read from the
>caches), and it even seems to have severe performance issues for
>reading with files larger than the memory size of the client and the server.
>Hopefully this will improve when more users are able to give
>valuable feedback...

I have a couple of questions, and perhaps suggestions as well.
First the questions...

- Have you tried with a 2.6.28-rc4 client and server at all? There are a number of significant NFS/RDMA improvements queued in kernel.org, especially around RDMA memory registration as well as RDMA operation scheduling. We've seen some significant throughput improvement even for basic tunings.

- What type of storage are you using at the server, and have you attempted to tune the server at all? For example, if you are storage (spindle) limited, no network tuning is likely to help and you should address that first. Also, there are tunings such as nfsd thread count, export options, and adapter choice that can make a large difference.

Bottom line, you should be able to reach multi-hundred-MB/sec of read/write throughput with NFS/RDMA, but there may be issues on specific systems, or perhaps with the OFED1.4 code, that need to be accounted for. If possible, you may want to set expectations based on mainline, then try to duplicate them in the OFED backport.
The current OFED NFS/RDMA support is still evolving, while we consider the mainline kernel.org version to be rather solid.

Tom.

>
>Fred.
>
>-----Original Message-----
>From: Jeff Becker [mailto:Jeffrey.C.Becker at nasa.gov]
>Sent: Saturday, 08 November, 2008 22:35
>To: Ciesielski, Frederic (EMEA HPC&OSLO CC)
>Cc: general at lists.openfabrics.org
>Subject: Re: [ofa-general] NFS-RDMA (OFED1.4) with standard distributions ?
>
>Ciesielski, Frederic (EMEA HPC&OSLO CC) wrote:
>> Is there any chance that the new NFS-RDMA features coming with OFED
>> 1.4 work with standard and current distributions, like RHEL5, SLES10 ?
>Not yet, but I'm working on it. I intend for NFSRDMA to work on 2.6.27
>and 2.6.26 for OFED 1.4. The RHEL5 and SLES10 backports will likely be
>done for OFED 1.4.1. Thanks.
>
>-jeff
>
>> Did anybody test this, or would anybody claim it is supposed to work?
>>
>> I mean without building a 2.6.27 or equivalent kernel on top of it,
>> keeping almost full support from the vendors.
>>
>> Enhanced kernel modules may not be sufficient to work around the
>> limitations of old kernels...
>>
>>
>>



