FW: [ofa-general] NFS-RDMA (OFED1.4) with standard distributions ?

Talpey, Thomas Thomas.Talpey at netapp.com
Tue Nov 11 10:57:04 PST 2008


At 01:06 PM 11/11/2008, Ciesielski, Frederic (EMEA HPC&OSLO CC) wrote:
>Well, I did not plan to test all the possible versions of the kernel; 
>for sure improvements are on their way, which just confirms the 
>assumption that this 'technology' is not mature yet.

First, let's be sure to separate NFS/RDMA OFED issues from core NFS/RDMA.
The OFED1.4 release is the first to support NFS/RDMA, and there are
certainly issues remaining in this new backport. Depending on which kernel
you're targeting, there can be other issues - SLES10 is 2.6.16-based, for
example, and RHEL5 is 2.6.18. The NFS code itself (not just NFS/RDMA) has
evolved significantly since then, and continues to do so.

>With IPoIB an NFS server can easily export (for instance) up to 
>1.2GB/s (at least this is what I can measure), with the data in the 
>page cache. No problem up to that point at least.

This is impressive, by the way. I have not seen any results with NFS/IPoIB
at this level. Most client machines run out of CPU long before this.

>I clearly understand the theoretical benefits of RDMA and it's a clear 
>improvement over TCP, for MPI. However, the drastic change for MPI is 
>even more on the latency side, though the peak message bandwidth is 
>also improved as one might expect for NFS.
>Registration/deregistration issues are also well-known to the MPI 
>developers, and all this is certainly not that easy to manage in other areas.
>
>Still, NFS-RDMA remains NFS. If the bottleneck is not in the 
>transport, nothing will be improved by RDMA from the performance point of view.
>Even worse, what I saw with the 2.6.27 kernel + OFED1.4-rc3 is the 
>inability of NFS-RDMA to match the performance of NFS-TCP for some 
>patterns of IOzone, with a filesystem able to sustain itself several 
>hundreds of MB/s (using exactly the same hardware and software in both 
>cases). We are far from a pure IB bandwidth issue here; we are 
>probably just facing an issue in how the requests are handled, 
>perhaps when paging occurs, I can't tell.

I'd be very interested in any analysis of this which you may have done. One
thought that comes to mind is the possibility that your server's filesystem
performs less well at the 32KB read/write sizes that the NFS/RDMA client is
currently limited to. If you were measuring large-sequential workloads, then
you might be able to measure a difference, particularly when exporting the
filesystem in the default "sync" mode. NFS/TCP can send up to 1MB writes.
This is something we plan to address now that the FRMR memory registration
mode is available.
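
To put rough numbers on the size difference, here is a minimal sketch. It
assumes each READ or WRITE RPC carries at most one rsize/wsize worth of data,
and it uses a hypothetical 1 GiB sequential transfer; the function name and
the file size are illustrative only, not taken from any measurement.

```python
def rpcs_needed(transfer_bytes, io_size_bytes):
    """Number of RPCs to move transfer_bytes when each RPC carries
    at most io_size_bytes of data."""
    # Ceiling division: a final partial chunk still costs one RPC.
    return -(-transfer_bytes // io_size_bytes)

KIB = 1024
MIB = 1024 * KIB
GIB = 1024 * MIB

file_size = 1 * GIB  # hypothetical transfer size, for illustration only

rdma_rpcs = rpcs_needed(file_size, 32 * KIB)  # current NFS/RDMA client limit
tcp_rpcs = rpcs_needed(file_size, 1 * MIB)    # NFS/TCP maximum write size

print(f"32KB I/O size: {rdma_rpcs} RPCs")
print(f"1MB I/O size:  {tcp_rpcs} RPCs")
print(f"ratio: {rdma_rpcs // tcp_rpcs}x")
```

With a "sync" export, each of those WRITEs has to reach stable storage before
the server replies, so the per-request cost is multiplied by the larger count.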

>I could not find any tuning to solve the more obvious problem, i.e. 
>the low bandwidth for reading, except mounting with '-o rsize=4096'; 

Ouch! That will severely limit the client, forcing it to send MANY more RPC
requests. Did performance increase with this setting? And with which iozone
options?
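
As a back-of-the-envelope sketch of what "MANY more" means: the request rate
the client must sustain scales inversely with rsize. The target throughput
below is a hypothetical figure chosen for illustration, and the sketch
assumes every READ returns a full rsize of data.

```python
def rpc_rate(throughput_bytes_per_s, rsize_bytes):
    """READ RPCs per second needed to sustain a throughput at a given rsize."""
    return throughput_bytes_per_s / rsize_bytes

MIB = 1024 * 1024
target = 100 * MIB  # hypothetical target throughput, illustration only

rate_4k = rpc_rate(target, 4096)        # with '-o rsize=4096'
rate_32k = rpc_rate(target, 32 * 1024)  # with the 32KB NFS/RDMA size

print(f"rsize=4096:  {rate_4k:.0f} RPCs/s")
print(f"rsize=32768: {rate_32k:.0f} RPCs/s")
print(f"ratio: {rate_4k / rate_32k:.0f}x")
```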

>probably not what people expect, as this will have other effects. 
>Anyway, this improves only the sequential read bandwidth.
>But of course I will repeat my tests with the latest release of 
>everything when I have time, still making sure I compare apples to apples...
>Again, I'm sure improvements are on their way !

I would look forward to seeing your opinions of the new code, particularly for
the server performance. Thanks for the info so far!

Tom.



>
>Fred.
>
>
>-----Original Message-----
>From: Talpey, Thomas [mailto:Thomas.Talpey at netapp.com]
>Sent: Tuesday, 11 November, 2008 17:02
>To: Ciesielski, Frederic (EMEA HPC&OSLO CC)
>Cc: Jeff Becker; general at lists.openfabrics.org
>Subject: RE: [ofa-general] NFS-RDMA (OFED1.4) with standard distributions ?
>
>At 11:27 AM 11/10/2008, Ciesielski, Frederic (EMEA HPC&OSLO CC) wrote:
>>That's great, thanks.
>>
>>I ran some tests with the 2.6.27 kernel as server and client, and
>>basically it works fine.
>>
>>I could not yet find any situation where NFS-RDMA would outperform
>>NFS/IPoIB, at least when you compare apples to apples (same clients,
>>same server, same protocol, and not just write to/read from the
>>caches), and it even seems to have severe performance issues for
>>reading with files larger than the memory size of the client and the server.
>>Hopefully this will improve when more users are able to give
>>valuable feedback...
>
>I have a couple of questions, and perhaps suggestions as well.
>First the questions...
>
>- Have you tried with a 2.6.28-rc4 client and server at all? There are 
>a number of significant NFS/RDMA improvements queued in kernel.org, 
>especially around RDMA memory registration as well as RDMA operation 
>scheduling. We've seen some significant throughput improvement even 
>for basic tunings.
>
>- What type of storage are you using at the server, and have you 
>attempted to tune the server at all? For example, if you are storage
>(spindle) limited, no network tuning is likely to help and you should 
>address that first. Also, there are tunings such as nfsd thread count, 
>export options, and adapter choice that can make a large difference.
>
>Bottom line, you should be able to reach multi-hundred-MB/sec of 
>read/write throughput with NFS/RDMA, but there may be issues on 
>specific systems, or perhaps with the OFED1.4 code, that need to be 
>accounted for. If possible, you may want to set expectations based on 
>mainline, then try to duplicate them in the OFED backport.
>The current OFED NFS/RDMA support is still evolving, while we consider 
>the mainline kernel.org version to be rather solid.
>
>Tom.
>
>>
>>Fred.
>>
>>-----Original Message-----
>>From: Jeff Becker [mailto:Jeffrey.C.Becker at nasa.gov]
>>Sent: Saturday, 08 November, 2008 22:35
>>To: Ciesielski, Frederic (EMEA HPC&OSLO CC)
>>Cc: general at lists.openfabrics.org
>>Subject: Re: [ofa-general] NFS-RDMA (OFED1.4) with standard distributions ?
>>
>>Ciesielski, Frederic (EMEA HPC&OSLO CC) wrote:
>>> Is there any chance that the new NFS-RDMA features coming with OFED
>>> 1.4 work with standard and current distributions, like RHEL5, SLES10 ?
>>Not yet, but I'm working on it. I intend for NFSRDMA to work on 2.6.27
>>and 2.6.26 for OFED 1.4. The RHEL5 and SLES10 backports will likely be
>>done for OFED 1.4.1. Thanks.
>>
>>-jeff
>>
>>> Did anybody test this, or would anybody claim it is supposed to work ?
>>>
>>> I mean without building a 2.6.27 or equivalent kernel on top of it,
>>> keeping almost full support from the vendors.
>>>
>>> Enhanced kernel modules may not be sufficient to work around the
>>> limitations of old kernels...
>>>
>>>
>>>



