[Openib-windows] Re: [openib-general] NFS performance and general disk network export advice (Linux-Windows)

Talpey, Thomas Thomas.Talpey at netapp.com
Thu Feb 9 13:14:56 PST 2006


At 03:17 PM 2/9/2006, Paul Baxter wrote:
>I'm looking to export a filesystem from each of four linux 64bit boxes to a 
>single Windows server 2003 64bit Ed.
>
>Has anyone achieved this already using an IB transport? Can I use NFS over 
>IPoIB cross platform? i.e. do both ends support a solution?
>
>Is NFS over RDMA compatible with Windows (pretty sure the answer is no to 
>this one but love to be proven wrong). I've attached Tom's announcement of 
>the latest to the bottom of this email. I don't think Windows has the RDMA 
>abstraction (yet)?

Not the code I posted! :-) But sure, it's possible to implement NFS/RDMA
on Windows. Let us know when you're ready to test. ;-)

>Are windows IB drivers (Openib or Mellanox) compatible with these options? 
>Do I layer Windows services for Unix on top of the Windows IB drivers and 
>IPoIB to achieve a cross platform NFS?

You could do this but your real challenge is the upper layer IFS interface.
You would need to implement a Windows filesystem for NFS first. Of course,
there are such beasts, Hummingbird's comes to mind.

The code I posted uses strictly the OpenIB RDMA interfaces, plus CMA for
address resolution and making connections. By the way, it will work over
iWARP too.

>Has anyone done much in the way of NFS performance comparisons of NFS over 
>IPoIB in cross-platform situations vs say Gigabit ethernet. Does it work :) 
>What is large file throughput and processor loading - I'm aiming for 150-200 
>MB/s on large files on 4x SDR IB (possibly DDR if we can fit the bigger 144 
>port switch chassis into our rack layout for 50-ish nodes).

NFS over IPoIB does work, but it has nowhere near the low overhead of native
NFS over RDMA. There are several issues with an IPoIB implementation; above
all, an IPoIB solution is considerably less efficient than a native 10GbE
NIC:

- The UD connection typically has a single message in flight, which negates
much of the streaming throughput that is possible with RC.
- The IPoIB layer is an emulation, and does not generally perform the hardware
checksumming and large segment offload that even 100Mb NICs provide.
- The network stack is still in the loop on both ends, adding computational
overhead and latency.
- The data must still be copied.

I have seen native zero-copy zero-touch NFS/RDMA streaming at full PCI-X
throughput using only about 20% of a dual-processor 2GHz Xeon. Most network
stacks top out at 100% CPU at perhaps half this rate on similar platforms.
I'd expect IPoIB to do even worse, for the reasons above.
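
To put rough numbers on the rates discussed here, the following is a
back-of-envelope sketch, not a measurement. It assumes the usual nominal
figures: 4x SDR IB at 2.5 Gb/s per lane with 8b/10b encoding, Gigabit
Ethernet at 1 Gb/s, and a 64-bit/133MHz PCI-X bus (the bus width and clock
are an assumption; the text above only says "PCI-X"). The function names are
made up for illustration.

```python
# Back-of-envelope peak rates. All figures are nominal link rates (assumptions),
# not measurements; real NFS throughput will be lower.

def ib_data_rate_mb_s(lanes=4, lane_gbit_s=2.5, encoding_efficiency=0.8):
    """Usable data rate of an InfiniBand link in MB/s.

    SDR signals at 2.5 Gb/s per lane; 8b/10b encoding leaves 80% for data.
    """
    return lanes * lane_gbit_s * encoding_efficiency * 1000 / 8


def pcix_rate_mb_s(bus_width_bits=64, clock_mhz=133):
    """Peak PCI-X transfer rate in MB/s (bus width in bytes times clock)."""
    return bus_width_bits / 8 * clock_mhz


def gige_rate_mb_s():
    """Peak Gigabit Ethernet rate in MB/s, ignoring framing overhead."""
    return 1000 / 8


def fraction_of_link(target_mb_s, link_mb_s):
    """Share of a link's peak rate that a target throughput would consume."""
    return target_mb_s / link_mb_s


if __name__ == "__main__":
    links = {
        "Gigabit Ethernet": gige_rate_mb_s(),
        "4x SDR IB": ib_data_rate_mb_s(),
        "4x DDR IB": ib_data_rate_mb_s(lane_gbit_s=5.0),
        "PCI-X 64/133 (assumed)": pcix_rate_mb_s(),
    }
    for name, rate in links.items():
        for target in (150, 200):
            share = fraction_of_link(target, rate)
            print(f"{name}: {rate:.0f} MB/s peak, {target} MB/s is {share:.0%} of it")
```

Under these assumptions the 150-200 MB/s target is more than Gigabit
Ethernet can carry at all, while it is a fifth or less of what a 4x SDR
link can nominally deliver; the open question is the CPU cost of getting
there.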

>Are there any alternatives to using NFS that may be better and that would 
>'transparently' receive a performance boost with IB compared with using a 
>simple NFS/gigabit ethernet solution. Must be fairly straightforward, 
>ideally application neutral (configure a drive and load/unload script for 
>Linux and it just happens) and compatible between Win2003 and Linux? 
>Alternatives using perhaps Samba on the Linux side?
>
>My lack of knowledge of IB in the windows world has got me concerned over 
>whether this is actually achievable (easily).
>
>I hope to be trying this once we get a Windows 2003 machine, but hope 
>someone can encourage me that it's a breeze prior to my coming unstuck in a 
>month or so!
>
>Some detail about the bit I do understand:
>
>I will be using a patched Linux kernel (realtime preemption patches ) but 
>prefer not to apply/track too many kernel patches as the kernel evolves. The 
>NFS patches suggested by Tom in his announcement below make me a little 
>nervous.

The most important patches for integrating the NFS/RDMA client are already
in the 2.6.15 kernel, but there is additional work which is still in progress.
These are the patches I refer to. One of the major ones is the ability to
dynamically load RPC transports, such as the NFS/RDMA module. So you
do need some sort of patch to use the client, currently.

The transport switch continues to evolve and become integrated into the
kernel, so the need for this particular patch will fall away eventually. FYI,
the transport switch is much more general than NFS/RDMA - it's the
underpinning of IPv6 support for the NFS client.
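
As a rough illustration of what a transport switch buys you, here is a toy
sketch in Python. It is not the kernel's RPC interface, and every name in it
(TransportSwitch, register, create, the two transport classes) is
hypothetical. The idea it shows is only this: the RPC layer looks a
transport up by name, so a new one such as RDMA can be registered when its
module loads, without changing the callers.

```python
from typing import Callable, Dict


class Transport:
    """Minimal stand-in for an RPC transport (hypothetical interface)."""

    def send(self, payload: bytes) -> str:
        raise NotImplementedError


class TcpTransport(Transport):
    def send(self, payload: bytes) -> str:
        return f"tcp: copied {len(payload)} bytes through the socket layer"


class RdmaTransport(Transport):
    def send(self, payload: bytes) -> str:
        return f"rdma: registered {len(payload)} bytes for direct placement"


class TransportSwitch:
    """Name-to-factory registry; callers never name a concrete class."""

    def __init__(self) -> None:
        self._factories: Dict[str, Callable[[], Transport]] = {}

    def register(self, name: str, factory: Callable[[], Transport]) -> None:
        # Called when a transport "module" is loaded.
        if name in self._factories:
            raise ValueError(f"transport {name!r} is already registered")
        self._factories[name] = factory

    def unregister(self, name: str) -> None:
        # Called when a transport "module" is unloaded.
        self._factories.pop(name, None)

    def create(self, name: str) -> Transport:
        try:
            factory = self._factories[name]
        except KeyError:
            raise LookupError(f"no RPC transport named {name!r} is loaded") from None
        return factory()


if __name__ == "__main__":
    switch = TransportSwitch()
    switch.register("tcp", TcpTransport)
    switch.register("rdma", RdmaTransport)  # analogous to loading the RDMA module
    print(switch.create("rdma").send(b"read request"))
```

The same lookup-by-name structure is what lets other transports sit behind
the client without the upper layers caring which one is in use.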

Your real issue in working with NFS/RDMA in the way you describe is the
availability of the server. The Linux NFS/RDMA server is still very much under
development, and will take time just to be ready for experimentation. In
particular, it will take time to get it to a state where it can deliver the
performance you require.

Please feel free to contact me offline if you want to talk about details of
actually setting this up. With a stock 2.6.15.2 kernel and a couple of IB
cards you could get it going just to get started.

Tom.




>
>The application will alternate between a real-time mode with (probably) no 
>NFS (or similar network exporting of the disk) and an archiving mode where 
>Linux will load relevant network filesystem modules and let the windows 
>machine read the disks.
>
>The reason for this odd load/unload behaviour is because our current 
>experience with NFS has been that the driver is prone to putting 
>multi-millisecond glitches that have a habit of upsetting (soft) real-time 
>behaviour at the sorts of timing latencies we're looking at (millisecond or 
>two). NFS (and network cards) do like to batch up work and then run these 
>from interrupt contexts. SoftIRQs help tremendously but don't seem to be the 
>complete answer.
>
>Paul Baxter
>
>Tom's announcement:
>> We have released an updated NFS/RDMA client for Linux at
>> the project's Sourceforge site:
>>
>> <http://sourceforge.net/projects/nfs-rdma/>
>>
>> <http://sourceforge.net/project/showfiles.php?group_id=97628&package_id=178973>
>>
>> This release updates the RPC/RDMA support as follows:
>> Linux 2.6.15.2 supported
>> Integrates with RPC via 2.6.15 transport switch
>> Employs OpenIB RDMA verbs API (not kDAPL)
>> Dual BSD/GPL2 licensing
>>
>> There are no protocol changes in this release, it is identical to
>> the previous release (and the IETF draft) in this respect. The
>> client has been tested with NFSv3 and passes the Connectathon
>> test suite.
>>
>> At present, the client requires some additional transport switch
>> patches to be applied to the Linux kernel, these are available at
>> Chuck Lever's patches page:
>> <http://troy.citi.umich.edu/~cel/linux-2.6/2.6.15/release-notes.html>
>>
>> The related CITI NFS/RDMA server project is currently available
>> for 2.6.14 from:
>>
>> <http://www.citi.umich.edu/projects/rdma/>
>>
>> <http://www.citi.umich.edu/projects/rdma/patches/stage2/2.6.14.3-RPCRDMA_stage2_2005-12-19.patch>
>>
>> This server is functional but only supports small RDMA inline data
>> transfers, and a single request in flight. So, its performance is quite
>> far from the potential. However, it is functional and is the server
>> we pass Connectathon with!
>>
>> The server project is now being developed by Open Grid Computing,
>> moving to the OpenIB common RDMA verbs API. We'll be making
>> updates to both client and server as they become available. There's
>> a lot more to do.
>>
>> We look forward to comments and feedback from the various standards
>> and open source communities on this. Feel free to use the mailing list
>> on the sourceforge project site, or any of these lists (which we usually
>> monitor) but cc at least me and James Lentini (jlentini at netapp.com).
>>
>> Thanks,
>> Tom Talpey, for the various NFS/RDMA projects.
>



