[openib-general] mthca FMR correctness (and memory windows)

Thu Mar 23 09:42:14 PST 2006

At 05:10 PM 3/20/2006, Fabian Tillier wrote:
>On 3/20/06, Talpey, Thomas <Thomas.Talpey at netapp.com> wrote:
> > Ok, this is a longer answer.
> >
> > At 06:08 PM 3/20/2006, Fabian Tillier wrote:
> > >As to using FMRs to create virtually contiguous regions, the last data
> > >I saw on this related to SRP (not on OpenIB) and showed a gain of
> > >~25% in throughput when using FMRs vs. the "full frontal" DMA
> > >MR.  So there is definitely something to be gained by creating
> > >virtually contiguous regions, especially if you're doing a lot of RDMA
> > >Reads, for which there's a fairly low limit on how many can be in
> > >flight (4 comes to mind).
> >
> > 25% throughput over what workload? And I assume this was with the
> > "lazy deregistration" method implemented by the current FMR pool?
> > What was your analysis of the reason for the improvement? If it was
> > merely reducing the op count on the wire, I think your issue lies
> > elsewhere.
>
>This was a large block "read" workload (since HDDs typically give
>better read performance than write).  It was with lazy deregistration,
>and the analysis was that the reduction of the op count on the wire
>was the reason.  It may well have to do with how the target chose to
>respond, though, and I have no idea how that side of things was
>implemented.  It could well be that performance could be improved
>without going with FMRs.

Quite often performance is governed more by the target than by the initiator,
as the target is in turn governed by its local cache and by disk mechanism
performance and capacity.  Large data movements typically involve a low op
count from the initiator's perspective, so it seems a bit odd to claim that
performance can be dramatically affected by the op count on the wire.
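To make the op-count question concrete, here is a back-of-envelope model (not taken from either poster; every number in it is an assumption). It counts how many RDMA operations one large I/O turns into when the buffer is registered as a single virtually contiguous FMR mapping versus a physical ("full frontal") DMA MR whose pages are assumed to be fully scattered, and how many round trips that costs if only 4 RDMA Reads may be in flight. The function names, the 4 KiB page size and the 1 MiB per-operation cap are hypothetical.

```python
import math

PAGE_SIZE = 4096  # assumption: 4 KiB pages, none of them physically adjacent


def wire_ops(transfer_bytes: int, contiguous: bool, max_op_bytes: int = 1 << 20) -> int:
    """Hypothetical model of the RDMA operations needed to move one buffer.

    contiguous=True: one virtually contiguous FMR mapping, split only at an
    assumed per-operation size cap.
    contiguous=False: a physical DMA MR where every scattered page is its
    own segment and therefore its own RDMA operation.
    """
    if contiguous:
        return math.ceil(transfer_bytes / max_op_bytes)
    return math.ceil(transfer_bytes / PAGE_SIZE)


def read_rounds(ops: int, max_outstanding: int = 4) -> int:
    """Round trips needed if at most max_outstanding RDMA Reads may be in flight."""
    return math.ceil(ops / max_outstanding)


for contiguous in (False, True):
    ops = wire_ops(1 << 20, contiguous)  # a single 1 MiB I/O
    print(contiguous, ops, read_rounds(ops))
```

Under these assumptions the scattered case needs 256 operations (64 rounds of RDMA Reads) against 1 for the contiguous case. Whether that gap matters in practice depends on the physical layout of real buffers and, for a read workload, on the fact that the target issues RDMA Writes, which are not subject to the same in-flight limit.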

> > Also, see the previous paragraph: if your SRP is fast but not safe, then
> > only applications that can accept "fast but not safe" will want to use
> > it. Fibre Channel adapters do not introduce this vulnerability, yet they
> > go fast. I can show you NFS running this fast too, by the way.
>
>Why can't Fibre Channel adapters, or any locally attached hardware for
>that matter, DMA anywhere in memory?  Unless the chipset somehow
>protects against it, doesn't locally attached hardware have free rein
>over DMA?

As a general practice, future volume I/O chipsets across multiple market 
segments will implement an IOMMU to restrict where DMA is allowed.  Both 
AMD and Intel have recently announced specifications to this effect which 
reflect what has been implemented in many non-x86 chipset 
offerings.  Whether a given OS always requires this protection to be
enabled is implementation-specific, but it is something that many within the
industry and customer base require.
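As a rough illustration of how an IOMMU restricts where a device may DMA, here is a toy model (hypothetical; it does not follow the AMD or Intel specifications or any OS interface). The OS maps selected I/O virtual pages to physical pages, and any device access to an unmapped page faults instead of reaching memory. The class name, methods and 4 KiB page size are assumptions for illustration only.

```python
PAGE_SHIFT = 12  # assumption: 4 KiB IOMMU pages
PAGE_MASK = (1 << PAGE_SHIFT) - 1


class ToyIommu:
    """Hypothetical IOMMU: a device may DMA only through pages the OS mapped for it."""

    def __init__(self) -> None:
        # I/O virtual page number -> physical page number
        self.table: dict[int, int] = {}

    def map(self, iova: int, phys: int) -> None:
        """OS grants the device access to one page."""
        self.table[iova >> PAGE_SHIFT] = phys >> PAGE_SHIFT

    def unmap(self, iova: int) -> None:
        """OS revokes access once the I/O completes."""
        self.table.pop(iova >> PAGE_SHIFT, None)

    def translate(self, iova: int) -> int:
        """Return the physical address for a device access, or fault if unmapped."""
        page = iova >> PAGE_SHIFT
        if page not in self.table:
            raise PermissionError(f"DMA fault: IOVA {iova:#x} not mapped")
        return (self.table[page] << PAGE_SHIFT) | (iova & PAGE_MASK)


iommu = ToyIommu()
iommu.map(0x10000, 0x7F3000)
print(hex(iommu.translate(0x10010)))  # the mapped page is reachable
iommu.unmap(0x10000)
try:
    iommu.translate(0x10010)
except PermissionError as err:
    print(err)  # the same access now faults
```

The point of the sketch is only the access check: with the table in place, a locally attached adapter no longer has free rein over memory, which is the protection being described for future chipsets.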

Mike

>Also, please don't take my anecdotal benchmark results as an
>endorsement of the Mellanox FMR design; the data was presented to me
>by Mellanox as a reason to add FMR support to the Windows stack (which
>currently uses the "full frontal" approach due to limitations of the
>verbs API and how it needs to be used for storage).  I never had a
>chance to look into why the gains were so large, and the cause could be
>the SRP target implementation, a hardware limitation, or any of a
>number of other issues, especially since a read workload results in
>RDMA Writes from the target to the host, which can be pipelined much
>deeper than RDMA Reads.
>
>- Fab