[openib-general] mthca FMR correctness (and memory windows)

Fabian Tillier ftillier at silverstorm.com
Mon Mar 20 17:10:39 PST 2006


On 3/20/06, Talpey, Thomas <Thomas.Talpey at netapp.com> wrote:
> Ok, this is a longer answer.
>
> At 06:08 PM 3/20/2006, Fabian Tillier wrote:
> >As to using FMRs to create virtually contiguous regions, the last data
> >I saw about this related to SRP (not on OpenIB), and resulted in a
> >gain of ~25% in throughput when using FMRs vs the "full frontal" DMA
> >MR.  So there is definitely something to be gained by creating
> >virtually contiguous regions, especially if you're doing a lot of RDMA
> >reads for which there's a fairly low limit to how many can be in
> >flight (4 comes to mind).
>
> 25% throughput over what workload? And I assume, this was with the
> "lazy deregistration" method implemented with the current fmr pool?
> What was your analysis of the reason for the improvement - if it was
> merely reducing the op count on the wire, I think your issue lies elsewhere.

This was a large-block "read" workload (since HDDs typically give
better read performance than write).  It was with lazy deregistration,
and the analysis was that the reduction in op count on the wire was
the reason.  It may well have to do with how the target chose to
respond, though, and I have no idea how that side of things was
implemented.  It could well be that performance could be improved
without resorting to FMRs.
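To put the op-count argument in concrete terms, here is a small
illustrative model (not from this thread; the addresses and page size
are made up).  Without remapping, each physically contiguous run of a
scattered buffer needs its own (addr, rkey, length) descriptor and
hence its own wire op; an FMR remaps the scattered pages to
consecutive virtual addresses so one descriptor covers everything:

```python
PAGE_SIZE = 0x1000  # 4 KiB pages; value is illustrative

def descriptor_count(pages, page_size=PAGE_SIZE):
    """Number of (addr, length) descriptors needed to describe a
    physically scattered buffer without remapping: one per
    physically contiguous run of pages."""
    if not pages:
        return 0
    runs = 1
    for prev, cur in zip(pages, pages[1:]):
        if cur != prev + page_size:
            runs += 1
    return runs

# A buffer whose pages fall into two physically contiguous runs.
pages = [0x10000, 0x11000, 0x30000, 0x31000]

print(descriptor_count(pages))  # 2 wire ops without remapping
# An FMR maps the same pages at consecutive virtual addresses, so a
# single (vaddr, rkey, len) descriptor -- one wire op -- suffices.
```

The same arithmetic scales with fragmentation: the more runs the
buffer breaks into, the more an FMR-style coalescing saves.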

> Also, see previous paragraph - if your SRP is fast but not safe, then only
> fast but not safe applications will want to use it. Fibre channel adapters
> do not introduce this vulnerability, but they go fast. I can show you NFS
> running this fast too, by the way.

Why can't Fibre Channel adapters, or any locally attached hardware for
that matter, DMA anywhere in memory?  Unless the chipset somehow
protects against it, doesn't locally attached hardware have free rein
over DMA?

Also, please don't take my anecdotal benchmark results as an
endorsement of the Mellanox FMR design.  The data was presented to me
by Mellanox as a reason to add FMR support to the Windows stack (which
currently uses the "full frontal" approach due to limitations of the
verbs API and how it needs to be used for storage).  I never had a
chance to look into why the gains were so large; it could be the SRP
target implementation, a hardware limitation, or any number of other
issues, especially since a read workload results in RDMA Writes from
the target to the host, which can be pipelined much deeper than RDMA
Reads.
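The pipelining point can also be put in rough numbers.  This is a
back-of-the-envelope sketch, not measured data: the read depth of 4 is
the in-flight limit mentioned earlier in the thread, and the write
depth of 16 is an arbitrary stand-in for "much deeper":

```python
from math import ceil

def round_trip_waves(num_ops, max_outstanding):
    """With at most `max_outstanding` operations in flight, a transfer
    of `num_ops` operations completes in this many back-to-back
    'waves' of round trips (ignoring overlap with data transfer)."""
    return ceil(num_ops / max_outstanding)

# 64 wire ops for one large transfer:
print(round_trip_waves(64, 4))   # RDMA Reads, 4 outstanding: 16 waves
print(round_trip_waves(64, 16))  # RDMA Writes, deeper pipeline: 4 waves
```

So even at the same op count, a read-heavy workload serviced by target
RDMA Writes can keep the wire fuller than one bounded by the
responder's RDMA Read limit.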

- Fab
