[openib-general] mthca FMR correctness (and memory windows)

Talpey, Thomas Thomas.Talpey at netapp.com
Mon Mar 20 16:51:51 PST 2006


Ok, this is a longer answer.

At 06:08 PM 3/20/2006, Fabian Tillier wrote:
>You pre-alloc the MPT entry, but not the MTT entries.  You then
>populate the MTT by doing posted writes to the HCA memory (or host
>memory for memfree HCAs).
>...
>I don't know if allocating MTT entries is really expensive.  What's
>costly is that you need to do command interface transactions
>to write the MTT entries, while FMRs support posted writes.

I don't know what MPTs and MTTs are (Mellanox implementation?) nor
do I know exactly what the overhead difference you refer to really is.
It's less about the overhead and more about the resource contention,
in my experience. 

>That is, just like with alloc_fmr, you need to reserve and format an
>MPT for regular memory registrations, which is a command interface
>transaction.  For memory registration, one or more commands precede
>this to write to the MTT. Thus, a memory registration is at a minimum
>a 2 command interface transaction operation, potentially more
>depending on the size of the registration.
>
>Deregistration and freeing (not unmapping) an FMR should be
>equivalent, I would think.

So, in the RPC/RDMA client, I do ib_alloc_fmr() a bunch of times way up
front, when setting up the connection. This provides the "windows" which
are then used to register chunks (RPC/RDMA segments).

As each RPC is placed on the wire, I borrow fmrs from the above list and
call ib_map_phys_fmr() to establish the mapping for each of its segments.
No allocation is performed on this hot path.

When the server replies, I call ib_unmap_fmr() to tear down the mappings.
No deallocation is performed; the fmrs are returned to a per-mount pool,
*after unmapping them*.

I just want the fastest possible map and unmap. I guess that means I
want fast MTTs.

>I'd spoken with Dror about changing the implementation of memory
>registration to always use posted writes, and we'd come to the
>conclusion that this would work, though doing so was not the intended
>usage and thus not something that was guaranteed to work going forward.
> One of Dror's main concerns was that a future change in firmware
>could break this.
>
>Such a change would allow memory registration to require only a single
>command interface transaction (and thus only a single wait operation
>while that command completes).  I'd think that was beneficial, but
>haven't had a chance to poke around to quantify the gains.

Again, it's not registration, it's the map/unmap. Do you believe that would
be faster with this interface? I don't think it requires an API change outside
the mthca interface, btw.

>I'd still be interested in seeing regular registration calls improved,
>as it's clear that an application that is sensitive about its security
>must either restrict itself to send/recv, buffer the data (data copy
>overhead), or register/unregister for each I/O.

Trust me, storage is sensitive to its security (and its data integrity).

>As to using FMRs to create virtually contiguous regions, the last data
>I saw about this related to SRP (not on OpenIB), and resulted in a
>gain of ~25% in throughput when using FMRs vs the "full frontal" DMA
>MR.  So there is definitely something to be gained by creating
>virtually contiguous regions, especially if you're doing a lot of RDMA
>reads for which there's a fairly low limit to how many can be in
>flight (4 comes to mind).

25% throughput over what workload? And I assume this was with the
"lazy deregistration" method implemented with the current fmr pool?
What was your analysis of the reason for the improvement? If it was
merely reducing the op count on the wire, I think your issue lies elsewhere.

Also, see the previous paragraph - if your SRP is fast but not safe, then only
fast-but-not-safe applications will want to use it. Fibre Channel adapters
do not introduce this vulnerability, yet they go fast. I can show you NFS
running this fast too, by the way.

Tom.
