[ofa-general] Memory registration redux

Wed May 6 14:46:28 PDT 2009

On Wed, May 06, 2009 at 01:10:47PM -0700, Roland Dreier wrote:
> By the way, what's the desired behavior of the cache if a process
> registers, say, address range 0x1000 ... 0x3fff, and then the same
> process registers address range 0x2000 ... 0x2fff (with all the same
> permissions, etc)?
> 
> The initial registration creates an MR that is still valid for the
> smaller virtual address range, so the second registration is much
> cheaper if we used the cached registration; but if we use the cache for
> the second registration, and then deregister the first one, we're stuck
> with a too-big range pinned in the cache because of the second
> registration.

Yuk, doesn't this problem pretty much doom this method entirely? You
can't tear down the entire registration of 0x1000 ... 0x3fff if the app
does something to change 0x2000 ... 0x2fff because it may have active
RDMAs going on in 0x1000 ... 0x1fff.

The above could happen through strange use of brk.
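
To make the problem concrete, here is a minimal sketch in C of a cache
that reuses an MR by containment. The names (struct mr_entry, mr_covers,
mr_overlaps, mr_try_reuse, mr_release) are made up for illustration and
are not part of any verbs library; a real cache would hold many entries
and the actual MR handle.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical cache entry: one MR covering [start, end], end inclusive. */
struct mr_entry {
    uintptr_t start;
    uintptr_t end;
    int refcount;   /* registrations currently sharing this MR */
};

/* Cache hit: the requested range lies entirely inside the cached MR. */
static bool mr_covers(const struct mr_entry *mr, uintptr_t start, uintptr_t end)
{
    return mr->start <= start && end <= mr->end;
}

/* A VM change touches the MR if the two ranges overlap at all. */
static bool mr_overlaps(const struct mr_entry *mr, uintptr_t start, uintptr_t end)
{
    return mr->start <= end && start <= mr->end;
}

/* Try to satisfy a registration from the cache; true on a hit. */
static bool mr_try_reuse(struct mr_entry *mr, uintptr_t start, uintptr_t end)
{
    if (!mr_covers(mr, start, end))
        return false;
    mr->refcount++;
    return true;
}

/* Drop one registration; true only when the MR can really be torn down. */
static bool mr_release(struct mr_entry *mr)
{
    assert(mr->refcount > 0);
    return --mr->refcount == 0;
}
```

With an entry for 0x1000 ... 0x3fff, mr_try_reuse() hits for
0x2000 ... 0x2fff, and a later mr_release() for the first registration
returns false, so the whole 0x1000 ... 0x3fff range stays pinned.
Likewise mr_overlaps() reports that a change to 0x2000 ... 0x2fff
touches the big MR even though RDMA in 0x1000 ... 0x1fff is unaffected.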

What about a slightly different twist? Instead of trying to make
everything synchronous in the mmu_notifier, just have a counter mapped
to user space. Increment the counter from the notifier whenever the mm
changes. Pin the user page that contains the single counter when the
process starts so access is lockless.

In user space, check the counter before every cache lookup and, if it
has changed, call back into the kernel to resynchronize the MR tables in
the HCA to the current VM.
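
A rough sketch of the user-space side of this check, in C11. Everything
here is an assumption made for illustration: vm_generation stands in for
the counter page the kernel would map, fake_notifier_mm_changed() stands
in for the mmu_notifier callback, and resync_mr_tables() is a
placeholder for whatever call would resynchronize the HCA. None of these
exist as real interfaces.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Stand-in for the pinned counter page mapped into the process. */
static _Atomic uint64_t vm_generation;

/* Hypothetical user-space cache state. */
struct mr_cache {
    uint64_t seen_generation;  /* counter value at the last resync */
    unsigned resync_count;     /* how often the slow path ran */
};

/* Stand-in for the notifier side: every mm change bumps the counter. */
static void fake_notifier_mm_changed(void)
{
    atomic_fetch_add_explicit(&vm_generation, 1, memory_order_release);
}

/* Placeholder for the call back into the kernel; it only counts calls. */
static void resync_mr_tables(struct mr_cache *cache)
{
    cache->resync_count++;
}

/*
 * Run before every cache lookup. Returns true if the cache had to be
 * resynchronized, false if the lockless fast path was taken.
 */
static bool mr_cache_validate(struct mr_cache *cache)
{
    uint64_t now = atomic_load_explicit(&vm_generation, memory_order_acquire);

    assert(now >= cache->seen_generation);  /* counter only moves forward */
    if (now == cache->seen_generation)
        return false;

    resync_mr_tables(cache);
    /* Record the value read before the resync, so a change that lands
     * during the resync is caught by the next check. */
    cache->seen_generation = now;
    return true;
}
```

Several mm changes between two lookups collapse into one resync, since
only the counter value is compared, and an unchanged counter costs one
atomic load and no system call.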

This avoids the locking and racing problems, keeps the fast path in
user space, and sidesteps the question above about how to deal with
arbitrary actions.

Jason