[ofa-general] Re: [patch] my mmu notifiers

Andrea Arcangeli andrea at qumranet.com
Tue Feb 19 05:58:51 PST 2008


On Tue, Feb 19, 2008 at 09:43:57AM +0100, Nick Piggin wrote:
> are rather similar. However I have tried to make a point of minimising the
> impact the the core mm/. I don't see why we need to invalidate or flush

I also tried hard to minimise the impact of the core mm/, I also
argued with Christoph that cluttering mm/ wasn't a good idea for
things like age_page that could be a 1 liner change instead of a
multiple-liner change, without any loss of flexibility or readability.

> anything when changing the pte to be _more_ permissive, and I don't

Note that in my patch the invalidate_pages in mprotect can be
trivially switched to a mprotect_pages with proper params. This will
prevent page faults completely in the secondary MMU (there will only
be tlb misses after the tlb flush just like for the core linux pte),
and it'll allow all the secondary MMU pte blocks (512/1024 at time
with my PT lock design) to be updated to have proper permissions
matching the core linux pte.

> understand the need for invalidate_begin/invalidate_end pairs at all.

The need of the pairs is crystal clear to me: range_begin is needed
for GRU _but_only_if_ range_end is called after releasing the
reference that the VM holds on the page. _begin will flush the GRU tlb
and at the same time it will take a mutex that will block further GRU
tlb-miss-interrupts (no idea how they manange those nightmare locking,
I didn't even try to add more locking to KVM and I get away with the
fact KVM takes the pin on the page itself).

My patch calls invalidate_page/pages before the reference is released
on the page, so GRU will work fine despite lack of
range_begin. Furthermore with my patch GRU will be auto-serialized by
the PT lock w/o the need of any additional locking.

> What I have done is basically create it so that the notifiers get called
> basically in the same place as the normal TLB flushing is done, and nowhere
> else.

That was one of my objectives too.

> I also wanted to avoid calling notifier code from inside eg. hardware TLB
> or pte manipulation primitives. These things are already pretty well
> spaghetti, so I'd like to just place them right where needed first... I
> think eventually it will need a bit of a rethink to make it more consistent
> and more general. But I prefer to do put them in the caller for the moment.

Your patch should also work for KVM but it's suboptimal, my patch can
be orders of magnitude more efficient for GRU thanks to the
invalidate_pages optimization. Christoph complained about having to
call one method per pte.

And adding invalidate_range is useless unless you fully support
xpmem. You're calling invalidate_range in places that can't sleep...

No idea why xpmem needs range_begin, I perfectly understand why GRU
needs _begin with Chrisotph's patch (gru lacks the page pin) but I
dunno why xpmem needs range_begin (xpmem has the page pin so I also
think it could avoid using range_begin). Still to support GRU you need
both to call invalidate_range in places that can sleep and you need
the external rmap notifier. The moment you add xpmem into the equation
your and my clean patches become Christoph's one...



More information about the general mailing list