[ofa-general] Re: [patch 2/6] mmu_notifier: Callbacks to invalidate address ranges

Andrea Arcangeli andrea at qumranet.com
Tue Feb 19 17:00:38 PST 2008


On Wed, Feb 20, 2008 at 10:08:49AM +1100, Nick Piggin wrote:
> You can't sleep inside rcu_read_lock()!
> 
> I must say that for a patch that is up to v8 or whatever and is
> posted twice a week to such a big cc list, it is kind of slack to
> not even test it and expect other people to review it.

Well, xpmem requirements are complex. As a side effect of the
simplicity of my approach, my patch has been 100% safe since #v1. Now
it also works for GRU and it clusters invalidates.

> Also, what we are going to need here are not skeleton drivers
> that just do all the *easy* bits (of registering their callbacks),
> but actual fully working examples that do everything that any
> real driver will need to do. If not for the sanity of the driver

I've a fully working scenario for my patch; in fact I didn't post the
mmu notifier patch until I got KVM to swap 100% reliably, to be sure I
would post something that works well. mmu notifiers are already used
in KVM for:

1) 100% reliable and efficient swapping of guest physical memory
2) copy-on-writes of writeprotect faults after ksm page sharing of guest
   physical memory
3) ballooning using madvise to give the guest memory back to the host
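
For illustration, a rough sketch of the registration side of such a
consumer; the struct and function names follow the proposed mmu
notifier interface and may not match any particular version of the
patch:

/*
 * Illustrative sketch only: names follow the proposed mmu notifier
 * interface and are not taken from any specific patch version.
 */
#include <linux/mmu_notifier.h>
#include <linux/mm.h>

struct demo_secondary_mmu {
	struct mmu_notifier mn;
	/* ... shadow (spte) page table state ... */
};

static void demo_invalidate_page(struct mmu_notifier *mn,
				 struct mm_struct *mm,
				 unsigned long address)
{
	/*
	 * Zap the shadow pte (spte) mapping this address and flush the
	 * secondary TLB, so the spte never keeps a reference to a page
	 * the linux pte has already dropped (cases 1-3 above).
	 */
}

static const struct mmu_notifier_ops demo_ops = {
	.invalidate_page = demo_invalidate_page,
};

static int demo_register(struct demo_secondary_mmu *smmu,
			 struct mm_struct *mm)
{
	smmu->mn.ops = &demo_ops;
	/*
	 * After this, core VM pte teardown (swapping, ksm
	 * write-protection, madvise ballooning) is reflected into the
	 * shadow page tables through the callback above.
	 */
	return mmu_notifier_register(&smmu->mn, mm);
}

The point is that once the notifier is registered, the driver-visible
work is confined to the callbacks: the core VM drives them on every
pte teardown, with no extra hooks in the callers.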

My implementation is the handiest because it requires zero changes to
the ksm code too (no explicit mmu notifier calls after
ptep_clear_flush), and it's also 100% safe: no messing with scheduling
inside rcu_read_lock, no "atomic" parameters, and no window where the
sptes have a view on older pages while the linux pte has a view on
newer pages (that window can open with remap_file_pages when my KVM
swapping patch is adapted to use Christoph's V8 patch).
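
To make the ptep_clear_flush point concrete, the idea is roughly the
following (a sketch, not the actual patch; the wrapper and hook names
are mine): the secondary-MMU invalidate is folded into the primary pte
teardown, so callers like ksm get it for free and the two views cannot
get out of sync:

static inline pte_t ptep_clear_flush_notify(struct vm_area_struct *vma,
					    unsigned long address,
					    pte_t *ptep)
{
	/* Clear the linux pte and flush the primary TLB first... */
	pte_t pte = ptep_clear_flush(vma, address, ptep);
	/*
	 * ...then immediately invalidate the secondary mapping (spte)
	 * for the same address, before a new page can be instantiated
	 * there, so the spte never has a view on an older page than
	 * the linux pte.  Hook name is illustrative.
	 */
	mmu_notifier_invalidate_page(vma->vm_mm, address);
	return pte;
}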

> Also, how do you resolve the case where you are not allowed to sleep?
> I would have thought either you have to handle it, in which case nobody
> needs to sleep; or you can't handle it, in which case the code is
> broken.

I also asked exactly this; glad you raised it again too.
