[ofa-general] Re: [PATCH] KVM swapping with MMU Notifiers V7

Andrea Arcangeli andrea at qumranet.com
Mon Feb 18 04:17:15 PST 2008


On Sat, Feb 16, 2008 at 03:08:17AM -0800, Andrew Morton wrote:
> On Sat, 16 Feb 2008 11:48:27 +0100 Andrea Arcangeli <andrea at qumranet.com> wrote:
> 
> > +void kvm_mmu_notifier_invalidate_range_end(struct mmu_notifier *mn,
> > +					   struct mm_struct *mm,
> > +					   unsigned long start, unsigned long end,
> > +					   int lock)
> > +{
> > +	for (; start < end; start += PAGE_SIZE)
> > +		kvm_mmu_notifier_invalidate_page(mn, mm, start);
> > +}
> > +
> > +static const struct mmu_notifier_ops kvm_mmu_notifier_ops = {
> > +	.invalidate_page	= kvm_mmu_notifier_invalidate_page,
> > +	.age_page		= kvm_mmu_notifier_age_page,
> > +	.invalidate_range_end	= kvm_mmu_notifier_invalidate_range_end,
> > +};
> 
> So this doesn't implement ->invalidate_range_start().

Correct. range_start is needed by subsystems that don't pin the pages,
so they have to drop the secondary mmu mappings on the physical page
before the page is released by the linux VM.

> By what means does it prevent new mappings from being established in the
> range after core mm has tried to call ->invalidate_range_start()?
> mmap_sem, I assume?

No, populate range only takes the mmap_sem in read mode, and the kvm
page fault of course also takes it only in read mode.

What makes it safe is that invalidate_range_end is called _after_ the
linux pte is cleared. If the kvm page fault triggers, it will call
into get_user_pages again to re-establish the linux pte _before_
establishing the spte.

It's the same reason why it's safe to flush the tlb after clearing the
linux pte. sptes are like a secondary tlb.
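
To make the ordering concrete, here is a minimal single-threaded userspace
sketch, not kernel code. Every name in it (toy_mm, toy_zap_range,
toy_secondary_fault and so on) is hypothetical and only models the argument
above: the zap path clears the primary mapping first and drops the secondary
one afterwards, while the fault path re-establishes the primary mapping
before installing the secondary one. It does not model concurrency, page
pinning, or locking.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NPAGES 8

/* Toy model: one "linux pte" and one "spte" per page (hypothetical types). */
struct toy_mm {
	bool pte_present[NPAGES];	/* primary mapping */
	bool spte_present[NPAGES];	/* secondary mapping, like a second tlb */
};

/* Stand-in for get_user_pages(): re-establish the primary mapping. */
static void toy_get_user_page(struct toy_mm *mm, size_t idx)
{
	mm->pte_present[idx] = true;
}

/* Secondary fault path: primary mapping first, spte only afterwards. */
static void toy_secondary_fault(struct toy_mm *mm, size_t idx)
{
	toy_get_user_page(mm, idx);
	assert(mm->pte_present[idx]);
	mm->spte_present[idx] = true;
}

/* Zap path: clear the ptes, then the range_end analogue drops the sptes. */
static void toy_zap_range(struct toy_mm *mm, size_t start, size_t end)
{
	for (size_t i = start; i < end; i++)
		mm->pte_present[i] = false;

	/* analogue of ->invalidate_range_end(): runs after the ptes are clear */
	for (size_t i = start; i < end; i++)
		mm->spte_present[i] = false;
}

/* Checked only at quiescent points: no spte is left without a pte. */
static bool toy_no_stale_spte(const struct toy_mm *mm)
{
	for (size_t i = 0; i < NPAGES; i++)
		if (mm->spte_present[i] && !mm->pte_present[i])
			return false;
	return true;
}
```

In this model a fault that arrives after the zap simply goes through
toy_get_user_page() again, so the spte it installs always points at a
freshly established pte, which is the same reasoning as for a tlb refill
after a flush.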

> > +			/* set userspace_addr atomically for kvm_hva_to_rmapp */
> > +			spin_lock(&kvm->mmu_lock);
> > +			memslot->userspace_addr = userspace_addr;
> > +			spin_unlock(&kvm->mmu_lock);
> 
> are you sure?  kvm_unmap_hva() and kvm_age_hva() read ->userspace_addr a
> single time and it doesn't immediately look like there's a need to take the
> lock here?

gcc will always write it with a single movq, but taking the lock keeps
this compliant with the C spec, and since this is by far not a
performance-critical path I thought it was simpler than some other
way of doing the move atomically in a single insn.
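
For illustration only, the two options can be sketched in userspace C11.
This is not the kernel API: the toy_* names are hypothetical, the atomic_flag
spinlock merely stands in for kvm->mmu_lock, and the second variant uses
<stdatomic.h> as an assumed equivalent of "some other atomic move". Variant 1
writes a plain field under the lock, as the patch does. Variant 2 makes the
field itself atomic so that a single store is well defined by the language.

```c
#include <assert.h>
#include <stdatomic.h>

/* Variant 1: plain field, every access made under a lock. */
struct toy_slot_locked {
	atomic_flag lock;		/* toy spinlock, stands in for mmu_lock */
	unsigned long userspace_addr;
};

static void toy_lock(atomic_flag *l)
{
	while (atomic_flag_test_and_set_explicit(l, memory_order_acquire))
		;	/* spin until the flag was previously clear */
}

static void toy_unlock(atomic_flag *l)
{
	atomic_flag_clear_explicit(l, memory_order_release);
}

static void toy_locked_init(struct toy_slot_locked *s)
{
	atomic_flag_clear(&s->lock);
	s->userspace_addr = 0;
}

static void toy_locked_set(struct toy_slot_locked *s, unsigned long addr)
{
	toy_lock(&s->lock);
	s->userspace_addr = addr;
	toy_unlock(&s->lock);
}

static unsigned long toy_locked_get(struct toy_slot_locked *s)
{
	unsigned long addr;

	toy_lock(&s->lock);
	addr = s->userspace_addr;
	toy_unlock(&s->lock);
	return addr;
}

/* Variant 2: the field itself is atomic, no lock needed for one store. */
struct toy_slot_atomic {
	_Atomic unsigned long userspace_addr;
};

static void toy_atomic_set(struct toy_slot_atomic *s, unsigned long addr)
{
	atomic_store_explicit(&s->userspace_addr, addr, memory_order_relaxed);
}

static unsigned long toy_atomic_get(struct toy_slot_atomic *s)
{
	return atomic_load_explicit(&s->userspace_addr, memory_order_relaxed);
}

/* Usage: both variants hand back the value that was last stored. */
static void toy_demo(void)
{
	struct toy_slot_locked a;
	struct toy_slot_atomic b;

	toy_locked_init(&a);
	toy_locked_set(&a, 0x1000UL);
	assert(toy_locked_get(&a) == 0x1000UL);

	atomic_init(&b.userspace_addr, 0UL);
	toy_atomic_set(&b, 0x2000UL);
	assert(toy_atomic_get(&b) == 0x2000UL);
}
```

The locked variant costs a lock round trip per update, which does not
matter off the fast path, and it keeps readers that already hold the lock
unchanged.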


