[ofa-general] Re: [PATCH] mmu notifiers #v8

Nick Piggin npiggin at suse.de
Mon Mar 3 08:59:10 PST 2008


On Mon, Mar 03, 2008 at 09:18:59AM -0600, Jack Steiner wrote:
> On Mon, Mar 03, 2008 at 02:10:17PM +0100, Nick Piggin wrote:
> > On Mon, Mar 03, 2008 at 01:51:53PM +0100, Andrea Arcangeli wrote:
> > > On Mon, Mar 03, 2008 at 04:29:34AM +0100, Nick Piggin wrote:
> > > > to something I prefer. Others may not, but I'll post them for debate
> > > > anyway.
> > > 
> > > Sure, thanks!
> > > 
> > > > > I didn't drop invalidate_page, because invalidate_range_begin/end
> > > > > would be slower for usages like KVM/GRU (we don't need a begin/end
> > > > > there because where invalidate_page is called, the VM holds a
> > > > > reference on the page). do_wp_page should also use invalidate_page
> > > > > since it can free the page after dropping the PT lock without losing
> > > > > any performance (that's not true for the places where invalidate_range
> > > > > is called).
> > > > 
> > > > I'm still not completely happy with this. I had a very quick look
> > > > at the GRU driver, but I don't see why it can't be implemented
> > > > more like the regular TLB model, and have TLB insertions depend on
> > > > the linux pte, and do invalidates _after_ restricting permissions
> > > > to the pte.
> > > > 
> > > > Ie. I'd still like to get rid of invalidate_range_begin, and get
> > > > rid of invalidate calls from places where permissions are relaxed.
> > > 
> > > _begin exists because by the time _end is called, the VM already
> > > dropped the reference on the page. This way we can do a single
> > > invalidate no matter how large the range is. I don't see ways to
> > > remove _begin while still invoking _end a single time for the whole
> > > range.
> > 
> > Is this just a GRU problem? Can't we just require them to take a ref
> > on the page (IIRC Jack said GRU could be changed to more like a TLB
> > model).
> 
> Maintaining a long-term reference on a page is a problem. The GRU does not
> currently maintain tables to track the pages for which dropins have been done.
> 
> The GRU has a large internal TLB and is designed to reference up to 8PB of
> memory. The size of the tables to track this many referenced pages would be
> a problem (at best).

Is it any worse a problem than the pagetables of the processes which have
their virtual memory exported to GRU? AFAIKS, no; it is on the same
magnitude of difficulty. So you could do it without introducing any
fundamental problem (memory usage might be increased by some constant
factor, but I think we can cope with that in order to make the core patch
really nice and simple).

It is going to be really easy to add more weird and wonderful notifiers
later that deviate from our standard TLB model. It would be much harder to
remove them. So I really want to see everyone conform to this model first.
Numbers and comparisons can be brought out afterwards if people want to
attempt to make such changes.



More information about the general mailing list