[openib-general] Re: [hugh at veritas.com: Re: Nick's core remove PageReserved broke vmware...]

Michael S. Tsirkin mst at mellanox.co.il
Thu Nov 3 06:39:15 PST 2005


Hello Geb,
I expect so, unless more fires sping up.
I'll let you know if I need help.

Thanks for the offer,
MST

Quoting  glebn at voltaire.com <glebn at voltaire.com>:
> Subject: [hugh at veritas.com: Re: Nick's core remove PageReserved broke vmware...]
> 
> Hello Michael,
> 
> It seems that it is time to resurrect your DONTCOPY patch. Can you do
> it?
> If you have no time now I can handle it.
> 
> ----- Forwarded message from Hugh Dickins <hugh at veritas.com> -----
> 
> From: Hugh Dickins <hugh at veritas.com>
> To: Gleb Natapov <gleb at minantech.com>
> Cc: Benjamin Herrenschmidt <benh at kernel.crashing.org>,
> 	Petr Vandrovec <vandrove at vc.cvut.cz>,
> 	Nick Piggin <nickpiggin at yahoo.com.au>,
> 	"Michael S. Tsirkin" <mst at mellanox.co.il>,
> 	Badari Pulavarty <pbadari at us.ibm.com>,
> 	Linux Kernel Mailing List <linux-kernel at vger.kernel.org>
> Subject: Re: Nick's core remove PageReserved broke vmware...
> Date: Thu, 3 Nov 2005 14:11:46 +0000 (GMT)
> 
> On Thu, 3 Nov 2005, Gleb Natapov wrote:
> > On Wed, Nov 02, 2005 at 10:02:49PM +0000, Hugh Dickins wrote:
> > > On Thu, 3 Nov 2005, Benjamin Herrenschmidt wrote:
> > > > On Wed, 2005-11-02 at 21:41 +0000, Hugh Dickins wrote:
> > > > 
> > > > > The only extant problem here is if the pages are private, and
> you
> > > > > fork while this is going on, and the parent user process writes
> to the
> > > > > area before completion: then COW leaves the child with the page
> being
> > > > > DMAed into, giving the parent a copied page which may be
> incomplete.
> > > > 
> > > > Won't happen, and if it does, it's a user error to rely on that
> working,
> > > > so it doesn't matter.
> > > 
> > > I wish everyone else would see it that way!  (But some people do
> > > have valid scenarios where it can't just be ruled out completely.)
> > > 
> > I am one of those people :)
> > 
> > Last discussion about this issue ended without resolution, but I
> remember
> > you mentioned the possibility to leave ptes writable in parent during
> fork 
> > for private pages mapped for DMA. Is this approach acceptable?
> 
> I was toying with that idea back then, but it leaves the pages in a
> peculiar limbo between being shared and private, such that it's hard
> to think through the consequences.  We do already have a case rather
> like that (ptrace writing to a write-protected area), but some of us
> are a bit worried by that one, so I'd be foolish now to recommend
> another such subversion of the rules.
> 
> In the time since we discussed before, I've rather come full circle
> round to my original position: abandoning such ideas of trying to
> handle it from get_user_pages itself, appreciating the simplicity
> of the original PROT_DONTCOPY idea from you guys; but sticking to my
> initial reaction that this is better done by madvise(MADV_DONTCOPY),
> not by the mmap/mprotect route in Michael's patch.  (I never bought
> the "racy" argument advanced in favour of the mmap flag.)
> 
> One of the factors which has swayed me to the DONTCOPY approach, is
> Nick's 2.6.14 optimization in fork's copy_page_range, where areas
> which can be safely faulted later are not copied pte by pte.  But
> that doesn't apply to all areas, and in particular cannot apply to
> VM_NONLINEAR shared areas.  It should be of benefit to apps which
> use large such areas, and also do a lot of forking children who don't
> need those areas, to be able to mark them VM_DONTCOPY.  Or any other
> vmas the children won't need.  (But there's one big distinction between
> the optimization and VM_DONTCOPY: the optimization copies vma but
> doesn't fill in its ptes, VM_DONTCOPY doesn't even copy the vma.)
> 
> Two warnings if someone would like to post a MADV_DONTCOPY patch.
> It should include a matching MADV_DOCOPY to clear the condition, but
> that must not be allowed to clear VM_DONTCOPY set originally by driver:
> perhaps you'll end up with a VM_UDONTCOPY or something like that.
> 
> And Badari has a MADV_REMOVE patch in the works, taking the next
> slot (just after MADV_DONTNEED in most of the arches): probably
> best for you to base yours on top of his (though yours is simpler
> and might jump ahead).
> 
> Hugh
> 
> ----- End forwarded message -----
> 
> --
> 			Gleb.
> 

-- 
MST



More information about the general mailing list