[openib-general] Re: [hugh at veritas.com: Re: Nick's core remove PageReserved broke vmware...]
Michael S. Tsirkin
mst at mellanox.co.il
Thu Nov 3 06:39:15 PST 2005
Hello Gleb,
I expect so, unless more fires spring up.
I'll let you know if I need help.
Thanks for the offer,
MST
Quoting glebn at voltaire.com <glebn at voltaire.com>:
> Subject: [hugh at veritas.com: Re: Nick's core remove PageReserved broke vmware...]
>
> Hello Michael,
>
> It seems that it is time to resurrect your DONTCOPY patch. Can you do
> it?
> If you have no time now I can handle it.
>
> ----- Forwarded message from Hugh Dickins <hugh at veritas.com> -----
>
> From: Hugh Dickins <hugh at veritas.com>
> To: Gleb Natapov <gleb at minantech.com>
> Cc: Benjamin Herrenschmidt <benh at kernel.crashing.org>,
> Petr Vandrovec <vandrove at vc.cvut.cz>,
> Nick Piggin <nickpiggin at yahoo.com.au>,
> "Michael S. Tsirkin" <mst at mellanox.co.il>,
> Badari Pulavarty <pbadari at us.ibm.com>,
> Linux Kernel Mailing List <linux-kernel at vger.kernel.org>
> Subject: Re: Nick's core remove PageReserved broke vmware...
> Date: Thu, 3 Nov 2005 14:11:46 +0000 (GMT)
>
> On Thu, 3 Nov 2005, Gleb Natapov wrote:
> > On Wed, Nov 02, 2005 at 10:02:49PM +0000, Hugh Dickins wrote:
> > > On Thu, 3 Nov 2005, Benjamin Herrenschmidt wrote:
> > > > On Wed, 2005-11-02 at 21:41 +0000, Hugh Dickins wrote:
> > > >
> > > > > > The only extant problem here is if the pages are private, and you
> > > > > > fork while this is going on, and the parent user process writes to the
> > > > > > area before completion: then COW leaves the child with the page being
> > > > > > DMAed into, giving the parent a copied page which may be incomplete.
> > > >
> > > > > Won't happen, and if it does, it's a user error to rely on that working,
> > > > > so it doesn't matter.
> > >
> > > I wish everyone else would see it that way! (But some people do
> > > have valid scenarios where it can't just be ruled out completely.)
> > >
> > I am one of those people :)
> >
> > > Last discussion about this issue ended without resolution, but I remember
> > > you mentioned the possibility to leave ptes writable in parent during fork
> > > for private pages mapped for DMA. Is this approach acceptable?
>
> I was toying with that idea back then, but it leaves the pages in a
> peculiar limbo between being shared and private, such that it's hard
> to think through the consequences. We do already have a case rather
> like that (ptrace writing to a write-protected area), but some of us
> are a bit worried by that one, so I'd be foolish now to recommend
> another such subversion of the rules.
>
> In the time since we discussed before, I've rather come full circle
> round to my original position: abandoning such ideas of trying to
> handle it from get_user_pages itself, appreciating the simplicity
> of the original PROT_DONTCOPY idea from you guys; but sticking to my
> initial reaction that this is better done by madvise(MADV_DONTCOPY),
> not by the mmap/mprotect route in Michael's patch. (I never bought
> the "racy" argument advanced in favour of the mmap flag.)
>
> One of the factors which has swayed me to the DONTCOPY approach, is
> Nick's 2.6.14 optimization in fork's copy_page_range, where areas
> which can be safely faulted later are not copied pte by pte. But
> that doesn't apply to all areas, and in particular cannot apply to
> VM_NONLINEAR shared areas. It should be of benefit to apps which
> use large such areas, and also do a lot of forking children who don't
> need those areas, to be able to mark them VM_DONTCOPY. Or any other
> vmas the children won't need. (But there's one big distinction between
> the optimization and VM_DONTCOPY: the optimization copies vma but
> doesn't fill in its ptes, VM_DONTCOPY doesn't even copy the vma.)
>
> Two warnings if someone would like to post a MADV_DONTCOPY patch.
> It should include a matching MADV_DOCOPY to clear the condition, but
> that must not be allowed to clear VM_DONTCOPY set originally by driver:
> perhaps you'll end up with a VM_UDONTCOPY or something like that.
>
> And Badari has a MADV_REMOVE patch in the works, taking the next
> slot (just after MADV_DONTNEED in most of the arches): probably
> best for you to base yours on top of his (though yours is simpler
> and might jump ahead).
>
> Hugh
>
> ----- End forwarded message -----
>
> --
> Gleb.
>
--
MST
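
A minimal user-space sketch of the madvise route discussed in the forwarded
message. MADV_DONTCOPY was only a proposal in this thread; the sketch assumes
the MADV_DONTFORK flag that mainline Linux provides (2.6.16 and later), with
MADV_DOFORK as the clearing counterpart. The helper name
child_inherits_mapping is hypothetical, and using mincore() to probe whether
an address range is mapped is just one convenient way to observe the effect.

```c
#define _DEFAULT_SOURCE
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper: map one private anonymous page, optionally mark it
 * MADV_DONTFORK, fork, and report whether the child still has the mapping.
 * Returns 1 if the child inherited it, 0 if it did not, -1 on error.
 */
int child_inherits_mapping(int mark_dontfork)
{
    size_t len = (size_t)sysconf(_SC_PAGESIZE);
    unsigned char *buf = mmap(NULL, len, PROT_READ | PROT_WRITE,
                              MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (buf == MAP_FAILED)
        return -1;

    buf[0] = 1; /* touch the page so it is actually populated */

    /* Ask the kernel not to copy this vma into children at fork(). */
    if (mark_dontfork && madvise(buf, len, MADV_DONTFORK) != 0) {
        munmap(buf, len);
        return -1;
    }

    pid_t pid = fork();
    if (pid < 0) {
        munmap(buf, len);
        return -1;
    }

    if (pid == 0) {
        /* Child: mincore() fails with ENOMEM if the range is unmapped. */
        unsigned char vec;
        int mapped = (mincore(buf, len, &vec) == 0);
        _exit(mapped ? 0 : 1);
    }

    /* Parent: collect the child's verdict, then clean up. */
    int status;
    if (waitpid(pid, &status, 0) < 0) {
        munmap(buf, len);
        return -1;
    }
    munmap(buf, len);

    if (!WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status) == 0;
}
```

Calling child_inherits_mapping(0) exercises the default fork behaviour, and
child_inherits_mapping(1) the marked case. With the region marked, the child
never gets the vma at all, so a parent write during DMA cannot trigger COW
against a child's copy of the page.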