[openib-general] Re: Re: madvise MADV_DONTFORK/MADV_DOFORK

Michael S. Tsirkin mst at mellanox.co.il
Sun Mar 12 04:56:49 PST 2006


Quoting r. Gleb Natapov <glebn at voltaire.com>:
> Subject: Re: [openib-general] Re: Re: madvise MADV_DONTFORK/MADV_DOFORK
> 
> On Wed, Feb 15, 2006 at 12:14:48PM +0200, Michael S. Tsirkin wrote:
> > > > Clarification: as I see it, longer term we want to add a flag to make
> > > > get_user_pages trigger an immediate page copy on fork (rather than
> > > > copy_ptes).
> > >
> > > Can you elaborate? Do you mean one more VMA flag (VM_COPYONFORK)?
> > 
> > This should hopefully solve more than just the reg_mr issue, and is not
> > specific to InfiniBand. See e.g. here: http://lkml.org/lkml/2005/12/12/30
> > So no, this will have to be a per-page flag: set by get_user_pages when
> > passed some new option, and cleared by put_page when the page ref count
> > drops to the page map count.
>
> Yes, this is a very serious issue. I wonder why AIO users don't complain all
> over LKML (or do AIO buffers have to be aligned?).
> 
> > BTW, I don't know when I will get around to working on it, so any help
> > would be appreciated.
>
> Do you think a new page flag is a viable solution, given the holy war
> against new (and old) page flags? Besides, fork would have to go from each
> PTE to its struct page to check the flags of every mapped page in the process!

I thought about this some more, and I think you are right.
Adding overhead to fork, and adding new page flags, won't fly.

My current thinking then goes vaguely along the lines of:

We need a way to distinguish the parent from child on COW. A flag in the VMA
will do it.

Then, when the parent (flag set) writes to the page and triggers COW, if the
page is mapped into more than one process and a driver is holding a reference
on the page, we need to find everyone else who maps the page and fix them to
refer to the page copy rather than the original. The parent keeps the
original page.
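
To make the proposed fix-up concrete, here is a rough userspace toy model, not kernel code. Everything in it is hypothetical: struct toy_page stands in for struct page (count covers page-table mappings plus driver pins, mapcount covers page-table mappings only), keep_original stands in for the proposed VMA flag on the parent, and toy_cow_write() models the write fault. Detecting a driver reference as count exceeding mapcount is an assumption of the sketch, and locking, the reverse-map walk and TLB flushes are ignored.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

#define TOY_PAGE_BYTES 16

/* Hypothetical stand-in for struct page: contents plus two counters. */
struct toy_page {
    char data[TOY_PAGE_BYTES];
    int count;    /* every reference: page-table mappings plus driver pins */
    int mapcount; /* references held by process page tables only */
};

/* Hypothetical process with one mapped page; keep_original models the
 * proposed VMA flag that marks the parent side of a fork. */
struct toy_proc {
    struct toy_page *page;
    bool keep_original;
};

static struct toy_page *toy_page_new(const char *init)
{
    struct toy_page *p = calloc(1, sizeof(*p)); /* zeroed: data stays NUL-terminated */
    if (p == NULL)
        abort();
    strncpy(p->data, init, TOY_PAGE_BYTES - 1);
    return p;
}

static void toy_map(struct toy_proc *proc, struct toy_page *p)
{
    proc->page = p;
    p->count++;
    p->mapcount++;
}

static void toy_unmap(struct toy_proc *proc)
{
    proc->page->count--;
    proc->page->mapcount--;
    proc->page = NULL;
}

/* Write fault by procs[writer] on its (possibly shared) page. */
static void toy_cow_write(struct toy_proc *procs, int nprocs, int writer,
                          const char *text)
{
    struct toy_proc *w = &procs[writer];
    struct toy_page *orig = w->page;
    /* Assumption: a driver reference shows up as count > mapcount. */
    bool pinned = orig->count > orig->mapcount;

    if (orig->mapcount > 1) {
        struct toy_page *copy = toy_page_new(orig->data);
        if (w->keep_original && pinned) {
            /* Proposed path: every other mapper moves to the copy, and
             * the parent keeps the original the driver references. */
            for (int i = 0; i < nprocs; i++) {
                if (i != writer && procs[i].page == orig) {
                    toy_unmap(&procs[i]);
                    toy_map(&procs[i], copy);
                }
            }
            if (copy->mapcount == 0)
                free(copy); /* nobody else in procs mapped the page */
        } else {
            /* Ordinary COW: the writer gets the copy. */
            toy_unmap(w);
            toy_map(w, copy);
        }
    }
    strncpy(w->page->data, text, TOY_PAGE_BYTES - 1);
}
```

In this model the pinned parent keeps the page the driver still references, while the child ends up on a copy holding the pre-write contents; without a pin, the ordinary COW path hands the copy to the writer instead.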

How does this sound?

-- 
Michael S. Tsirkin
Staff Engineer, Mellanox Technologies


