[openib-general] Re: Re: madvise MADV_DONTFORK/MADV_DOFORK

Gleb Natapov glebn at voltaire.com
Wed Feb 15 01:30:07 PST 2006


On Wed, Feb 15, 2006 at 11:02:50AM +0200, Michael S. Tsirkin wrote:
> Clarification: as I see it, longer term we want to add a flag to make
> get_user_pages trigger an immediate page copy on fork (rather than copy_ptes).
Can you elaborate? Do you mean one more VMA flag (VM_COPYONFORK)?

> In this setup, MADV_DONTFORK will be used to speed up fork for an application
> that has locked a big portion of its address space. With this in mind:
> 
> Quoting r. Gleb Natapov <glebn at voltaire.com>:
> > > > Should call to madvise be the part of reg_mr call?
> > > 
> > > Probably no - MPI should have to do it.
> uDAPL as well, I guess.
> 
> > Then each userspace app will have to reinvent the wheel.
> I thought applications used MPI?
I hope you don't think that InfiniBand is good only for HPC :) More and more
organisations want to develop applications directly for InfiniBand without a
middleware layer. Not all of them want to understand deep VM magic to do so.

> 
> > Remember that we should gracefully handle overlapping registrations.
> Right, and madvise doesn't do any refcounting. That's one reason not to include it
> in reg_mr.
I beg to differ. I think this is exactly the reason to include it in
reg_mr. Otherwise each application will have to reinvent the refcounting logic. It
is much better to do it right once than to do it wrong many times.
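To make the refcounting argument concrete, here is a minimal C sketch of what a library-level helper around reg_mr could do, assuming Linux madvise(2) with MADV_DONTFORK/MADV_DOFORK. The names dontfork_get, dontfork_put, dontfork_refcount and the fixed-size page table are hypothetical, not part of libibverbs or any existing API. Each page keeps a count of live registrations: MADV_DONTFORK is issued only when a page's count goes from 0 to 1, and MADV_DOFORK only when it drops back to 0. Deregistering one of two overlapping regions therefore does not re-enable fork copying for pages the other region still uses.

```c
#define _GNU_SOURCE
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical bookkeeping: one entry per page currently marked MADV_DONTFORK. */
#define MAX_PAGES 1024

struct page_ref {
    uintptr_t page;  /* page-aligned start address */
    unsigned count;  /* live registrations covering this page */
};

static struct page_ref table[MAX_PAGES];
static size_t n_entries;

/* Widen [addr, addr + len) to whole pages: madvise() needs a page-aligned start. */
static void page_range(const void *addr, size_t len,
                       uintptr_t *first, uintptr_t *end, size_t *ps)
{
    *ps = (size_t)sysconf(_SC_PAGESIZE);
    uintptr_t mask = ~(uintptr_t)(*ps - 1);
    *first = (uintptr_t)addr & mask;
    *end = ((uintptr_t)addr + len + *ps - 1) & mask;
}

/* Linear scan; a real library would use a tree or hash table. */
static struct page_ref *lookup(uintptr_t page)
{
    for (size_t i = 0; i < n_entries; i++)
        if (table[i].page == page)
            return &table[i];
    return NULL;
}

/* Number of live registrations covering the page that contains addr. */
static unsigned dontfork_refcount(const void *addr)
{
    uintptr_t first, end;
    size_t ps;
    page_range(addr, 1, &first, &end, &ps);
    struct page_ref *r = lookup(first);
    return r ? r->count : 0;
}

/* Deregistration: restore normal fork behaviour only when a page's last user goes away. */
static void dontfork_put(const void *addr, size_t len)
{
    uintptr_t first, end;
    size_t ps;
    page_range(addr, len, &first, &end, &ps);
    for (uintptr_t p = first; p < end; p += ps) {
        struct page_ref *r = lookup(p);
        if (r == NULL)
            continue;                    /* page was never registered */
        if (--r->count == 0) {
            madvise((void *)p, ps, MADV_DOFORK);
            *r = table[--n_entries];     /* swap-remove the entry */
        }
    }
}

/* Registration: mark a page MADV_DONTFORK only on its first reference. Returns 0 or -1. */
static int dontfork_get(const void *addr, size_t len)
{
    uintptr_t first, end;
    size_t ps;
    page_range(addr, len, &first, &end, &ps);
    for (uintptr_t p = first; p < end; p += ps) {
        struct page_ref *r = lookup(p);
        if (r != NULL) {
            r->count++;
            continue;
        }
        if (n_entries == MAX_PAGES || madvise((void *)p, ps, MADV_DONTFORK) != 0) {
            dontfork_put((const void *)first, p - first);  /* undo pages already taken */
            return -1;
        }
        table[n_entries++] = (struct page_ref){ p, 1 };
    }
    return 0;
}
```

Calling madvise() one page at a time keeps the sketch short but splits the VMA into many pieces; a real implementation would coalesce runs of pages, use a faster lookup, and add locking, since this version is not thread-safe.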

> Another is that madvise only works for full pages.
Everything in the VM works only on full pages. Unix doesn't try to hide this
from the user.
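Because madvise() rejects a start address that is not page aligned, a registered buffer has to be widened to whole pages before it can be marked. A minimal sketch of that rounding, assuming the page size is a power of two; page_span is a hypothetical helper name, not an existing API.

```c
#include <stddef.h>
#include <stdint.h>

/* Expand [addr, addr + len) outward to whole pages of size `page`
 * (assumed to be a power of two): round the start down, the end up. */
static void page_span(uintptr_t addr, size_t len, size_t page,
                      uintptr_t *start, size_t *span_len)
{
    uintptr_t mask = ~(uintptr_t)(page - 1);
    uintptr_t first = addr & mask;                    /* round start down */
    uintptr_t last = (addr + len + page - 1) & mask;  /* round end up */
    *start = first;
    *span_len = (size_t)(last - first);
}
```

For example, with 4096-byte pages, page_span(4090, 10, 4096, ...) yields start 0 and length 8192: a 10-byte buffer that straddles a page boundary covers two full pages, so small neighbouring registrations easily end up sharing a page.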
> 
> Applications should be aware of these limitations, and I think the easiest way
> to achieve this is by asking them to use madvise directly.
The problem is not in madvise but in the refcounting that each application must maintain.

--
			Gleb.