[openib-general] Re: madvise MADV_DONTFORK/MADV_DOFORK

Linus Torvalds torvalds at osdl.org
Mon Feb 13 11:34:43 PST 2006



On Mon, 13 Feb 2006, Roland Dreier wrote:
> 
> VM_DONTCOPY is hardly used in the kernel, so the semantics aren't very
> precisely defined.

Now, I agree - it's a strange bit, and was initially just done "because we 
can and it seems to be a conceptually valid notion", so it's not used a 
lot.

That said, the semantics shouldn't be all that unexpected:

	#define VM_DONTCOPY  0x00020000		/* Do not copy this vma on fork */

and the usage ends up matching that (except for some really strange issue 
with hugepage counting, which just looks wrong, but never mind).
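
To make that concrete, here is a minimal userspace sketch of the
semantics, not the kernel code itself. The struct toy_vma and the
function toy_dup_mmap() are hypothetical names invented for this
illustration; only the VM_DONTCOPY value is taken from the define
quoted above.

```c
#include <stddef.h>

/* Value copied from the #define quoted above. */
#define VM_DONTCOPY 0x00020000UL

/* Hypothetical, heavily simplified stand-in for a vma. */
struct toy_vma {
	unsigned long start;
	unsigned long end;
	unsigned long flags;
};

/*
 * Model of what fork does with the parent's mappings: every vma is
 * duplicated into the child except those marked VM_DONTCOPY, which
 * are skipped entirely. The caller must provide room for n entries
 * in child. Returns the number of vmas the child ends up with.
 */
size_t toy_dup_mmap(const struct toy_vma *parent, size_t n,
		    struct toy_vma *child)
{
	size_t copied = 0;

	for (size_t i = 0; i < n; i++) {
		if (parent[i].flags & VM_DONTCOPY)
			continue;	/* child never sees this range */
		child[copied++] = parent[i];
	}
	return copied;
}
```

In this model the child simply has a hole where the flagged range
was, which is all that "do not copy this vma on fork" means.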

>		But the idea is that a driver setting VM_DONTCOPY
> probably has a good reason for doing it, and we don't want userspace
> to be able to erase that flag through madvise().

Well, I can't actually see any case where a driver could validly do 
something that confuses the VM enough that clearing that bit could cause 
new problems. 

Put another way: if that is true, then we have bigger issues, and should 
fix those problems instead.

So at most we might have _applications_ that depend on the fork not 
causing a copy-on-write thing (due to the old and broken behaviour of 
private mappings of ioremapped areas), but if that's true, then it would 
have to be the driver itself that does the MADV_DOFORK thing, so..

> As Hugh said in his suggestion for a better changelog entry:
> 
>     > Explain that MADV_DONTFORK should be reversible, hence
>     > MADV_DOFORK; but should not be reversible on areas a driver has
>     > so marked, hence VM_DONTFORK distinct from VM_DONTCOPY.
> 
> Perhaps we don't care for now, and we should wait and add
> VM_KERNEL_DONTCOPY later if we really need it.  I honestly don't know.

I can see where Hugh is coming from, but I think it's adding cruft very 
much for a "be very careful" reason.

I would suggest that if you wanted to be very careful, you'd simply 
disallow changing - or perhaps just clearing - that DONTCOPY flag on 
special regions (ie ones that have been marked with VM_IO or VM_RESERVED).
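
A minimal sketch of the "just clearing" variant of that check,
written as standalone C rather than as a kernel patch. The function
toy_madvise_fork() and the enum toy_advice are hypothetical names for
this illustration, and the VM_IO and VM_RESERVED values are assumed
here for the sake of the example; only VM_DONTCOPY is quoted in this
thread.

```c
#include <errno.h>

#define VM_IO		0x00004000UL	/* assumed value, for illustration */
#define VM_DONTCOPY	0x00020000UL	/* value quoted earlier in this mail */
#define VM_RESERVED	0x00080000UL	/* assumed value, for illustration */

/* Hypothetical stand-ins for the MADV_DONTFORK/MADV_DOFORK requests. */
enum toy_advice { TOY_DONTFORK, TOY_DOFORK };

/*
 * Apply the advice to a vma's flag word. Setting DONTCOPY is always
 * allowed; clearing it is refused on special regions so that userspace
 * cannot undo what a driver set up. Returns 0 or a negative errno.
 */
int toy_madvise_fork(unsigned long *flags, enum toy_advice advice)
{
	switch (advice) {
	case TOY_DONTFORK:
		*flags |= VM_DONTCOPY;
		return 0;
	case TOY_DOFORK:
		if (*flags & (VM_IO | VM_RESERVED))
			return -EINVAL;	/* special region: leave it alone */
		*flags &= ~VM_DONTCOPY;
		return 0;
	}
	return -EINVAL;	/* unknown advice */
}
```

This needs no second flag bit: one test on the existing flags covers
the "be very careful" case. The stricter variant (disallow any change
on special regions) would move the same test in front of the switch.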

			Linus


