[openib-general] Re: madvise MADV_DONTFORK/MADV_DOFORK
Linus Torvalds
torvalds at osdl.org
Mon Feb 13 11:34:43 PST 2006
On Mon, 13 Feb 2006, Roland Dreier wrote:
>
> VM_DONTCOPY is hardly used in the kernel, so the semantics aren't very
> precisely defined.
Now, I agree - it's a strange bit, and was initially just done "because we
can and it seems to be a conceptually valid notion", so it's not used a
lot.
That said, the semantics shouldn't be all that unexpected:
#define VM_DONTCOPY 0x00020000 /* Do not copy this vma on fork */
and the usage ends up matching that (except for some really strange issue
with hugepage counting, which just looks wrong, but never mind).
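The "do not copy this vma on fork" rule can be pictured with a toy model, loosely patterned on the vma copy loop in kernel/fork.c's dup_mmap(). This is a sketch under stated assumptions: `toy_vma` and `toy_fork_copy` are hypothetical names invented for illustration, not kernel code. The real fork path also does accounting and page-table copying, and it has the hugepage oddity mentioned above.

```c
#include <assert.h>
#include <stddef.h>

/* Flag value as quoted in the message above. */
#define VM_DONTCOPY 0x00020000UL

/* Hypothetical, heavily simplified stand-in for a kernel vma. */
struct toy_vma {
    unsigned long start, end;
    unsigned long flags;
};

/*
 * Toy model of the fork-time copy loop: every parent vma is duplicated
 * into the child unless it carries VM_DONTCOPY. Skipped vmas leave a hole,
 * so the child has no mapping at that address range.
 * Returns the number of vmas copied into 'child'.
 */
static size_t toy_fork_copy(const struct toy_vma *parent, size_t n,
                            struct toy_vma *child)
{
    assert(n == 0 || (parent != NULL && child != NULL));
    size_t copied = 0;
    for (size_t i = 0; i < n; i++) {
        if (parent[i].flags & VM_DONTCOPY)
            continue;               /* do not copy this vma on fork */
        child[copied++] = parent[i];
    }
    return copied;
}
```

With three parent vmas where only the middle one has VM_DONTCOPY set, the child ends up with the first and third, and nothing at the middle range.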
> But the idea is that a driver setting VM_DONTCOPY
> probably has a good reason for doing it, and we don't want userspace
> to be able to erase that flag through madvise().
Well, I can't actually see any case where a driver could validly do
something that confuses the VM enough that clearing that bit could cause
new problems.
Put another way: if that is true, then we have bigger issues, and should
fix those problems instead.
So at most we might have _applications_ that depend on the fork not
causing a copy-on-write thing (due to the old and broken private mapping
of ioremapped areas behaviour), but if that's true, then it would have to
be the driver itself that does the MADV_DOFORK thing, so..
> As Hugh said in his suggestion for a better changelog entry:
>
> > Explain that MADV_DONTFORK should be reversible, hence
> > MADV_DOFORK; but should not be reversible on areas a driver has
> > so marked, hence VM_DONTFORK distinct from VM_DONTCOPY.
>
> Perhaps we don't care for now, and we should wait and add
> VM_KERNEL_DONTCOPY later if we really need it. I honestly don't know.
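The reversibility under discussion can be checked from userspace. The following is a minimal sketch that assumes a Linux kernel and C library providing both MADV_DONTFORK and MADV_DOFORK (glibc exposes them with _GNU_SOURCE). The helper names `child_can_read` and `run_demo` are hypothetical. The sketch forks a child that touches a private anonymous page three times: by default, after MADV_DONTFORK, and after MADV_DOFORK. A child that faults with SIGSEGV means the vma was not copied into it.

```c
#define _GNU_SOURCE             /* expose MADV_DONTFORK / MADV_DOFORK */
#include <assert.h>
#include <signal.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Fork a child that reads page[0]. Returns 1 if the read succeeded,
 * 0 if the child was killed by SIGSEGV (mapping not inherited),
 * -1 on any other outcome. */
static int child_can_read(volatile char *page)
{
    assert(page != NULL);
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        char c = page[0];       /* faults if the vma was not copied */
        _exit(c == 'x' ? 0 : 2);
    }
    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    if (WIFEXITED(status) && WEXITSTATUS(status) == 0)
        return 1;
    if (WIFSIGNALED(status) && WTERMSIG(status) == SIGSEGV)
        return 0;
    return -1;
}

/* Returns 0 if the page is inherited by default, skipped after
 * MADV_DONTFORK, and inherited again after MADV_DOFORK; otherwise
 * returns a nonzero code identifying the step that failed. */
static int run_demo(void)
{
    long sz = sysconf(_SC_PAGESIZE);
    char *p = mmap(NULL, (size_t)sz, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return 1;
    p[0] = 'x';
    int rc = 0;
    if (child_can_read(p) != 1)
        rc = 2;                 /* default: vma copied on fork */
    else if (madvise(p, (size_t)sz, MADV_DONTFORK) != 0 ||
             child_can_read(p) != 0)
        rc = 3;                 /* DONTFORK: child has no mapping here */
    else if (madvise(p, (size_t)sz, MADV_DOFORK) != 0 ||
             child_can_read(p) != 1)
        rc = 4;                 /* DOFORK: flag cleared, copied again */
    munmap(p, (size_t)sz);
    return rc;
}
```

A return value of 0 from run_demo() means the flag could be set and then cleared again from userspace, which is the behaviour Hugh's changelog text asks for on ordinary memory.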
I can see where Hugh is coming from, but I think it's adding cruft very
much for a "be very careful" reason.
I would suggest that if you wanted to be very careful, you'd simply
disallow changing - or perhaps just clearing - that DONTCOPY flag on
special regions (i.e. ones that have been marked with VM_IO or VM_RESERVED).
Linus