[openib-general] Re: [PATCH][RFC][0/4] InfiniBand userspace verbs implementation

Caitlin Bestler caitlin.bestler at gmail.com
Mon Apr 25 07:43:22 PDT 2005


On 4/25/05, Roland Dreier <roland at topspin.com> wrote:
>     Timur> With mlock(), we don't need to use get_user_pages() at all.
>     Timur> Arjan tells me the only time an mlocked page can move is
>     Timur> with hot (un)plug of memory, but that isn't supported on
>     Timur> the systems that we support.  We actually prefer mlock()
>     Timur> over get_user_pages(), because if the process dies, the
>     Timur> locks automatically go away too.
> 
> There actually is another way pages can move, with both
> get_user_pages() and mlock(): copy-on-write after a fork().  If
> userspace does a fork(), then all PTEs are marked read-only, and if
> the original process touches the page after the fork(), a new page
> will be allocated and mapped at the original virtual address.
> 
> This is actually a pretty big pain, because the only good solution
> seems to be for the kernel to mark these registered regions as
> VM_DONTCOPY.  Right now this means that driver code ends up monkeying
> with vm_flags for user vmas.
> 
> Does it seem reasonable to add a new system call to let userspace mark
> memory it doesn't want copied into forked processes?  Something like
> 
>         long sys_mark_nocopy(unsigned long addr, size_t len, int mark)
> 
> which would set VM_DONTCOPY if mark != 0, and clear it if mark == 0.
> A better name would be gratefully accepted...
> 
> Then to register memory for RDMA, userspace would call
> sys_mark_nocopy() (with appropriate accounting to handle possibly
> overlapping regions) and the kernel would call get_user_pages().  The
> get_user_pages() is of course required because the kernel can't trust
> userspace to keep the pages locked.  mlock() would no longer be
> necessary.  We can trust userspace to call sys_mark_nocopy() as
> needed, because a process can only hurt itself and its children by
> misusing the sys_mark_nocopy() call.
> 
> If this seems reasonable then I can code a patch.
> 
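
For anyone who wants to see the fork() behaviour described above from
userspace, here is a minimal sketch. It assumes a POSIX system; the
function name is made up for illustration. It shows only that parent
and child stop sharing the page once the parent writes to it. It does
not show which process ends up holding the original physical page,
and that is the part that matters for a registered region.

```c
#include <assert.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper: returns the byte the child observes in the
 * buffer after the parent has written to it post-fork, or -1 on error.
 */
int child_view_after_parent_write(void)
{
    int pipefd[2];
    int status;
    ssize_t w;
    pid_t pid;
    unsigned char *buf = malloc(4096);

    if (!buf)
        return -1;
    buf[0] = 1;                      /* value present before fork() */

    if (pipe(pipefd) < 0) {
        free(buf);
        return -1;
    }

    pid = fork();
    if (pid < 0) {
        close(pipefd[0]);
        close(pipefd[1]);
        free(buf);
        return -1;
    }

    if (pid == 0) {
        /* Child: wait until the parent has written, then report buf[0]. */
        char go;
        close(pipefd[1]);
        if (read(pipefd[0], &go, 1) != 1)
            _exit(255);
        _exit(buf[0]);
    }

    /* Parent: this write hits a read-only PTE and triggers copy-on-write. */
    close(pipefd[0]);
    buf[0] = 2;
    assert(buf[0] == 2);             /* parent sees its own write */

    w = write(pipefd[1], "g", 1);    /* tell the child to look now */
    close(pipefd[1]);

    if (waitpid(pid, &status, 0) < 0) {
        free(buf);
        return -1;
    }
    free(buf);
    if (w != 1 || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```

The child still reports the pre-fork value, so the two processes no
longer map the same page at that virtual address.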

Who is responsible for the reference counting of overlapping
registrations within a process, and then across processes (in case
shared memory is being registered)? The application? Middleware? Driver?

My concern here is that the application layer may not
be fully aware when middleware is registering memory,
and middleware may not be fully aware when the memory
it receives from the application is shared with another
process.
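
To make the question concrete, here is a rough sketch of the
per-process counting that some layer would have to own if overlapping
regions are registered. Everything in it is an assumption made for
illustration: the 4 KiB page size, the fixed-size table, the function
names, and the stub standing in for the proposed sys_mark_nocopy()
call, which does not exist today. It also covers only a single
process; it says nothing about who counts across processes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SHIFT 12      /* assumption: 4 KiB pages */
#define MAX_PAGES  1024    /* assumption: small fixed table for the sketch */

/* Hypothetical per-process table: number of registrations covering each page. */
static unsigned int page_refs[MAX_PAGES];
/* How often the proposed mark/unmark call would have been issued. */
static unsigned int mark_calls, unmark_calls;

/* Stub standing in for the proposed sys_mark_nocopy(addr, len, mark). */
static void mark_nocopy_stub(size_t page, int mark)
{
    (void)page;
    if (mark)
        mark_calls++;
    else
        unmark_calls++;
}

/* Convert a byte range to an inclusive page range; -1 if it does not fit. */
static int range_to_pages(uintptr_t addr, size_t len, size_t *first, size_t *last)
{
    if (len == 0 || len - 1 > UINTPTR_MAX - addr)
        return -1;
    *first = addr >> PAGE_SHIFT;
    *last = (addr + len - 1) >> PAGE_SHIFT;
    return (*last < MAX_PAGES) ? 0 : -1;
}

/* Register a region: mark a page only on its 0 -> 1 transition. */
int region_register(uintptr_t addr, size_t len)
{
    size_t first, last, p;

    if (range_to_pages(addr, len, &first, &last))
        return -1;
    for (p = first; p <= last; p++)
        if (page_refs[p]++ == 0)
            mark_nocopy_stub(p, 1);
    return 0;
}

/* Unregister a region: clear the mark only on a page's 1 -> 0 transition. */
int region_unregister(uintptr_t addr, size_t len)
{
    size_t first, last, p;

    if (range_to_pages(addr, len, &first, &last))
        return -1;
    /* Check the whole range first so a bad call changes nothing. */
    for (p = first; p <= last; p++)
        if (page_refs[p] == 0)
            return -1;
    for (p = first; p <= last; p++) {
        assert(page_refs[p] > 0);
        if (--page_refs[p] == 0)
            mark_nocopy_stub(p, 0);
    }
    return 0;
}
```

If the application and the middleware each keep a table like this
without knowing about the other, a page shared by both registrations
can be unmarked by one layer while the other still depends on it.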


