[openib-general] Re: [PATCH][RFC][0/4] InfiniBand userspace verbs implementation

Andrew Morton akpm at osdl.org
Mon Apr 25 15:14:59 PDT 2005


Roland Dreier <roland at topspin.com> wrote:
>
>     Andrew> Do we care about that?  A straightforward scenario under
>     Andrew> which this can happen is:
> 
>     Andrew> a) app starts some read I/O in an asynchronous manner
>     Andrew> b) app forks
>     Andrew> c) child writes to one of the pages which is still under read I/O
>     Andrew> d) the read I/O completes
>     Andrew> e) the child is left with the old data plus the child's modification instead
>     Andrew>    of the new data
> 
>     Andrew> which is a very silly application which is giving itself
>     Andrew> unpredictable memory contents anyway.
> 
>     Andrew> I assume there's a more sensible scenario?
> 
> You're right, that is a silly scenario ;)  In fact if we mark vmas
> with VM_DONTCOPY, then the child just crashes with a seg fault.
> 
> The type of thing I'm worried about is something like, for example:
> 
> a) app registers memory region with RDMA hardware -- in other words,
>    loads the device's translation table for future I/O

Whoa, hang on.

The way we expect get_user_pages() to be used is that the kernel will use
get_user_pages() once per application I/O request.

Are you saying that RDMA clients will semi-permanently own pages which were
pinned by get_user_pages()?  That those pages will be used for multiple
separate I/O operations?

If so, then that's a significant design departure and it would be good to
hear why it is necessary.

> b) app forks
> c) app writes to the registered memory region, and the kernel breaks
>    the COW for the (now read-only) page by mapping a new page
> d) app starts an I/O that will do a DMA read from the region
> e) device reads using the wrong, old mapping

Sure.  But such an app could be declared to be buggy...

> This can be pretty insiduous because for example fork() + immediate
> exec() or just using system() still leaves the parent with PTEs marked
> read-only.  If an application does overlapping memory registrations so
> get_user_pages() is called a lot, then as far as I can see
> can_share_swap_page() will always return 0 and the COW will happen
> even if the child process has thrown out its original vmas.
> 
> Or if the counts are in the correct range, then there's a small window
> between fork() and exec() where the parent process can screw itself
> up, so most of the time the app works, until it doesn't.
> 
>  - R.



More information about the general mailing list