[openib-general] Re: [PATCH][RFC][0/4] InfiniBand userspace verbs implementation

Roland Dreier roland at topspin.com
Mon Apr 25 14:12:40 PDT 2005


    Andrew> Do we care about that?  A straightforward scenario under
    Andrew> which this can happen is:

    Andrew> a) app starts some read I/O in an asynchronous manner
    Andrew> b) app forks
    Andrew> c) child writes to one of the pages which is still under read I/O
    Andrew> d) the read I/O completes
    Andrew> e) the child is left with the old data plus the child's modification instead
    Andrew>    of the new data

    Andrew> which is a very silly application which is giving itself
    Andrew> unpredictable memory contents anyway.

    Andrew> I assume there's a more sensible scenario?

You're right, that is a silly scenario ;)  In fact if we mark vmas
with VM_DONTCOPY, then the child just crashes with a seg fault.

The type of thing I'm worried about is something like, for example:

a) app registers memory region with RDMA hardware -- in other words,
   loads the device's translation table for future I/O
b) app forks
c) app writes to the registered memory region, and the kernel breaks
   the COW for the (now read-only) page by mapping a new page
d) app starts an I/O that will do a DMA read from the region
e) device reads using the wrong, old mapping

This can be pretty insiduous because for example fork() + immediate
exec() or just using system() still leaves the parent with PTEs marked
read-only.  If an application does overlapping memory registrations so
get_user_pages() is called a lot, then as far as I can see
can_share_swap_page() will always return 0 and the COW will happen
even if the child process has thrown out its original vmas.

Or if the counts are in the correct range, then there's a small window
between fork() and exec() where the parent process can screw itself
up, so most of the time the app works, until it doesn't.

 - R.



More information about the general mailing list