[openib-general] Re: [PATCH][RFC][0/4] InfiniBand userspace verbs implementation

Caitlin Bestler caitlin.bestler at gmail.com
Mon Apr 25 14:42:55 PDT 2005


On 4/25/05, Roland Dreier <roland at topspin.com> wrote:
>     Andrew> Do we care about that?  A straightforward scenario under
>     Andrew> which this can happen is:
> 
>     Andrew> a) app starts some read I/O in an asynchronous manner
>     Andrew> b) app forks
>     Andrew> c) child writes to one of the pages which is still under read I/O
>     Andrew> d) the read I/O completes
>     Andrew> e) the child is left with the old data plus the child's modification instead
>     Andrew>    of the new data
> 
>     Andrew> which is a very silly application which is giving itself
>     Andrew> unpredictable memory contents anyway.
> 
>     Andrew> I assume there's a more sensible scenario?
> 
> You're right, that is a silly scenario ;)  In fact if we mark vmas
> with VM_DONTCOPY, then the child just crashes with a seg fault.
> 
> The type of thing I'm worried about is something like, for example:
> 
> a) app registers memory region with RDMA hardware -- in other words,
>    loads the device's translation table for future I/O
> b) app forks
> c) app writes to the registered memory region, and the kernel breaks
>    the COW for the (now read-only) page by mapping a new page
> d) app starts an I/O that will do a DMA read from the region
> e) device reads using the wrong, old mapping
> 
> This can be pretty insidious because for example fork() + immediate
> exec() or just using system() still leaves the parent with PTEs marked
> read-only.  If an application does overlapping memory registrations so that
> get_user_pages() is called a lot, then as far as I can see
> can_share_swap_page() will always return 0 and the COW will happen
> even if the child process has thrown out its original vmas.
> 
> Or if the counts are in the correct range, then there's a small window
> between fork() and exec() where the parent process can screw itself
> up, so most of the time the app works, until it doesn't.
> 
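Roland's VM_DONTCOPY remark can be checked from user space on later Linux kernels (2.6.16 and newer). On those kernels `madvise(addr, len, MADV_DONTFORK)` sets VM_DONTCOPY on the covering vma, so fork() neither copies that vma into the child nor write-protects the parent's PTEs for it. The following is a minimal, Linux-only sketch under those assumptions: the helper name `child_touch_killed_by_segv` is hypothetical, and a single anonymous mmap page stands in for a registered memory region.

```c
#define _GNU_SOURCE
#include <signal.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper (not from any RDMA library): maps one anonymous
 * page, optionally marks it MADV_DONTFORK, forks, and lets the child
 * write to the page.
 * Returns 1 if the child was killed by SIGSEGV, 0 if it was not,
 * and -1 on error.
 */
int child_touch_killed_by_segv(int dontfork)
{
    size_t len = (size_t)sysconf(_SC_PAGESIZE);
    char *buf = mmap(NULL, len, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (buf == MAP_FAILED)
        return -1;

    /* Touch the page so it is populated, much as a registration
     * (get_user_pages()) would fault it in. */
    memset(buf, 0xab, len);

    /* MADV_DONTFORK sets VM_DONTCOPY on the vma: fork() neither copies
     * the mapping into the child nor write-protects the parent's PTEs. */
    if (dontfork && madvise(buf, len, MADV_DONTFORK) != 0) {
        munmap(buf, len);
        return -1;
    }

    pid_t pid = fork();
    if (pid < 0) {
        munmap(buf, len);
        return -1;
    }
    if (pid == 0) {
        /* Child: volatile keeps the compiler from dropping the store. */
        volatile char *p = buf;
        p[0] = 1;          /* faults if the vma was not copied */
        _exit(0);
    }

    int status = 0;
    pid_t waited = waitpid(pid, &status, 0);
    munmap(buf, len);
    if (waited != pid)
        return -1;
    return WIFSIGNALED(status) && WTERMSIG(status) == SIGSEGV;
}
```

Called with 0, the child writes to its own COW copy and exits normally; called with 1, the child has no mapping for that range and dies with SIGSEGV, matching Roland's description. Because the parent's PTEs are left writable, a parent write as in step c) of his scenario does not break COW, so the device's translation of that range stays valid.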

Every RDMA-related interface specification that I know of explicitly
excludes support for RDMA resources being inherited by child processes.
They warn that excellent implementations will return an error to a child
process that attempts to use its parent's RDMA resources, while more
streamlined implementations will simply behave unpredictably.
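One way a user-space library could provide the "excellent implementation" behaviour is to record which process created each resource and reject calls from any other process. This is a minimal sketch under stated assumptions: `struct rdma_handle`, `rdma_handle_init()`, `rdma_handle_check()` and `inherited_handle_rejected()` are hypothetical names, comparing `getpid()` against the creator's PID is only one possible check, and returning `-EPERM` is an arbitrary choice, not something any specification mandates.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical handle for an RDMA resource (e.g. a registered region).
 * Not a real verbs API; it only illustrates the ownership check. */
struct rdma_handle {
    pid_t owner;      /* process that created the resource */
    void *hw_state;   /* stand-in for device-specific state */
};

void rdma_handle_init(struct rdma_handle *h, void *hw_state)
{
    h->owner = getpid();
    h->hw_state = hw_state;
}

/* Returns 0 if the caller created the handle, -EPERM if the handle was
 * inherited across fork() (the child has a different PID). */
int rdma_handle_check(const struct rdma_handle *h)
{
    return getpid() == h->owner ? 0 : -EPERM;
}

/* Demo: create a handle, fork, and report whether the child's use of
 * the inherited handle is rejected. Returns 1 if rejected, 0 if not,
 * -1 on error. */
int inherited_handle_rejected(void)
{
    struct rdma_handle h;
    rdma_handle_init(&h, NULL);
    if (rdma_handle_check(&h) != 0)
        return -1;    /* the creating process itself must be accepted */

    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0)
        _exit(rdma_handle_check(&h) == -EPERM ? 0 : 1);

    int status = 0;
    if (waitpid(pid, &status, 0) != pid)
        return -1;
    return WIFEXITED(status) && WEXITSTATUS(status) == 0;
}
```

The check is cheap and catches the plain fork() case. It is a guard against accidental misuse, not a security boundary.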

As for forking while the parent has a pending read: since the parent
has not reaped the completion at the time of the fork, the buffers
in question are undefined. The child's copies of those buffers are
consistent with that: they are undefined too.


