[openib-general] Re: [PATCH][RFC][0/4] InfiniBand userspace verbs implementation

Tue Apr 12 18:04:47 PDT 2005

On Mon, Apr 11, 2005 at 05:13:47PM -0700, Andrew Morton wrote:
> Roland Dreier <roland at topspin.com> wrote:
> >
> >     Troy> Do we even need the mlock in userspace then?
> > 
> > Yes, because the kernel may go through and unmap pages from userspace
> > while trying to swap.  Since we have the page locked in the kernel,
> > the physical page won't go anywhere, but userspace might end up with a
> > different page mapped at the same virtual address.

With the last few kernels I haven't had a chance to retest the problem
that pushed us in the direction of using mlock. I will go back and do
so with the latest kernel. Below I've given a quick description of the
issue.

> That shouldn't happen.  If get_user_pages() has elevated the refcount on a
> page then the following can happen:
> 
> - The VM may decide to add the page to swapcache (if it's not mmapped
>   from a file).
> 
> - Once the page is backed by either swapcache of a (mmapped) file, the VM
>   may decide the unmap the application's pte's.  A later minor fault by the
>   app will cause the same physical page to be remapped.

The driver did use get_user_pages() to elevated the refcount on all the
pages it was going to use for IO, as well as call set_page_dirty() since
the pages were going to have data written to them from the device.

The problem we were seeing is that the minor fault by the app resulted
in a new physical page getting mapped for the application. The page that
had the elevated refcount was still waiting for the data to be written
to by the driver at the time that the app accessed the page causing the
minor fault. Obviously since the app had a new mapping the data written
by the driver was lost.

It looks like code was added to try_to_unmap_one() to address this, so
hopefully it's no longer an issue...

-Libor