[openib-general] Re: [PATCH] (repost) sdp: replace mlock with get_user_pages
Michael S. Tsirkin
mst at mellanox.co.il
Sun May 15 01:05:03 PDT 2005
Quoting r. Libor Michalek <libor at topspin.com>:
> Subject: Re: [PATCH] (repost) sdp: replace mlock with get_user_pages
>
> On Tue, May 10, 2005 at 05:32:32PM +0300, Michael S. Tsirkin wrote:
> > Quoting r. Libor Michalek <libor at topspin.com>:
> > > On Thu, May 05, 2005 at 02:01:58PM +0300, Michael S. Tsirkin wrote:
> > > >
> > > > The following patch replaces the mlock hack with call
> > > > to get_user_pages. Since the application could have forked
> > > > while an iocb is outstanding, when an iocb is done
> > > > I do get_user_pages for a second time and copy data if
> > > > the physical address has changed.
> > > >
> > > > Thus, changing ulimit is no longer required to get aio
> > > > working, processes are also allowed to fork and to call mlock/munlock
> > > > on the buffer.
> > >
> > > In the latest kernel what happens to a page that's held
> > > with a get_user_page reference on fork? Which is the only
> > > case left. Since there is an iocb flag parameter, it would
> > > be preferred to use a bit there instead of is_receive. Also,
> > > please leave spaces around '='. Finally, it would be nice
> > > if we could get rid of having to schedule a task to unlock
> > > the iocb in complete, I've noticed this having a negative
> > > effect on system CPU utilization on older kernels.
> >
> > I don't see what would prevent the page
> > from being copied on a COW event. Do you? Certainly the patch in 2.6.7
> > that fixed the swap-out problem does not affect COW.
> > With regard to avoiding task scheduling: since the current code
> > already needs this, we are not breaking anything.
> > I put in comments so that we remember to do this research at some
> > point in time.
> >
> > In the patch below the style issues are addressed: whitespace fixed,
> > is_receive changed to flag.
>
> Michael,
>
> I'm seeing a SDP bug and the only solution I've come up with so far
> conflicts directly with your patch.
>
> The problem.
>
> > When using ttcp.aio.c with buffers below the zcopy threshold,
> > multiple aio buffers outstanding, and verification turned on, either
> > the send or the receive side very quickly gets a mismatch error. The
> > mismatch error occurs when the order of buffers returned by
> > io_getevents() differs from the order in which they were passed to
> > io_submit(). Here are the parameters I used:
>
> ./ttcp.aio.x -r -s -l 3276 -f m -a 20 -v
> ./ttcp.aio.x -t -s -l 3276 -n 50000 -a 20 -f m -v 192.168.0.191
>
> The issue.
>
> There is a race condition between completing IOs synchronously and
> asynchronously using aio_complete(). The race condition is the result
> of iocb_complete() spawning a work thread to unlock memory and call
> aio_complete(). It's possible for a new IO to get serviced synchronously
> before the do_iocb_complete() task executes, which means that the two
> IOs will appear on the event queue in the reverse order of submission.
>
> The solution.
>
> Always call aio_complete() immediately when iocb_complete() is called,
> and only spawn the work thread to unlock the memory after the
> aio_complete() call. The patch is below.
>
> The problem is that your patch relies on the page tables being
> accessed before the aio_complete() call, which is the reverse of the
> order in this patch, and is incompatible with calling aio_complete()
> before scheduling a task...
>
> -Libor
>
> @@ -538,7 +517,28 @@ static void do_iocb_complete(void *arg)
> */
> int sdp_iocb_complete(struct sdpc_iocb *iocb, ssize_t status)
> {
> - iocb->status = status;
> + int result;
> + long value;
> + /*
> + * release memory
> + */
> + result = sdp_iocb_release(iocb);
> + if (result < 0)
> + sdp_dbg_err("Error <%d> releasing IOCB <%d> memory <%ld>",
> + result, iocb->key, iocb->addr);
> + /*
> + * callback to complete IOCB
> + */
> + value = (iocb->post > 0) ? iocb->post : status;
> +
> + sdp_dbg_data(NULL, "IOCB complete. <%d:%d:%08lx> value <%08lx:%ld>",
> + iocb->req->ki_users, iocb->req->ki_key,
> + iocb->req->ki_flags, iocb->addr, value);
> + /*
> + * valid result can be 0 or 1 for complete so
> + * we ignore the value.
> + */
> + (void)aio_complete(iocb->req, value, 0);
>
> if (in_atomic() || irqs_disabled()) {
> INIT_WORK(&iocb->completion, do_iocb_complete, (void *)iocb);
By the way, as was already discussed here, in_atomic() may not
catch all cases where it's illegal to sleep.
If the iocb functions get passed the socket parameter,
the iocb code can use the conn->lock.users flag to figure out that
it's OK to sleep.
--
MST - Michael S. Tsirkin