[openib-general] Re: [PATCH] (repost) sdp: replace mlock with get_user_pages

Michael S. Tsirkin mst at mellanox.co.il
Sun May 15 01:05:03 PDT 2005


Quoting r. Libor Michalek <libor at topspin.com>:
> Subject: Re: [PATCH] (repost) sdp: replace mlock with get_user_pages
> 
> On Tue, May 10, 2005 at 05:32:32PM +0300, Michael S. Tsirkin wrote:
> > Quoting r. Libor Michalek <libor at topspin.com>:
> > > On Thu, May 05, 2005 at 02:01:58PM +0300, Michael S. Tsirkin wrote:
> > > >
> > > > The following patch replaces the mlock hack with call
> > > > to get_user_pages. Since the application could have forked
> > > > while an iocb is outstanding, when an iocb is done
> > > > I do get_user_pages for a second time and copy data if
> > > > the physical address has changed.
> > > > 
> > > > Thus, changing ulimit is no longer required to get aio
> > > > working, processes are also allowed to fork and to call mlock/munlock
> > > > on the buffer.
> > > 
> > >   In the latest kernel, what happens to a page that's held
> > > with a get_user_pages reference on fork? That is the only
> > > case left. Since there is an iocb flags parameter, it would
> > > be preferred to use a bit there instead of is_receive. Also,
> > > please leave spaces around '='. Finally, it would be nice
> > > if we could get rid of having to schedule a task to unlock
> > > the iocb in complete; I've noticed this having a negative
> > > effect on system CPU utilization on older kernels.
> > 
> > I don't see what would prevent the page
> > from being copied on a COW event. Do you? Certainly the patch in 2.6.7
> > that fixed the swap-out problem does not affect COW.
> > With regard to avoiding task scheduling: the current code already
> > needs it, so we are not breaking anything.
> > I put in comments so that we remember to do this research at some
> > point.
> > 
> > In the patch below the style issues are addressed: whitespace fixed,
> > is_receive changed to flag.
> 
> Michael,
> 
>   I'm seeing a SDP bug and the only solution I've come up with so far
> conflicts directly with your patch.
> 
>   The problem.
> 
>   When using ttcp.aio.c with buffers below the zcopy threshold,
> multiple aio buffers outstanding, and verification turned on, either
> the send and/or the receive side very quickly gets a mismatch error.
> The mismatch error occurs when the order of buffers returned by
> io_getevents() differs from the order in which they were passed to
> io_submit(). Here are the params I was using:
> 
>   ./ttcp.aio.x -r -s -l 3276 -f m -a 20 -v
>   ./ttcp.aio.x -t -s -l 3276 -n 50000 -a 20 -f m -v 192.168.0.191
> 
>   The issue.
>    
>   There is a race condition between completing IOs synchronously and
> asynchronously using aio_complete(). The race condition is the result
> of iocb_complete() spawning a work thread to unlock memory and call
> aio_complete(). It's possible for a new IO to get serviced synchronously
> before the do_iocb_complete() task executes, which means that the two
> IOs will appear on the event queue in the reverse of their submission
> order.
> 
>   The solution.
> 
>   Always call aio_complete() immediately when iocb_complete() is called,
> and only spawn the work thread to unlock the memory after the 
> aio_complete() call. The patch is below.
> 
>   The problem is that your patch relies on access to the page tables
> occurring before the aio_complete() call, which is the reverse of what
> is in the patch, and is incompatible with calling aio_complete()
> before scheduling a task...
> 
> -Libor
> 
> @@ -538,7 +517,28 @@ static void do_iocb_complete(void *arg)
>   */
>  int sdp_iocb_complete(struct sdpc_iocb *iocb, ssize_t status)
>  {
> -	iocb->status = status;
> +	int result;
> +	long value;
> +	/*
> +	 * release memory
> +	 */
> +	result = sdp_iocb_release(iocb);
> +	if (result < 0)
> +		sdp_dbg_err("Error <%d> releasing IOCB <%d> memory <%ld>",
> +			    result, iocb->key, iocb->addr);
> +	/*
> +	 * callback to complete IOCB
> +	 */
> +	value = (iocb->post > 0) ? iocb->post : status;
> +
> +	sdp_dbg_data(NULL, "IOCB complete. <%d:%d:%08lx> value <%08lx:%ld>",
> +		     iocb->req->ki_users, iocb->req->ki_key, 
> +		     iocb->req->ki_flags, iocb->addr, value);
> +	/*
> +	 * valid result can be 0 or 1 for complete so 
> +	 * we ignore the value.
> +	 */
> +	(void)aio_complete(iocb->req, value, 0);
>  	
>  	if (in_atomic() || irqs_disabled()) {
>  		INIT_WORK(&iocb->completion, do_iocb_complete, (void *)iocb);

By the way, as was already discussed here, in_atomic() may not
catch all cases where it's illegal to sleep.
If the iocb functions get passed the socket parameter,
the iocb code can use the conn->lock.users flag to figure out that
it's OK to sleep.


-- 
MST - Michael S. Tsirkin


