[openib-general] Re: [PATCH] (repost) sdp: replace mlock with get_user_pages

Libor Michalek libor at topspin.com
Thu May 12 15:10:51 PDT 2005


On Tue, May 10, 2005 at 05:32:32PM +0300, Michael S. Tsirkin wrote:
> Quoting r. Libor Michalek <libor at topspin.com>:
> > On Thu, May 05, 2005 at 02:01:58PM +0300, Michael S. Tsirkin wrote:
> > >
> > > The following patch replaces the mlock hack with call
> > > to get_user_pages. Since the application could have forked
> > > while an iocb is outstanding, when an iocb is done
> > > I do get_user_pages for a second time and copy data if
> > > the physical address has changed.
> > > 
> > > Thus, changing ulimit is no longer required to get aio
> > > working, processes are also allowed to fork and to call mlock/munlock
> > > on the buffer.
> > 
> >   In the latest kernel, what happens to a page that's held
> > with a get_user_pages reference on fork? That is the only
> > case left. Since there is an iocb flag parameter, it would
> > be preferred to use a bit there instead of is_receive. Also,
> > please leave spaces around '='. Finally, it would be nice
> > if we could get rid of having to schedule a task to unlock
> > the iocb in complete; I've noticed this having a negative
> > effect on system CPU utilization on older kernels.
> 
> I don't see what would prevent the page
> from being copied on a COW event. Do you? Certainly the patch in 2.6.7
> that fixed the swap-out problem does not affect COW.
> With regard to avoiding task scheduling: since the current code already
> needs this, we are not breaking anything.
> I put in comments so that we remember to do this research at some point
> in time.
> 
> In the patch below the style issues are addressed: whitespace fixed,
> is_receive changed to flag.

Michael,

  I'm seeing an SDP bug, and the only solution I've come up with so far
conflicts directly with your patch.

  The problem.

  When using ttcp.aio.c with buffers below the zcopy threshold,
multiple aio buffers outstanding, and verification turned on, either
the send and/or the receive side very quickly gets a mismatch error.
The mismatch error occurs when the order of buffers returned by
io_getevents() differs from the order in which they were passed to
io_submit(). Here are the parameters I was using:

  ./ttcp.aio.x -r -s -l 3276 -f m -a 20 -v
  ./ttcp.aio.x -t -s -l 3276 -n 50000 -a 20 -f m -v 192.168.0.191

  The issue.
   
  There is a race condition between completing IOs synchronously and
asynchronously using aio_complete(). The race condition is the result
of iocb_complete() spawning a work thread to unlock memory and call
aio_complete(). It's possible for a new IO to get serviced synchronously
before the do_iocb_complete() task executes, which means that the two
IOs will appear on the event queue in the reverse of the order in which
they were submitted.

  The solution.

  Always call aio_complete() immediately when iocb_complete() is called,
and only spawn the work thread to unlock the memory after the 
aio_complete() call. The patch is below.

  The problem is that your patch relies on the page tables being
accessed before the aio_complete() call, which is the reverse of the
order in the patch below, and is incompatible with calling
aio_complete() before scheduling a task...

-Libor

Index: sdp_send.c
===================================================================
--- sdp_send.c	(revision 2326)
+++ sdp_send.c	(working copy)
@@ -2059,7 +2059,8 @@ int sdp_inet_send(struct kiocb *req, str
 	 * they are smaller then the zopy threshold, but only if there is
 	 * no buffer write space.
 	 */
-	if (!(conn->src_zthresh > size) && !is_sync_kiocb(req))
+	if (!is_sync_kiocb(req) && (sdp_desc_q_size(&conn->send_queue) ||
+				    !(conn->src_zthresh > size)))
 		goto skip;
 	/*
 	 * clear ASYN space bit, it'll be reset if there is no space.
Index: sdp_iocb.c
===================================================================
--- sdp_iocb.c	(revision 2328)
+++ sdp_iocb.c	(working copy)
@@ -496,14 +496,6 @@ static void do_iocb_complete(void *arg)
 {
 	struct sdpc_iocb *iocb = (struct sdpc_iocb *)arg;
 	int result;
-	long value;
-	/*
-	 * release memory
-	 */
-	result = sdp_iocb_release(iocb);
-	if (result < 0)
-		sdp_dbg_err("Error <%d> releasing IOCB <%d> memory <%ld>",
-			    result, iocb->key, iocb->addr);
 	/*
 	 * unlock now, after aio_complete the mm reference will be released.
 	 */
@@ -512,25 +504,12 @@ static void do_iocb_complete(void *arg)
 		sdp_dbg_err("Error <%d> unlocking IOCB <%d memory <%ld>>",
 			    result, iocb->key, iocb->addr);
 	/*
-	 * callback to complete IOCB
-	 */
-	value = (iocb->post > 0) ? iocb->post : iocb->status;
-
-	sdp_dbg_data(NULL, "IOCB complete. <%d:%d:%08lx> value <%ld>",
-		     iocb->req->ki_users, iocb->req->ki_key, 
-		     iocb->req->ki_flags, value);
-	/*
-	 * valid result can be 0 or 1 for complete so 
-	 * we ignore the value.
-	 */
-	(void)aio_complete(iocb->req, value, 0);
-	/*
 	 * delete IOCB
 	 */
 	result = sdp_iocb_destroy(iocb);
 	if (result < 0)
-		sdp_dbg_err("Error <%d> deleting IOCB <%d> of status <%Zu>",
-			    result, iocb->key, iocb->status);
+		sdp_dbg_err("Error <%d> deleting IOCB <%d:%08lx>",
+			    result, iocb->key, iocb->addr);
 }
 
 /*
@@ -538,7 +517,28 @@ static void do_iocb_complete(void *arg)
  */
 int sdp_iocb_complete(struct sdpc_iocb *iocb, ssize_t status)
 {
-	iocb->status = status;
+	int result;
+	long value;
+	/*
+	 * release memory
+	 */
+	result = sdp_iocb_release(iocb);
+	if (result < 0)
+		sdp_dbg_err("Error <%d> releasing IOCB <%d> memory <%ld>",
+			    result, iocb->key, iocb->addr);
+	/*
+	 * callback to complete IOCB
+	 */
+	value = (iocb->post > 0) ? iocb->post : status;
+
+	sdp_dbg_data(NULL, "IOCB complete. <%d:%d:%08lx> value <%08lx:%ld>",
+		     iocb->req->ki_users, iocb->req->ki_key, 
+		     iocb->req->ki_flags, iocb->addr, value);
+	/*
+	 * valid result can be 0 or 1 for complete so 
+	 * we ignore the value.
+	 */
+	(void)aio_complete(iocb->req, value, 0);
 	
 	if (in_atomic() || irqs_disabled()) {
 		INIT_WORK(&iocb->completion, do_iocb_complete, (void *)iocb);
Index: sdp_iocb.h
===================================================================
--- sdp_iocb.h	(revision 2326)
+++ sdp_iocb.h	(working copy)
@@ -88,7 +88,6 @@ struct sdpc_iocb {
 	int len;    /* space left in the user buffer */
 	int post;   /* amount of data requested so far. */
 	u64 wrid;   /* work request completing this IOCB */
-	ssize_t status; /* status of completed iocb */
 	/*
 	 * IB specific information for zcopy.
 	 */

