[openib-general] [PATCH][SDP] AIO buffer corruption

Libor Michalek libor at topspin.com
Fri May 6 12:41:12 PDT 2005


On Fri, May 06, 2005 at 06:28:42PM +0100, Steven Wooding wrote:
> Libor,
> 
> Command: ttcp.aio.c.x -r -fM -a 8 -l 2048
>          ttcp.aio.c.x -t -fM -a 8 -n 10000 -l 2048 {ip of receiver}

  This command reproduces it very quickly; it just happens that the 
parameters I tried did not show the problem.

> Syslog: Get the following message with successful transfer, but not with 
> -32/-104 error
>              Kernel: ERR: IOCB <0> cancel <0> flag <00e4> size <{-l}:0:{-l}>
> 
> SDP Debug: Get the following message with all ttcp.aio.c.x runs
>              Kernel: WARN: <9> <0101:3b01> CM state <0> event <9> error <-2>

  Actually, the IOCB cancels are not errors; I should change those to
regular messages. Here are the real errors on the send side:

   kernel: WARN: <0> <050e:11b1> Error <-2048> post data during flush
   kernel: WARN: <0> <050e:11b1> Error <-2048> flushing data queue
   kernel: WARN: <0> <050e:11b1> Error <-2048> flushing send queue

> I'm away for two weeks, so I'll get back to you with any further info 
> you require when I get back.

  Here is the patch to fix the problem. The fix exposes another problem,
which I'm working on, where the send pipeline stalls and is not restarted.
The original problem was caused by copying more data from the iocb than it
contained, which drove the size of the iocb negative.
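
  To illustrate the clamp outside the driver, here is a minimal standalone
sketch. It is not the SDP code: the sketch_* names, the simplified structs,
the fixed 4096-byte page size and the explicit decrement of len are all
assumptions made for illustration. Only the two min() steps mirror the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SKETCH_PAGE_SIZE 4096UL	/* assumed page size, illustration only */

/* Hypothetical, simplified stand-ins for the SDP iocb and buffer. */
struct sketch_iocb {
	long len;	/* bytes still to be sent; must never go negative */
	long post;	/* bytes already copied into send buffers */
};

struct sketch_buff {
	unsigned char *tail;	/* next free byte */
	unsigned char *end;	/* one past the last usable byte */
};

static unsigned long min_ul(unsigned long a, unsigned long b)
{
	return a < b ? a : b;
}

/*
 * Copy one chunk from page + offset into buff and return the number of
 * bytes copied. The chunk is bounded by the rest of the page, the space
 * left in the buffer and, as in the patch, the bytes left in the iocb.
 */
static unsigned long sketch_copy_chunk(struct sketch_iocb *iocb,
				       struct sketch_buff *buff,
				       const unsigned char *page,
				       unsigned long offset)
{
	unsigned long copy;

	assert(iocb->len >= 0 && offset < SKETCH_PAGE_SIZE);

	copy = min_ul(SKETCH_PAGE_SIZE - offset,
		      (unsigned long)(buff->end - buff->tail));
	/* The added clamp: never take more than the iocb still holds. */
	copy = min_ul((unsigned long)iocb->len, copy);

	memcpy(buff->tail, page + offset, copy);

	buff->tail += copy;
	iocb->post += (long)copy;
	iocb->len  -= (long)copy;	/* assumed bookkeeping, not in the hunk */

	return copy;
}
```

  With a 2048-byte iocb, a full page of source data and an 8192-byte buffer,
the sketch copies 2048 bytes and leaves len at 0. Without the second min()
it would copy 4096 bytes and leave len at -2048.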

Signed-off-by: Libor Michalek <libor at topspin.com>

Index: sdp_send.c
===================================================================
--- sdp_send.c	(revision 2270)
+++ sdp_send.c	(working copy)
@@ -742,11 +742,10 @@
 		
 		copy = min((PAGE_SIZE - offset),
 			   (unsigned long)(buff->end - buff->tail));
-
+		copy = min((unsigned long)iocb->len, copy);
 #ifndef _SDP_DATA_PATH_NULL
 		memcpy(buff->tail, (addr + offset), copy);
 #endif
-    
 		buff->data_size += copy;
 		buff->tail      += copy;
 		iocb->post      += copy;
@@ -805,7 +804,7 @@
 		/*
 		 * TODO: need to be checking OOB here.
 		 */
-		result =sdp_send_iocb_buff_write(iocb, buff);
+		result = sdp_send_iocb_buff_write(iocb, buff);
 		if (result < 0) {
 			sdp_dbg_warn(conn, "Error <%d> copy from IOCB <%d>.",
 				     result, iocb->key);
@@ -981,8 +980,9 @@
 		 * error
 		 */
 		if (result < 0) {
-			sdp_dbg_warn(conn, "Error <%d> post data during flush",
-				     result);
+			sdp_dbg_warn(conn, 
+				     "Error <%d> post data <%d> during flush",
+				     result, element->type);
 			/*
 			 * check for dangling element reference,
 			 * since called functions can dequeue the


