[ofw][patch] [mlx4] Fixing a race during LSO header building (WQE bug)

Alex Naslednikov xalex at mellanox.co.il
Thu Dec 11 08:02:39 PST 2008


Theoretically, the following bug can occur: LSO header may take less
than 64 Bytes (cacheline size), and function build_lso_seg (qp.c) will
notify that the header is ready. Than prefetch algoritm may take this
cache block BEFORE the rest of the WQE will be completed (S/G element
list)

Explanations:
WQE structure:
 
1-st 64 bytes (always aligned to cache line of 64)
00 CTRL start  (first DWORD)
..
03 CTRL end
04 Dgram start
..
15 Dgram end
 ----
2-nd 64 bytes (also always aligned to cache line)
LSO Header:
 
16 MSS | LEN
17 IPOIB HEADER
18 IPOIB HEADER
19 TCP+IP HEADERS
....
 
When TCP+IP headers  exceed 13 DWORDS, we always set OPCODE_INVALID_BIT
to be true, thus preventing WQE bug to occur.
But consider the following situation, when TCP and IP Header each takes
5 DWORDS length (minimal possible length)
Thus, 2nd 64 bytes of WQE will be as following:
 
16 MSS | LEN
17 IPoIB header
18 IPoIB header
19 IP header start
...
23 IP header end
24 TCP Header start
...
28 TCPHeader end
29 -------
30 S/G element list will start there
31 -------
 
The function that build LSO segment (build_lso_seg) will overright the
stamping right after memcpy() of the headers, thus PREFETCH mechanism
may theoretically take bytes 16:31 BEFORE bytes 29-31 were actually
filled with valid data.

Signed-off by: Alexander Naslednikov (xalex at mellanox.co.il)
 Index: D:/Windows/MLNX_WINOF_2_0_0/hw/mlx4/kernel/bus/ib/qp.c
===================================================================
--- D:/Windows/MLNX_WINOF_2_0_0/hw/mlx4/kernel/bus/ib/qp.c
(revision 3608)
+++ D:/Windows/MLNX_WINOF_2_0_0/hw/mlx4/kernel/bus/ib/qp.c
(revision 3609)
@@ -1507,8 +1507,17 @@
 				}
 #define I64_CACHE_LINE 		64
 #define OPCODE_INVALID_BIT 	6
-				// WQE bug treatment for LSO case, when
LSO header is large enough
-				if (unlikely (seglen > I64_CACHE_LINE))
{
+				// WQE bug treatment for LSO case
+				// If LSO segment is large enough
(exceeds one cache block in size)
+				// or if it small enough such that S/G
element will be placed within the same cache block,
+				// OPCODE_INVALID_BIT should be on in
order to reread this WQE 
+				// More correct solution is 
+				//	if (unlikely (seglen %
I64_CACHE_LINE < 4) )) 
+				// but it will not be used in order to
reduce calculations within Datapath
+				// If LSO segment consists of 15 DWORDS,
S/G elements block will nevertheless start from 
+				// the next cache block
+				if (unlikely (seglen < I64_CACHE_LINE-4
|| seglen > I64_CACHE_LINE )) {
+				
 					ctrl->owner_opcode |=
cpu_to_be32 ( 1 << OPCODE_INVALID_BIT);
 				}
 				wqe  += seglen;
-------------- next part --------------
A non-text attachment was scrubbed...
Name: wqe_bug.diff
Type: application/octet-stream
Size: 1275 bytes
Desc: wqe_bug.diff
URL: <http://lists.openfabrics.org/pipermail/ofw/attachments/20081211/ef08c509/attachment.obj>


More information about the ofw mailing list