<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<HTML xmlns="http://www.w3.org/TR/REC-html40" xmlns:o =
"urn:schemas-microsoft-com:office:office" xmlns:v =
"urn:schemas-microsoft-com:vml"><HEAD>
<META http-equiv=Content-Type content="text/html; charset=us-ascii">
<STYLE>@font-face {
font-family: Cambria Math;
}
@font-face {
font-family: Calibri;
}
@page Section1 {size: 8.5in 11.0in; margin: 1.0in 1.0in 1.0in 1.0in; }
P.MsoNormal {
FONT-SIZE: 11pt; MARGIN: 0in 0in 0pt; FONT-FAMILY: "Calibri","sans-serif"
}
LI.MsoNormal {
FONT-SIZE: 11pt; MARGIN: 0in 0in 0pt; FONT-FAMILY: "Calibri","sans-serif"
}
DIV.MsoNormal {
FONT-SIZE: 11pt; MARGIN: 0in 0in 0pt; FONT-FAMILY: "Calibri","sans-serif"
}
A:link {
COLOR: blue; TEXT-DECORATION: underline; mso-style-priority: 99
}
SPAN.MsoHyperlink {
COLOR: blue; TEXT-DECORATION: underline; mso-style-priority: 99
}
A:visited {
COLOR: purple; TEXT-DECORATION: underline; mso-style-priority: 99
}
SPAN.MsoHyperlinkFollowed {
COLOR: purple; TEXT-DECORATION: underline; mso-style-priority: 99
}
SPAN.EmailStyle17 {
COLOR: windowtext; FONT-FAMILY: "Calibri","sans-serif"; mso-style-type: personal-compose
}
.MsoChpDefault {
mso-style-type: export-only
}
DIV.Section1 {
page: Section1
}
</STYLE>
<META content="MSHTML 6.00.6000.16441" name=GENERATOR></HEAD>
<BODY lang=EN-US vLink=purple link=blue>
<DIV><SPAN class=062065906-21122007><FONT face=Arial color=#0000ff size=2>The
solution that takes a lock around ib_poll_cq and complete_wq was checked in at
#924.</FONT></SPAN></DIV>
<DIV><SPAN class=062065906-21122007><FONT face=Arial color=#0000ff
size=2></FONT></SPAN> </DIV>
<DIV><SPAN class=062065906-21122007><FONT face=Arial color=#0000ff size=2>Thanks
Fab</FONT></SPAN></DIV>
<DIV><SPAN class=062065906-21122007><FONT face=Arial color=#0000ff
size=2>Tzachi</FONT></SPAN></DIV><BR>
<BLOCKQUOTE dir=ltr
style="PADDING-LEFT: 5px; MARGIN-LEFT: 5px; BORDER-LEFT: #0000ff 2px solid; MARGIN-RIGHT: 0px">
<DIV class=OutlookMessageHeader lang=en-us dir=ltr align=left>
<HR tabIndex=-1>
<FONT face=Tahoma size=2><B>From:</B> Tzachi Dar <BR><B>Sent:</B> Wednesday,
December 19, 2007 2:07 PM<BR><B>To:</B> 'Fab Tillier'<BR><B>Cc:</B>
ofw@lists.openfabrics.org<BR><B>Subject:</B> RE: [WSD] Duplicate send
completion bug<BR></FONT><BR></DIV>
<DIV></DIV>
<DIV><SPAN class=390174808-19122007><FONT face=Arial color=#0000ff
size=2>Thanks for the info Fab,</FONT></SPAN></DIV>
<DIV><SPAN class=390174808-19122007><FONT face=Arial color=#0000ff
size=2></FONT></SPAN> </DIV>
<DIV><SPAN class=390174808-19122007><FONT face=Arial color=#0000ff
size=2>There does indeed seem to be a bug, as you describe.</FONT></SPAN></DIV>
<DIV><SPAN class=390174808-19122007><FONT face=Arial color=#0000ff
size=2></FONT></SPAN> </DIV>
<DIV><SPAN class=390174808-19122007><FONT face=Arial color=#0000ff
size=2>There are a few ways to solve it, and I would like your opinion before
I start.</FONT></SPAN></DIV>
<DIV><SPAN class=390174808-19122007><FONT face=Arial color=#0000ff size=2>First,
your description talks about a problem in the send code. As far as I can tell,
exactly the same problem also occurs in the receive code, so I guess that a
solution will have to address both.</FONT></SPAN></DIV>
<DIV><SPAN class=390174808-19122007><FONT face=Arial color=#0000ff
size=2></FONT></SPAN> </DIV>
<DIV><SPAN class=390174808-19122007><FONT face=Arial color=#0000ff size=2>I'm
looking for a solution that will not introduce new locks, if
possible.</FONT></SPAN></DIV>
<DIV><SPAN class=390174808-19122007><FONT face=Arial color=#0000ff
size=2></FONT></SPAN> </DIV>
<DIV><SPAN class=390174808-19122007><FONT face=Arial color=#0000ff size=2>So,
assuming that the problem is in the send path only, I guess that a simple
solution would be to abandon socket_info->send_wr altogether. With this
approach, we use send_wr.wr_id to hold the overlapped structure itself, and
we use its Offset and OffsetHigh fields to store the socket_info. This seems
straightforward: very simple, no locks. Still, it doesn't solve the receive
problem.</FONT></SPAN></DIV>
<DIV><SPAN class=390174808-19122007><FONT face=Arial color=#0000ff
size=2></FONT></SPAN> </DIV>
<DIV><SPAN class=390174808-19122007><FONT face=Arial color=#0000ff size=2>So,
assuming the same problem is also in the receiver, I want to understand which
locks I should use and where.</FONT></SPAN></DIV>
<DIV><SPAN class=390174808-19122007><FONT face=Arial color=#0000ff size=2>As
far as I can see, there is no single lock that I can take to solve the
problem. First, I'll have to take one lock for the sender and another for the
receiver. Second, and probably worse, locking the complete_wq function itself
probably won't work, as the same problem can happen the minute I leave this
function. So one will probably have to take the lock before the call to
complete_wq and release it only after the call to
WPUCompleteOverlappedRequest, which makes for a very wide lock (or actually
locks).</FONT></SPAN></DIV>
<DIV><SPAN class=390174808-19122007><FONT face=Arial color=#0000ff
size=2></FONT></SPAN> </DIV>
<DIV><SPAN class=390174808-19122007><FONT face=Arial color=#0000ff size=2>So,
if I understand correctly, the right solution seems to be a separate mechanism
for allocating and freeing the WRs, which probably means one more lock/unlock
pair in order to do the allocation.</FONT></SPAN></DIV>
<DIV><SPAN class=390174808-19122007><FONT face=Arial color=#0000ff
size=2></FONT></SPAN> </DIV>
<DIV><SPAN class=390174808-19122007><FONT face=Arial color=#0000ff size=2>Any
feedback is welcome.</FONT></SPAN></DIV>
<DIV><SPAN class=390174808-19122007><FONT face=Arial color=#0000ff
size=2></FONT></SPAN> </DIV>
<DIV><SPAN class=390174808-19122007><FONT face=Arial color=#0000ff
size=2>Thanks</FONT></SPAN></DIV>
<DIV><SPAN class=390174808-19122007><FONT face=Arial color=#0000ff
size=2>Tzachi</FONT></SPAN></DIV>
<DIV><SPAN class=390174808-19122007><FONT face=Arial color=#0000ff
size=2></FONT></SPAN> </DIV><BR>
<BLOCKQUOTE dir=ltr
style="PADDING-LEFT: 5px; MARGIN-LEFT: 5px; BORDER-LEFT: #0000ff 2px solid; MARGIN-RIGHT: 0px">
<DIV class=OutlookMessageHeader lang=en-us dir=ltr align=left>
<HR tabIndex=-1>
<FONT face=Tahoma size=2><B>From:</B> Fab Tillier
[mailto:ftillier@windows.microsoft.com] <BR><B>Sent:</B> Wednesday, December
19, 2007 3:06 AM<BR><B>To:</B> Tzachi Dar<BR><B>Cc:</B>
ofw@lists.openfabrics.org<BR><B>Subject:</B> [WSD] Duplicate send completion
bug<BR></FONT><BR></DIV>
<DIV></DIV>
<DIV class=Section1>
<P class=MsoNormal>Hi Tzachi,<o:p></o:p></P>
<P class=MsoNormal><o:p> </o:p></P>
<P class=MsoNormal>If you are no longer the WSD maintainer, please forward
to the appropriate person.<o:p></o:p></P>
<P class=MsoNormal><o:p> </o:p></P>
<P class=MsoNormal>There is a race condition in the WSD provider that
results in memory corruption due to a send OVERLAPPED being reported twice,
and one being dropped.<o:p></o:p></P>
<P class=MsoNormal><o:p> </o:p></P>
<P class=MsoNormal>Take two threads: one (Thread A) is the application thread,
moving data, and the other (Thread B) is the CQ completion thread. There
are 3 sends posted, so send_cnt == 3 and send_idx == 3.<o:p></o:p></P>
<P class=MsoNormal><BR>Thread B is in complete_wq, having polled 1 send
completion, and is processing it when it gets pre-empted by Thread A.
Thread A calls GetOverlappedResult, polls the CQ and picks up the 2 other
send completions, processes them, and returns, and more send requests are
issued to the provider. It is possible for Thread A to remain busy
enough processing send and receive completions from the provider that Thread
B doesn't get to finish running. The send completion that Thread B
is going to process references the send WR (struct _wr) at index 0.
Thread A completes WR 1 and 2, then issues 12 more requests uneventfully (up
to WR 15), all the while processing completions so that send_cnt stays below
the limit. The next send is the eventful one, because it uses the WR
at index 0 again: the WR that Thread B is currently processing. It
overwrites the OVERLAPPED pointer in the WR structure at index 0. When
this send completes, it will report the new OVERLAPPED value. If
Thread B gets to run before Thread A completes this send, the new OVERLAPPED
will be marked completed though it is still being transferred by the HW.
When the send completes, the OVERLAPPED will be marked complete again,
potentially completing another send prematurely. The original OVERLAPPED is
lost in this case, and never marked as complete.<o:p></o:p></P>
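<P class=MsoNormal><o:p> </o:p></P>
<P class=MsoNormal>The interleaving above can be replayed in a small
single-threaded C model. This is only a sketch under stated assumptions:
ring_model, post_send, finish_completion, replay_race and the 16-slot ring
size are hypothetical names and values chosen for illustration, not the
provider's code, and small integer ids stand in for OVERLAPPED
pointers.<o:p></o:p></P>
<PRE>
```c
/* Hypothetical model of the circular send WR array; all names are illustrative. */
enum { WR_SLOTS = 16, MAX_IDS = 32 };

struct ring_model {
    int slot_owner[WR_SLOTS];     /* id of the OVERLAPPED stored in each WR slot */
    int send_idx;                 /* running send count; next slot is send_idx % WR_SLOTS */
    int times_completed[MAX_IDS]; /* how often each OVERLAPPED id was reported complete */
};

static void ring_init(struct ring_model *r)
{
    int i;
    for (i = 0; i != WR_SLOTS; i++) r->slot_owner[i] = -1;
    for (i = 0; i != MAX_IDS; i++) r->times_completed[i] = 0;
    r->send_idx = 0;
}

/* Post a send: store the OVERLAPPED id in the next slot and return that slot. */
static int post_send(struct ring_model *r, int overlapped_id)
{
    int slot = r->send_idx % WR_SLOTS;
    r->slot_owner[slot] = overlapped_id;
    r->send_idx++;
    return slot;
}

/* Late half of completion processing: report whatever the slot holds now. */
static void finish_completion(struct ring_model *r, int slot)
{
    r->times_completed[r->slot_owner[slot]]++;
}

/* Replay the interleaving; returns the id that ends up stored in slot 0. */
static int replay_race(struct ring_model *r)
{
    int id, reused;
    int stalled_slot = 0;  /* Thread B polled slot 0, then was pre-empted */

    ring_init(r);
    for (id = 0; id != 3; id++) post_send(r, id);   /* three sends posted */

    /* Thread A picks up and finishes the other two completions. */
    finish_completion(r, 1);
    finish_completion(r, 2);

    /* Thread A keeps posting and completing until the ring is about to wrap. */
    for (id = 3; id != WR_SLOTS; id++) finish_completion(r, post_send(r, id));

    /* The next send wraps around and overwrites slot 0. */
    reused = post_send(r, WR_SLOTS);

    /* Thread B resumes and reports the new id while the send is still in flight. */
    finish_completion(r, stalled_slot);

    /* The wrapped send really completes and is reported a second time. */
    finish_completion(r, reused);

    return r->slot_owner[0];
}
```
</PRE>
<P class=MsoNormal>In this model, replaying the sequence leaves id 0 never
reported, while the id that reused slot 0 is reported twice.<o:p></o:p></P>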
<P class=MsoNormal><o:p> </o:p></P>
<P class=MsoNormal>There are several ways of fixing this, ranging from
locking around complete_wq to redesigning the WR usage to not use a circular
array so that entries aren’t reused until they are complete.<o:p></o:p></P>
<P class=MsoNormal><o:p> </o:p></P>
<P
class=MsoNormal>-Fab<o:p></o:p></P></DIV></BLOCKQUOTE></BLOCKQUOTE></BODY></HTML>