[ofw] [WSD] Duplicate send completion bug
Fab Tillier
ftillier at windows.microsoft.com
Tue Dec 18 17:06:24 PST 2007
Hi Tzachi,
If you are no longer the WSD maintainer, please forward to the appropriate person.
There is a race condition in the WSD provider that results in memory corruption: one send OVERLAPPED is reported complete twice, and another is dropped and never reported.
Take two threads: Thread A is the application thread, moving data, and Thread B is the CQ completion thread. There are 3 sends posted, so send_cnt == 3 and send_idx == 3.
Thread B is in complete_wq, having polled 1 send completion, and is processing it when it gets pre-empted by Thread A. Thread A calls GetOverlappedResult, polls the CQ, picks up the 2 other send completions, processes them, and returns; more send requests are then issued to the provider. It is possible for Thread A to remain busy enough processing send and receive completions from the provider that Thread B doesn't get to finish running.
The send completion that Thread B is still processing references the send WR (struct _wr) at index 0. Thread A completes WR 1 and 2 and issues 12 more requests uneventfully (up to WR 15), all the while processing completions so that send_cnt stays below the limit. The next send is the eventful one, because it uses the WR at index 0 again, the WR that Thread B is currently processing. It overwrites the OVERLAPPED pointer in the WR structure at index 0, so when this send completes, it will report the new OVERLAPPED value.
If Thread B gets to run before this new send completes, the new OVERLAPPED will be marked complete even though the data is still being transferred by the HW. When the send does complete, the same OVERLAPPED will be marked complete again, potentially completing another send prematurely. The original OVERLAPPED is lost in this case and is never marked as complete.
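To make the sequence easier to follow, here is a minimal single-threaded sketch that replays the interleaving deterministically. It is not the provider code: the ring size of 16, the struct layouts, and the names post_send, finish_completion and simulate_race are all assumptions made for illustration; only send_idx, send_cnt and the idea of a WR holding an OVERLAPPED pointer come from the description above.

```c
#include <assert.h>
#include <stddef.h>

#define WR_COUNT 16                 /* assumed size of the circular WR array */

struct ov { int completions; };     /* stand-in for OVERLAPPED */
struct wr { struct ov *ov; };       /* stand-in for struct _wr */

struct sock {
    struct wr send_wr[WR_COUNT];
    unsigned send_idx;              /* next slot to use */
    unsigned send_cnt;              /* sends outstanding */
};

/* Post a send: only the count is checked, not whether the slot is idle. */
static int post_send(struct sock *s, struct ov *ov)
{
    unsigned idx;

    if (s->send_cnt >= WR_COUNT)
        return -1;
    idx = s->send_idx;
    s->send_wr[idx].ov = ov;        /* overwrites whatever the slot held */
    s->send_idx = (idx + 1) % WR_COUNT;
    s->send_cnt++;
    return (int)idx;
}

/* Second half of completion processing: report the OVERLAPPED in the slot. */
static void finish_completion(struct sock *s, unsigned idx)
{
    assert(idx < WR_COUNT && s->send_cnt > 0);
    s->send_wr[idx].ov->completions++;
    s->send_cnt--;
}

struct race_result {
    int original_completions;       /* completions seen by the first OVERLAPPED */
    int reused_completions;         /* completions seen by the one that reused slot 0 */
    int reused_slot;                /* slot handed to the wrapping send */
};

static struct race_result simulate_race(void)
{
    struct sock s = { .send_idx = 0, .send_cnt = 0 };
    struct ov ov[WR_COUNT + 1] = { { 0 } };
    struct race_result r;
    unsigned stalled, i;

    for (i = 0; i < 3; i++)
        post_send(&s, &ov[i]);      /* slots 0, 1, 2 */

    stalled = 0;                    /* Thread B polled slot 0, then was pre-empted */

    finish_completion(&s, 1);       /* Thread A handles the other two */
    finish_completion(&s, 2);
    for (i = 3; i < WR_COUNT; i++) {/* Thread A fills the rest of the ring */
        int slot = post_send(&s, &ov[i]);
        finish_completion(&s, (unsigned)slot);
    }

    r.reused_slot = post_send(&s, &ov[WR_COUNT]);    /* wraps onto slot 0 */

    finish_completion(&s, stalled);                  /* Thread B resumes: premature */
    finish_completion(&s, (unsigned)r.reused_slot);  /* the real completion */

    r.original_completions = ov[0].completions;
    r.reused_completions = ov[WR_COUNT].completions;
    return r;
}
```

Calling simulate_race() shows the two symptoms together: the OVERLAPPED that originally occupied slot 0 is never reported, while the one that replaced it is reported twice.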
There are several ways of fixing this, ranging from locking around complete_wq to redesigning the WR usage to not use a circular array so that entries aren't reused until they are complete.
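As a rough illustration of the second option, here is a sketch in which WRs come from a free list and a WR is returned to that list only after its completion has been fully processed. Everything here (the free-list layout and the names sock_init, post_send, finish_completion, replay_scenario) is hypothetical and is not the provider's code. The sketch is single-threaded, so a real implementation would still need to protect the free list with a lock or an interlocked list.

```c
#include <assert.h>
#include <stddef.h>

#define WR_COUNT 16                 /* assumed number of send WRs */

struct ov { int completions; };     /* stand-in for OVERLAPPED */
struct wr {                         /* stand-in for struct _wr */
    struct ov *ov;
    struct wr *next_free;
};

struct sock {
    struct wr send_wr[WR_COUNT];
    struct wr *free_head;           /* WRs that are safe to reuse */
};

static void sock_init(struct sock *s)
{
    int i;

    s->free_head = NULL;
    for (i = WR_COUNT - 1; i >= 0; i--) {
        s->send_wr[i].ov = NULL;
        s->send_wr[i].next_free = s->free_head;
        s->free_head = &s->send_wr[i];
    }
}

/* Take a WR off the free list; NULL means the caller must wait. */
static struct wr *post_send(struct sock *s, struct ov *ov)
{
    struct wr *w = s->free_head;

    if (w == NULL)
        return NULL;
    s->free_head = w->next_free;
    w->ov = ov;
    return w;
}

/* Report the OVERLAPPED first, and only then make the WR reusable. */
static void finish_completion(struct sock *s, struct wr *w)
{
    assert(w->ov != NULL);
    w->ov->completions++;
    w->ov = NULL;
    w->next_free = s->free_head;
    s->free_head = w;
}

struct replay_result {
    int stalled_slot_reused;        /* 1 if WR 0 was handed out while stalled */
    int posted_while_stalled;       /* sends accepted while Thread B was stalled */
    int original_completions;       /* completions seen by the first OVERLAPPED */
    int slot_reused_after_completion;
};

static struct replay_result replay_scenario(void)
{
    struct sock s;
    struct ov ov[WR_COUNT + 3] = { { 0 } };
    struct replay_result r = { 0, 0, 0, 0 };
    struct wr *w0, *w1, *w2, *w;
    int next = 3;

    sock_init(&s);
    w0 = post_send(&s, &ov[0]);
    w1 = post_send(&s, &ov[1]);
    w2 = post_send(&s, &ov[2]);

    /* Thread B has polled the completion for w0 and is pre-empted here. */

    finish_completion(&s, w1);      /* Thread A handles the other two */
    finish_completion(&s, w2);

    /* Thread A keeps posting until the provider runs out of WRs. */
    while (next < WR_COUNT + 2 && (w = post_send(&s, &ov[next])) != NULL) {
        if (w == w0)
            r.stalled_slot_reused = 1;
        r.posted_while_stalled++;
        next++;
    }

    finish_completion(&s, w0);      /* Thread B resumes and finishes WR 0 */
    r.original_completions = ov[0].completions;

    w = post_send(&s, &ov[next]);   /* WR 0 is available again only now */
    r.slot_reused_after_completion = (w == w0);
    return r;
}
```

In this replay Thread A can post at most 15 sends while Thread B is stalled, none of them on WR 0, and the original OVERLAPPED is reported exactly once. The cost is that a stalled completion thread now shows up as back-pressure on sends rather than as corruption.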
-Fab