<div>Roman,</div>
<div>If your software creates QPs whose receive queue size is not a power of two, you may see weird problems, because the patches I sent have a bug: they compute the index of the next receive WQE with "& (qp->rq.max - 1)", which only wraps around correctly when rq.max is a power of two. I am sending a patch to be applied on top of the previous libmthca patch so you can try it (the same fix applies to the kernel code too). Tomorrow I will send the fixed patches again. I apologize if the patch is badly formed.
</div>
<div> </div>
<div>
<p> src/qp.c | 2 +-<br> 1 files changed, 1 insertions(+), 1 deletions(-)</p>
<p>diff --git a/src/qp.c b/src/qp.c<br>
index f3aa6c7..3c5f049 100644<br>
--- a/src/qp.c<br>
+++ b/src/qp.c<br>
@@ -885,7 +885,7 @@ int mthca_alloc_qp_buf(struct ibv_pd *pd, struct ibv_qp_cap *cap,<br>
 } else {<br>
 for (i = 0; i < qp->rq.max; ++i) {<br>
 next = get_recv_wqe(qp, i);<br>
- next->nda_op = htonl((((i + 1) & (qp->rq.max - 1)) <<<br>
+ next->nda_op = htonl((((i + 1) % qp->rq.max) <<<br>
 qp->rq.wqe_shift) | 1);<br>
 }<br>
 }<br>
</p><br><br> </div>
<div><span class="gmail_quote">On 1/23/08, <b class="gmail_sendername">Roman Kononov</b> <<a href="mailto:kononov@dls.net">kononov@dls.net</a>> wrote:</span>
<blockquote class="gmail_quote" style="PADDING-LEFT: 1ex; MARGIN: 0px 0px 0px 0.8ex; BORDER-LEFT: #ccc 1px solid">On 2008-01-22 01:49, Eli Cohen wrote:<br>><br>> I am sending two patches, one for userspace and one for kernel space
<br>> which solves this issue.<br>><br><br>Thanks for the patches:<br><a href="http://lists.openfabrics.org/pipermail/general/2008-January/045259.html">http://lists.openfabrics.org/pipermail/general/2008-January/045259.html
</a><br><a href="http://lists.openfabrics.org/pipermail/general/2008-January/045260.html">http://lists.openfabrics.org/pipermail/general/2008-January/045260.html</a><br><br>They "fix" the test program I sent to the list earlier. It ran for many
<br>hours. Unfortunately, they did not fix my convoluted software.<br><br>I applied the user space patch to libmthca-1.0.4 from OFED-1.2.5.4, and the<br>kernel space patch to the 2.6.23.14 kernel. The user space patch did not
<br>want to apply one of the hunks (the one containing '- wbm();') to srq.c<br>because the code being patched did not have the 'wbm();' line. This forced<br>me to remove the '- wbm()' line from the patch file.
<br><br>Then I observed these errors, each occurred twice so far:<br>- A send completion is out of order. It has a "future" wr_id value.<br>- A receive completion has a "future" imm_data value.<br><br>
It looks exactly as if the sending side dropped a few<br>IBV_WR_RDMA_WRITE_WITH_IMM requests. Or the sender sent them later (but my<br>software does not know about them because it stops on the first error).<br><br>Is it possible that with IBV_QPT_RC queues, the IBV_WR_RDMA_WRITE_WITH_IMM
<br>requests are completed out of order on either sending or receiving side?<br><br>Roman<br></blockquote></div><br>