[ofa-general] Bogus Receive Completions

Eli Cohen eli at dev.mellanox.co.il
Wed Jan 23 14:12:46 PST 2008


Roman,
if your software creates QPs whose receive queue size is not a power of two,
you might experience weird problems, because the patches I sent have a bug.
I am sending a patch to be applied on top of the previous libmthca patch so
you can try it (the same fix applies to the kernel code too). Tomorrow I will
send the fixed patches again. I apologize if the patch is badly formed.


 src/qp.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/src/qp.c b/src/qp.c
index f3aa6c7..3c5f049 100644
--- a/src/qp.c
+++ b/src/qp.c
@@ -885,7 +885,7 @@ int mthca_alloc_qp_buf(struct ibv_pd *pd, struct ibv_qp_cap *cap,
        } else {
                for (i = 0; i < qp->rq.max; ++i) {
                        next = get_recv_wqe(qp, i);
-                       next->nda_op = htonl((((i + 1) & (qp->rq.max - 1)) <<
+                       next->nda_op = htonl((((i + 1) % qp->rq.max) <<
                                             qp->rq.wqe_shift) | 1);
                }
        }
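
To illustrate why the mask only works for power-of-two sizes, here is a
minimal sketch. The helper names (next_index_mask, next_index_mod,
ring_is_closed) are hypothetical and are not part of libmthca; they only
model the "index of the next WQE" arithmetic from the hunk above, without
the shift by wqe_shift or the byte-order conversion.

```c
#include <stdbool.h>

/* Hypothetical helpers for illustration only; not libmthca code. */

/* Old computation: wraps correctly only when max is a power of two. */
static unsigned next_index_mask(unsigned i, unsigned max)
{
    return (i + 1) & (max - 1);
}

/* Patched computation: wraps correctly for any max > 0. */
static unsigned next_index_mod(unsigned i, unsigned max)
{
    return (i + 1) % max;
}

/* Returns true if following the links from entry 0 visits all max
 * entries exactly once before coming back to entry 0. */
static bool ring_is_closed(unsigned max,
                           unsigned (*next)(unsigned, unsigned))
{
    unsigned i = 0;

    for (unsigned steps = 1; steps <= max; ++steps) {
        i = next(i, max);
        if (i == 0)
            return steps == max;
    }
    return false;
}
```

With max = 6 the mask is 5 (binary 101), so entry 1 links back to entry 0
((1 + 1) & 5 == 0) and the chain never reaches entries 2 through 5. The
modulo version links 0, 1, 2, 3, 4, 5 and then back to 0.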



On 1/23/08, Roman Kononov <kononov at dls.net> wrote:
>
> On 2008-01-22 01:49, Eli Cohen wrote:
> >
> > I am sending two patches, one for userspace and one for kernel space
> > which solves this issue.
> >
>
> Thanks for the patches:
> http://lists.openfabrics.org/pipermail/general/2008-January/045259.html
> http://lists.openfabrics.org/pipermail/general/2008-January/045260.html
>
> They "fix" the test program I sent to the list earlier. It ran for many
> hours. Unfortunately, they did not fix my convoluted software.
>
> I applied the user space patch to libmthca-1.0.4 from OFED-1.2.5.4, and the
> kernel space patch to the 2.6.23.14 kernel. The user space patch did not
> want to apply one of the hunks (the one containing '- wbm();') to srq.c
> because the code being patched did not have the 'wbm();' line. This forced
> me to remove the '- wbm()' line from the patch file.
>
> Then I observed these errors, each occurred twice so far:
> - A send completion is out of order. It has a "future" wr_id value.
> - A receive completion has a "future" imm_data value.
>
> It looks exactly as if the sending side dropped a few
> IBV_WR_RDMA_WRITE_WITH_IMM requests. Or the sender sent them later (but my
> software does not know about them because it stops on the first error).
>
> Is it possible that with IBV_QPT_RC queues, the IBV_WR_RDMA_WRITE_WITH_IMM
> requests are completed out of order on either sending or receiving side?
>
> Roman
>