[ofa-general] Bogus Receive Completions

Roman Kononov kononov at dls.net
Wed Jan 23 12:43:41 PST 2008


On 2008-01-22 01:49, Eli Cohen wrote:
> 
> I am sending two patches, one for userspace and one for kernel space
> which solves this issue.
> 

Thanks for the patches:
http://lists.openfabrics.org/pipermail/general/2008-January/045259.html
http://lists.openfabrics.org/pipermail/general/2008-January/045260.html

They "fix" the test program I sent to the list earlier. It ran for many 
hours. Unfortunately, they did not fix my convoluted software.

I applied the user space patch to libmthca-1.0.4 from OFED-1.2.5.4, and the 
kernel space patch to the 2.6.23.14 kernel. One hunk of the user space patch 
(the one containing '- wbm();') would not apply to srq.c, because the code 
being patched did not have the 'wbm();' line. I had to remove the 
'- wbm();' line from the patch file.

Then I observed the following errors, each of which has occurred twice so far:
- A send completion is out of order. It has a "future" wr_id value.
- A receive completion has a "future" imm_data value.

It looks exactly as if the sending side dropped a few 
IBV_WR_RDMA_WRITE_WITH_IMM requests, or sent them later (in which case my 
software would not know about them, because it stops at the first error).

Is it possible that with IBV_QPT_RC queues, the IBV_WR_RDMA_WRITE_WITH_IMM 
requests are completed out of order on either sending or receiving side?

Roman


