[ofa-general] Bogus Receive Completions
Roman Kononov
ofed at kononov.ftml.net
Fri Sep 5 13:07:34 PDT 2008
This is a continuation of
http://lists.openfabrics.org/pipermail/general/2007-December/043658.html
Basically, I have two processes on different computers talking to each other
over a single QP per process. They both post and receive
IBV_WR_RDMA_WRITE_WITH_IMM commands.
All Send Work Requests are sequentially numbered in the wr_id field. When the
process receives a Send Work Completion, its wr_id is checked for consistency
with the posted number. So far so good.
All Receive Work Requests are sequentially numbered in the wr_id field as well.
When the process gets a Receive Work Completion, its wr_id is checked for
consistency with the posted number. The consistency test eventually fails:
the Completion status is "success", but wr_id is out of order.
I believe that wr_id from Receive Work Completions must arrive in order, but
they do not.
I managed to reproduce the failure reliably in my environment. Then I
modified mthca_tavor_post_recv(), mthca_tavor_post_send() to print all
wr->wr_id values passing through them, and I modified mthca_poll_cq() to
print all valid wc->wr_id values passing through it. The results from the
two processes are attached. In stdout.1.log, one can see that a Receive Work
Request with wr_id=0x7f was accepted and immediately completed, while the
Receive Queue still has 0x7f-0x40=0x3f uncompleted Work Requests. None of the
mthca_tavor_post_recv() calls returned an error.
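
For reference, the receive-side consistency test amounts to something like
the C sketch below. The names rq_seq, rq_seq_post and rq_seq_complete are
made up for this illustration; they are not part of libibverbs or libmthca,
and the actual test program is not shown here. The sketch only assumes that
wr_id values are assigned sequentially starting from 0.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical bookkeeping for one receive queue (illustration only). */
struct rq_seq {
    uint64_t next_post;     /* wr_id to give to the next posted request */
    uint64_t next_complete; /* wr_id expected in the next completion */
};

/* Call when posting a receive request; returns the wr_id to put in it. */
static uint64_t rq_seq_post(struct rq_seq *s)
{
    assert(s != NULL);
    return s->next_post++;
}

/*
 * Call with wc.wr_id of each successful receive completion.
 * Returns 0 if the completion is in order (and advances the expectation).
 * A positive result is the number of older posted requests that were
 * skipped; a negative result means an already completed wr_id came back.
 * On any nonzero result the expectation is left unchanged.
 */
static int64_t rq_seq_complete(struct rq_seq *s, uint64_t wr_id)
{
    assert(s != NULL);
    int64_t gap = (int64_t)(wr_id - s->next_complete);
    if (gap == 0)
        s->next_complete++;
    return gap;
}
```

With 0x40 completions already seen and wr_id values up to 0x7f posted,
passing 0x7f to rq_seq_complete() returns 0x3f, which is the number of
uncompleted Work Requests quoted above.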
This looks like a bug in libmthca or the firmware. I really need this fixed.
Where should I go from this point? Any suggestions are appreciated.
The QP is created with both SQ and RQ sizes set to 64, with a single CQ. The
CQ size is set to 128.
I have libibverbs-1.1.2 and libmthca-1.0.5 compiled from sources.
~>cat /etc/issue
CentOS release 5.2 (Final)
Kernel \r on an \m
~>uname -a
Linux node100 2.6.26.3 #1 SMP PREEMPT Wed Sep 3 14:11:03 CDT 2008 x86_64
x86_64 x86_64 GNU/Linux
~>grep 'model name' /proc/cpuinfo
model name : Dual Core AMD Opteron(tm) Processor 285
model name : Dual Core AMD Opteron(tm) Processor 285
~>ibv_devinfo
hca_id: mthca0
        fw_ver:                 4.8.200
        node_guid:              0002:c902:0026:dbe0
        sys_image_guid:         0002:c902:0026:dbe3
        vendor_id:              0x02c9
        vendor_part_id:         25208
        hw_ver:                 0xA0
        board_id:               MT_02F0110002
        phys_port_cnt:          2
        ...
Thanks,
Roman Kononov
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: stdout.1.log
URL: <http://lists.openfabrics.org/pipermail/general/attachments/20080905/3c346f33/attachment.ksh>
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: stdout.2.log
URL: <http://lists.openfabrics.org/pipermail/general/attachments/20080905/3c346f33/attachment-0001.ksh>