[openib-general] completion Q overflow error/panic

Viswanath Krishnamurthy viswa.krish at gmail.com
Sat Sep 10 10:53:28 PDT 2005


Here is the ibv_devinfo output. The HCA is an InfiniHost_III_Lx0.

]# ibv_devinfo
hca_id: mthca0
        fw_ver:                         1.0.1
        node_guid:                      0002:c902:0040:0cfc
        sys_image_guid:                 0002:c902:0040:0cff
        max_mr_size:                    0xffffffffffffffff
        page_size_cap:                  0x0
        vendor_id:                      0x02c9
        vendor_part_id:                 25204
        hw_ver:                         0x0
        phys_port_cnt:                  1
                port:   1
                        state:                  PORT_ACTIVE (4)
                        max_mtu:                invalid MTU (0)
                        active_mtu:             invalid MTU (0)
                        sm_lid:                 1
                        port_lid:               1
                        port_lmc:               0x00


Yes, the CQE count is a bug. But in this case there should only ever be one
outstanding packet in the pipe: the client sends one packet, waits for the
response, pauses (delays), and then sends the next packet. If everything
works, we should be using at most one CQ entry. Initially I had more CQ
entries, but the problem still appeared, just later.

It looks like the packet is getting stuck somewhere, with no error
notification coming back. Do we need to tweak any of the QP parameters
(packet lifetime, retry counts, etc.)?

-Viswa




On 9/9/05, Roland Dreier <rolandd at cisco.com> wrote:
> 
> I found one bug in your cmpost.c program that could cause CQ
> overruns. When you create your receive and send CQs, you create them
> with a cqe value of 5, so they can hold at most 5 entries. However,
> you create the send and receive work queues so they can hold up to 10
> entries, and in fact the code will post up to 8 entries at a time. So
> it's possible to overflow the CQ.
> 
> The fix is to create the CQs to have at least as many entries as the
> work queues -- in other words, change cqe to 10.
> 
> However, even with this fixed I do see some strange behavior that I'm
> still debugging. More details on Monday.
> 
> What HCA firmware version do your systems have?
> 
> - R.
>