[openib-general] completion Q overflow error/panic

Viswanath Krishnamurthy viswa.krish at gmail.com
Fri Sep 9 15:30:05 PDT 2005


Somehow gmail ate away the main content of my mail..

Here it is..


I modified the cmpost program to have individual send/receive completion
Qs. The cmpost server acts like an echo server, echoing back anything it
receives. The client program keeps sending the packets.

The test works fine up to around 600 connections. After 600 connections, I
start to see ibv_post_send errors. I added some debug messages in
libmthca/src/qp.c where a check is made for wq_overflow, and the work queue
is in fact overflowing. I checked the code to make sure all the send
descriptors are recovered with the cq_poll operation. The wc.status field is
also checked for any errors.
I am attaching the modified code.
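
To make the accounting described above concrete, here is a minimal sketch of
a send work queue that refuses a post when too many requests are still
outstanding, and regains room only when completions are reaped from the CQ.
It is a simplified stand-alone model, not the libmthca code: the struct
send_wq type and the wq_would_overflow, wq_post and wq_reap names are
hypothetical and exist only for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of one send work queue (not the libmthca structure). */
struct send_wq {
    unsigned head; /* total work requests posted so far */
    unsigned tail; /* total completions reaped so far */
    unsigned max;  /* queue capacity, in work requests */
};

/* True if posting nreq more requests would exceed the queue capacity. */
static bool wq_would_overflow(const struct send_wq *wq, unsigned nreq)
{
    /* Unsigned subtraction stays correct even after the counters wrap. */
    unsigned in_flight = wq->head - wq->tail;
    return in_flight + nreq > wq->max;
}

/* Model of a post: returns 0 on success, -1 if the queue is full. */
static int wq_post(struct send_wq *wq)
{
    if (wq_would_overflow(wq, 1))
        return -1;
    wq->head++;
    return 0;
}

/* Model of polling the CQ: credit back n reaped send completions. */
static void wq_reap(struct send_wq *wq, unsigned n)
{
    assert(n <= wq->head - wq->tail); /* cannot reap more than in flight */
    wq->tail += n;
}
```

In this model a post fails as soon as head - tail reaches max, so a sender
that posts faster than it reaps completions will see post errors even though
every descriptor is eventually recovered.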

bash-3.00$ svn info
Path: .
URL: https://openib.org/svn/gen2/trunk
Repository UUID: 21a7a0b7-18d7-0310-8e21-e8b31bdbf5cd
Revision: 3344
Node Kind: directory
Schedule: normal
Last Changed Author: jlentini
Last Changed Rev: 3344
Last Changed Date: 2005-09-08 16:39:25 -0700 (Thu, 08 Sep 2005)


To run the test, compile the code:

cc -o cmpost cmpost.c -libcm -libverbs -libat

$ cmpost -n 1024 <=== as server

$ cmpost -c -n 1024 -l <dest_lid> -g <dest_guid>

After some time you start seeing post_send errors. On my system up to 600
connections work fine.


When running the test I saw panics a couple of times, but they are
difficult to reproduce:

kernel BUG at include/asm/spinlock.h:149!
invalid operand: 0000 [#1]
SMP
Modules linked in: nfs nfsd exportfs lockd autofs4 sunrpc uhci_hcd ehci_hcd hw_random e1000 ext3 jbd sd_mod
CPU: 1
EIP: 0060:[<c02fef92>] Not tainted VLI
EFLAGS: 00010086 (2.6.13)
EIP is at _spin_lock_irqsave+0x47/0x51
eax: 00000011 ebx: 00000282 ecx: c035950c edx: 00000082
esi: f7d82010 edi: 00000000 ebp: f6792c80 esp: c1a33ed0
ds: 007b es: 007b ss: 0068
Process ib_mad1 (pid: 308, threadinfo=c1a32000 task=f7e3c540)
Stack: c03123ee c0276963 f6792c80 f7d82010 c0276963 f79a6adc f7974b00 00000001
c1a33f0c f7912e00 f7df2000 f7df4200 c1a33f0c 00000292 c0276b96 f6792c80
00000000 00000000 00000000 b93e2c00 00000128 00000296 00000402 00000001
Call Trace:
[<c0276963>] ib_mad_send_done_handler+0x72/0x11e
[<c0276963>] ib_mad_send_done_handler+0x72/0x11e
[<c0276b96>] ib_mad_completion_handler+0x80/0x8d
[<c0120000>] wait_noreap_copyout+0x55/0xbe
[<c012bd0d>] worker_thread+0x1b0/0x23a
[<c02fdb43>] schedule+0x5d3/0xbdf
[<c0276b16>] ib_mad_completion_handler+0x0/0x8d
[<c011942d>] default_wake_function+0x0/0xc
[<c011942d>] default_wake_function+0x0/0xc
[<c012bb5d>] worker_thread+0x0/0x23a
[<c012f700>] kthread+0x8a/0xb2
[<c012f676>] kthread+0x0/0xb2
[<c0101cf9>] kernel_thread_helper+0x5/0xb
Code: 00 00 74 01 fb f3 90 80 3e 00 7e f9 fa eb e8 83 c4 08 89 d8 5b 5e c3 8b 44 24 10 c7 04 24 ee 23 31 c0 89 44 24 04 e8 2f e7 e1 ff <0f> 0b 95 00 39 1c 31 c0 eb c2 53 89 c3 83 ec 08 fa 81 78 04 ad



-Viswa