I modified the cmpost program to use individual send/receive completion queues (CQs). The cmpost<br>
server acts like an echo server, echoing back anything it receives. The client program keeps sending<br>
packets.<br>
<br>
The test works fine up to around 600 connections. Beyond 600 connections, I start to see ibv_post_send errors.<br>
I added some debug messages in libmthca/src/qp.c where the check for wq_overflow is made, and the work queue<br>
really is overflowing. I checked the code to make sure all the send descriptors are recovered by polling the CQ, and<br>
the wc.status field is checked for errors.<br>
I am attaching the modified code.<br>
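<br>
For anyone following along, here is a rough model of the overflow condition I am hitting. This is only a sketch under<br>
my own assumptions, not the libmthca code and not the attached cmpost code: the struct and the model_* names are<br>
made up for illustration. A post is refused once the number of posted-but-not-reaped work requests reaches the queue<br>
depth, so sends overflow whenever completions are reaped more slowly than requests are posted.<br>

```c
#include <assert.h>

/* Hypothetical model of a send work queue; NOT the libmthca structure. */
struct model_wq {
    unsigned head; /* total work requests posted so far */
    unsigned tail; /* total work requests completed and reclaimed */
    unsigned max;  /* queue depth, e.g. the send depth requested at QP creation */
};

/* Returns 1 if posting nreq more requests would exceed the queue depth. */
int model_wq_overflow(const struct model_wq *wq, unsigned nreq)
{
    /* Unsigned subtraction stays correct even after the counters wrap. */
    unsigned in_flight = wq->head - wq->tail;
    return in_flight + nreq > wq->max;
}

/* Stand-in for a post-send call: fails with -1 when the queue is full. */
int model_post_send(struct model_wq *wq)
{
    if (model_wq_overflow(wq, 1))
        return -1;
    wq->head++;
    return 0;
}

/* Stand-in for polling the CQ: reclaims up to ncomp completed sends. */
void model_reap(struct model_wq *wq, unsigned ncomp)
{
    unsigned in_flight = wq->head - wq->tail;
    if (ncomp > in_flight)
        ncomp = in_flight;
    wq->tail += ncomp;
}

/*
 * Runs `rounds` iterations; each one attempts `posts` sends and then
 * reaps `reaps` completions. Returns how many posts were refused.
 */
unsigned model_run(unsigned depth, unsigned rounds, unsigned posts, unsigned reaps)
{
    struct model_wq wq = { 0, 0, depth };
    unsigned failures = 0;

    assert(depth > 0);
    for (unsigned r = 0; r < rounds; r++) {
        for (unsigned i = 0; i < posts; i++) {
            if (model_post_send(&wq) != 0)
                failures++;
        }
        model_reap(&wq, reaps);
    }
    return failures;
}
```

In this model, model_run(4, 10, 2, 2) never refuses a post, while model_run(4, 10, 2, 1) starts refusing posts once<br>
the backlog reaches the queue depth. That is the pattern I would expect if some send completions are not being reaped.<br>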
<br>
bash-3.00$ svn info<br>
Path: .<br>
URL: <a href="https://openib.org/svn/gen2/trunk">https://openib.org/svn/gen2/trunk</a><br>
Repository UUID: 21a7a0b7-18d7-0310-8e21-e8b31bdbf5cd<br>
Revision: 3344<br>
Node Kind: directory<br>
Schedule: normal<br>
Last Changed Author: jlentini<br>
Last Changed Rev: 3344<br>
Last Changed Date: 2005-09-08 16:39:25 -0700 (Thu, 08 Sep 2005)<br>
<br>
<br>
To run the test, compile the code:<br>
<br>
cc -o cmpost cmpost.c -libcm -libverbs -libat<br>
<br>
$ cmpost -n 1024 <=== as server<br>
<br>
$ cmpost -c -n 1024 -l <dest_lid> -g <dest_guid><br>
<br>
After some time you start seeing post_send errors. On my system, up to 600 connections work fine.<br>
<br>
<br>
While running the test I saw kernel panics a couple of times, but they are difficult to reproduce:<br>
<br>
kernel BUG at include/asm/spinlock.h:149!<br>
invalid operand: 0000 [#1]<br>
SMP<br>
Modules linked in: nfs nfsd exportfs lockd autofs4 sunrpc uhci_hcd ehci_hcd hw_random e1000 ext3 jbd sd_mod<br>
CPU: 1<br>
EIP: 0060:[<c02fef92>] Not tainted VLI<br>
EFLAGS: 00010086 (2.6.13)<br>
EIP is at _spin_lock_irqsave+0x47/0x51<br>
eax: 00000011 ebx: 00000282 ecx: c035950c edx: 00000082<br>
esi: f7d82010 edi: 00000000 ebp: f6792c80 esp: c1a33ed0<br>
ds: 007b es: 007b ss: 0068<br>
Process ib_mad1 (pid: 308, threadinfo=c1a32000 task=f7e3c540)<br>
Stack: c03123ee c0276963 f6792c80 f7d82010 c0276963 f79a6adc f7974b00 00000001<br>
c1a33f0c f7912e00 f7df2000 f7df4200 c1a33f0c 00000292 c0276b96 f6792c80<br>
00000000 00000000 00000000 b93e2c00 00000128 00000296 00000402 00000001<br>
Call Trace:<br>
[<c0276963>] ib_mad_send_done_handler+0x72/0x11e<br>
[<c0276963>] ib_mad_send_done_handler+0x72/0x11e<br>
[<c0276b96>] ib_mad_completion_handler+0x80/0x8d<br>
[<c0120000>] wait_noreap_copyout+0x55/0xbe<br>
[<c012bd0d>] worker_thread+0x1b0/0x23a<br>
[<c02fdb43>] schedule+0x5d3/0xbdf<br>
[<c0276b16>] ib_mad_completion_handler+0x0/0x8d<br>
[<c011942d>] default_wake_function+0x0/0xc<br>
[<c011942d>] default_wake_function+0x0/0xc<br>
[<c012bb5d>] worker_thread+0x0/0x23a<br>
[<c012f700>] kthread+0x8a/0xb2<br>
[<c012f676>] kthread+0x0/0xb2<br>
[<c0101cf9>] kernel_thread_helper+0x5/0xb<br>
Code: 00 00 74 01 fb f3 90 80 3e 00 7e f9 fa eb e8 83 c4 08 89 d8 5b 5e
c3 8b 44 24 10 c7 04 24 ee 23 31 c0 89 44 24 04 e8 2f e7 e1 ff
<0f> 0b 95 00 39 1c 31 c0 eb c2 53 89 c3 83 ec 08 fa 81 78 04 ad<br>
<br>
<br>
<br>
-Viswa<br>
<br>
<br>
<br>