[openib-general] in need of a simple ulp

Jeff Carr jcarr at linuxmachines.com
Fri May 20 13:02:48 PDT 2005


Sean Hefty wrote:
> Jeff Carr wrote:
>> May  5 16:31:50 localhost kernel: ib_mthca 0000:09:00.0: 1a0084/0: 
>> error CQE -> QPN 1a0406, WQE @ 00000042
>> May  5 16:31:50 localhost kernel:   [ 0] 001a0406
>> May  5 16:31:50 localhost kernel:   [ 4] 00001aed
>> May  5 16:31:50 localhost kernel:   [ 8] 00000004
>> May  5 16:31:50 localhost kernel:   [ c] 00003800
>> May  5 16:31:50 localhost kernel:   [10] 128a0000
>> May  5 16:31:50 localhost kernel:   [14] 00000000
>> May  5 16:31:50 localhost kernel:   [18] 00000042
>> May  5 16:31:50 localhost kernel:   [1c] ff000000
>>
>> if you up the message_count to 0x1000. I'm guessing this is just some 
>> normal overrun error though.
> 
> 
> It's taken me a while to look at this, but I think that this is a real 
> error.

Is there also some limit to how many CQEs you can allocate with 
ib_post_recv()?

> Cmpost is setting the CQ size too small, which can lead to the CQ 
> overrun. The number of cqe's should have been message_count * 2, rather 
> than just message_count.  Message_count is fine on the client side, 
> which receives all messages before sending.  But on the server side, 
> receives could begin coming in before all sends are done.

OK. Wow. That makes CQEs and ib_post_recv() even more confusing, then.

Is there some way to delete or free these? I take it they don't get 
reused? Surely ib_post_recv() wasn't meant to be called up front, once 
for every transfer expected over the lifetime of the connection. :)

There must also be some record of what each of these CQEs was for. 
How do we tell whether one was used for a transfer from the server to 
the client, or for the client letting the server know the transfer was 
received?

I know this isn't a CM question, but it is best asked against this code 
because of its simplicity. (Simplicity is good.)

Jeff


