[openib-general] in need of a simple ulp
Jeff Carr
jcarr at linuxmachines.com
Fri May 20 13:02:48 PDT 2005
Sean Hefty wrote:
> Jeff Carr wrote:
>> May 5 16:31:50 localhost kernel: ib_mthca 0000:09:00.0: 1a0084/0:
>> error CQE -> QPN 1a0406, WQE @ 00000042
>> May 5 16:31:50 localhost kernel: [ 0] 001a0406
>> May 5 16:31:50 localhost kernel: [ 4] 00001aed
>> May 5 16:31:50 localhost kernel: [ 8] 00000004
>> May 5 16:31:50 localhost kernel: [ c] 00003800
>> May 5 16:31:50 localhost kernel: [10] 128a0000
>> May 5 16:31:50 localhost kernel: [14] 00000000
>> May 5 16:31:50 localhost kernel: [18] 00000042
>> May 5 16:31:50 localhost kernel: [1c] ff000000
>>
>> if you up the message_count to 0x1000. I'm guessing this is just some
>> normal overrun error though.
>
>
> It's taken me a while to look at this, but I think that this is a real
> error.
There must also be some limit to how many cqe's you can allocate with
ib_post_recv(). (?)
> Cmpost is setting the CQ size too small, which can lead to the CQ
> overrun. The number of cqe's should have been message_count * 2, rather
> than just message_count. Message_count is fine on the client side,
> which receives all messages before sending. But on the server side,
> receives could begin coming in before all sends are done.
OK. Wow. That makes cqe's and ib_post_recv() even more confusing then.
There must be some way to delete/free these? They don't get re-used I
take it? Surely it wasn't intended that ib_post_recv() be initially run
for each transfer expected in the lifetime of the connection. :)
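As a rough illustration of the sizing argument quoted above, here is a toy model in plain C. It is only a sketch and not the cmpost code: the names toy_cq, toy_cq_complete and server_overruns are made up for this example, and it assumes the worst case on the server side, where every send completion and every receive completion lands in the CQ before anything is polled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy CQ: it only counts entries, it holds no real CQEs. */
struct toy_cq {
    size_t capacity;   /* number of cqe's requested when the CQ was created */
    size_t pending;    /* completions generated but not yet polled */
    bool overrun;      /* set once a completion arrives with no free slot */
};

/* Record one completion; flag an overrun if the CQ is already full. */
static void toy_cq_complete(struct toy_cq *cq)
{
    if (cq->pending == cq->capacity)
        cq->overrun = true;
    else
        cq->pending++;
}

/*
 * Assumed worst case for the server: message_count send completions and
 * message_count receive completions all arrive before the CQ is polled.
 * Returns true if a CQ of cq_size entries would overrun.
 */
static bool server_overruns(size_t message_count, size_t cq_size)
{
    struct toy_cq cq = { .capacity = cq_size, .pending = 0, .overrun = false };
    size_t i;

    assert(cq_size > 0);
    for (i = 0; i < message_count; i++)
        toy_cq_complete(&cq);   /* send completions */
    for (i = 0; i < message_count; i++)
        toy_cq_complete(&cq);   /* receive completions */
    return cq.overrun;
}
```

In this model, server_overruns(0x1000, 0x1000) reports an overrun, while server_overruns(0x1000, 2 * 0x1000) does not, which matches the message_count * 2 figure given above.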
There must also be some information recorded about each of these
cqe's. How do we know whether one of them was used for a transfer from
the server to the client, or for the client letting the server know the
transfer was received?
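One possible way to tell completions apart, sketched here in plain C under my own assumptions: each work request carries a caller-chosen 64-bit wr_id, which comes back in the work completion (struct ib_wc has wr_id and opcode fields). The encoding below, with a direction flag in the top bit and a buffer index in the low bits, is purely hypothetical, as are the toy_ names; it is not what cmpost does.

```c
#include <stdint.h>

/* Hypothetical convention: top bit of wr_id marks a receive request. */
#define TOY_RECV_FLAG (1ULL << 63)

/* Build a wr_id from a direction flag and a buffer index. */
static uint64_t toy_make_wr_id(int is_recv, uint32_t index)
{
    return (is_recv ? TOY_RECV_FLAG : 0ULL) | (uint64_t)index;
}

/* Nonzero if the completion belongs to a posted receive. */
static int toy_is_recv(uint64_t wr_id)
{
    return (wr_id & TOY_RECV_FLAG) != 0;
}

/* Recover the buffer index stored in the low 32 bits. */
static uint32_t toy_index(uint64_t wr_id)
{
    return (uint32_t)(wr_id & 0xffffffffULL);
}
```

With this convention, a receive posted for buffer 7 would use toy_make_wr_id(1, 7), and the completion handler could call toy_is_recv() and toy_index() on the wr_id it gets back to find out which buffer to repost.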
I know that this isn't a CM question, but it is best asked against
code as simple as this. (Simplicity is good.)
Jeff