[openib-general] in need of a simple ulp
Sean Hefty
mshefty at ichips.intel.com
Mon May 9 14:48:13 PDT 2005
Jeff Carr wrote:
> I didn't notice this error. In any case, it was something I did wrong; I
> went back and did a simple check with your code and it is ok. I do
> notice though that you can generate:
>
> May 5 16:31:50 localhost kernel: ib_mthca 0000:09:00.0: 1a0084/0: error
> CQE -> QPN 1a0406, WQE @ 00000042
> May 5 16:31:50 localhost kernel: [ 0] 001a0406
> May 5 16:31:50 localhost kernel: [ 4] 00001aed
> May 5 16:31:50 localhost kernel: [ 8] 00000004
> May 5 16:31:50 localhost kernel: [ c] 00003800
> May 5 16:31:50 localhost kernel: [10] 128a0000
> May 5 16:31:50 localhost kernel: [14] 00000000
> May 5 16:31:50 localhost kernel: [18] 00000042
> May 5 16:31:50 localhost kernel: [1c] ff000000
>
> if you up the message_count to 0x1000. I'm guessing this is just some
> normal overrun error though.
It's taken me a while to look at this, but I think that this is a real error.
Cmpost sets the CQ size too small, which can lead to a CQ overrun.
The number of CQEs should have been message_count * 2, rather than just
message_count. A size of message_count is fine on the client side, which
receives all messages before sending. But on the server side, receives can
begin coming in before all sends are done.
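
To make the arithmetic concrete, here is a rough sketch of the sizing rule. It is not the actual cmpost code: the function names are made up for illustration, and it assumes a single CQ shared by the send and receive queues that is not polled until the sends are all posted.

```c
/*
 * Illustrative only: these helpers are hypothetical and are not part of
 * cmpost or of any verbs API.
 */

/*
 * Worst case for one CQ shared by sends and receives: every send and
 * every receive completes before the CQ is polled.
 */
int shared_cq_entries(int message_count)
{
    return message_count * 2;
}

/*
 * Returns 1 if a CQ with cq_size entries could overrun when the given
 * numbers of send and receive completions are pending at the same time.
 */
int cq_can_overrun(int cq_size, int sends_pending, int recvs_pending)
{
    return sends_pending + recvs_pending > cq_size;
}
```

With message_count = 0x1000, a CQ of 0x1000 entries is already full once all the send completions are pending, so a single receive completion arriving on the server side is enough to overrun it. Sizing the CQ at 0x2000 covers that case.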
Thanks for the info. I've submitted a change that should fix this.
- Sean