[openib-general] in need of a simple ulp

Sean Hefty mshefty at ichips.intel.com
Mon May 9 14:48:13 PDT 2005


Jeff Carr wrote:
> I didn't notice this error. In any case, it was something I did wrong; I 
> went back and did a simple check with your code and it is ok. I do 
> notice though that you can generate:
> 
> May  5 16:31:50 localhost kernel: ib_mthca 0000:09:00.0: 1a0084/0: error 
> CQE -> QPN 1a0406, WQE @ 00000042
> May  5 16:31:50 localhost kernel:   [ 0] 001a0406
> May  5 16:31:50 localhost kernel:   [ 4] 00001aed
> May  5 16:31:50 localhost kernel:   [ 8] 00000004
> May  5 16:31:50 localhost kernel:   [ c] 00003800
> May  5 16:31:50 localhost kernel:   [10] 128a0000
> May  5 16:31:50 localhost kernel:   [14] 00000000
> May  5 16:31:50 localhost kernel:   [18] 00000042
> May  5 16:31:50 localhost kernel:   [1c] ff000000
> 
> if you up the message_count to 0x1000. I'm guessing this is just some 
> normal overrun error though.

It's taken me a while to look at this, but I think this is a real error.

cmpost sets the CQ size too small, which can lead to a CQ overrun.  The 
number of CQEs should have been message_count * 2, rather than just 
message_count.  A size of message_count is fine on the client side, which 
receives all messages before sending.  But on the server side, receives can 
begin coming in before all sends are done, so send and receive completions 
can be outstanding on the CQ at the same time.
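
As a rough illustration of the sizing argument (a sketch only, not the 
actual cmpost change): the helper names below are made up for this example, 
and the model assumes a single CQ shared by sends and receives, one 
completion per message in each direction, and no polling until everything 
has landed, which is the worst case.

```c
#include <assert.h>

/* Hypothetical helper, not part of cmpost: worst-case number of CQEs
 * that can sit unpolled on one CQ shared by sends and receives. */
int cq_entries_needed(int message_count)
{
    assert(message_count >= 0);
    /* One completion per posted send plus one per posted receive. */
    return message_count * 2;
}

/* Hypothetical toy model: returns 1 if the given number of unpolled send
 * and receive completions would not fit in a CQ of cq_size entries. */
int cq_would_overrun(int cq_size, int pending_sends, int pending_recvs)
{
    assert(cq_size >= 0 && pending_sends >= 0 && pending_recvs >= 0);
    return pending_sends + pending_recvs > cq_size;
}
```

With message_count = 0x1000, a CQ of 0x1000 entries overflows in this model 
as soon as a single receive completes while all the send completions are 
still unpolled; a CQ of 0x2000 entries covers both directions.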

Thanks for the info.  I've submitted a change that should fix this.

- Sean


