[ofa-general] Bogus Receive Completions

Roman Kononov ofed at kononov.ftml.net
Fri Sep 5 16:11:05 PDT 2008


On 2008-09-05 17:39, Roland Dreier wrote:
>  > I managed to reproduce the failure reliably in my environment.
> 
> Can you provide the code to reproduce this?  I'd like to try it on
> ConnectX to see if it is HCA-dependent.

I could probably give you the code, but it depends on a lot of other HW and
SW, and setting it up would be a big pain. Also, by changing the code and
moving stuff around, I can almost mask the problem, so that it takes much
longer to appear.

> 
> Also I would suggest raising this issue with whoever sold you the HCAs.

What do you mean? Do you mean that the HCAs could be defective? The
manufacturer is HP. The retailer is somebody. I'm sure that money back is
the most I can get from them.

BTW, I added more printfs to mthca_poll_one(), where it handles Receive
Completions, and noticed that cqe->wqe is out of sequence (in the last line
below, wqe repeats fc1 and wr_id jumps to 7f):

...
wr_id=3a, is_error=0, wqe=e81, wqe_index=3a, cqe=0x84a7c0, imm=8000003a
wr_id=3b, is_error=0, wqe=ec1, wqe_index=3b, cqe=0x84a7e0, imm=8000003b
wr_id=3c, is_error=0, wqe=f01, wqe_index=3c, cqe=0x84a800, imm=8000003c
wr_id=3d, is_error=0, wqe=f41, wqe_index=3d, cqe=0x84a820, imm=8000003d
wr_id=3e, is_error=0, wqe=f81, wqe_index=3e, cqe=0x84a840, imm=8000003e
wr_id=3f, is_error=0, wqe=fc1, wqe_index=3f, cqe=0x84a860, imm=8000003f
wr_id=7f, is_error=0, wqe=fc1, wqe_index=3f, cqe=0x84a880, imm=80000040

"imm" is really cqe->imm_etype_pkey_eec; it comes from the sender and is in
sequence.
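
For illustration only, the check implied by the log above can be sketched as
a small standalone C routine. This is not mthca driver code: the struct, the
function name and the sample table are hypothetical, and the stride of 0x40
between consecutive wqe values is an assumption read off the log (it matches
wqe_index advancing by one per completion). The sketch does not model ring
wrap-around, so it only applies to a window of the log like the one shown.

```c
#include <stddef.h>
#include <stdint.h>

/* One logged receive completion; values are copied from the printf output. */
struct rx_log_entry {
    uint32_t wr_id;
    uint32_t wqe;   /* cqe->wqe as printed */
    uint32_t imm;   /* cqe->imm_etype_pkey_eec as printed */
};

/* ASSUMPTION: stride between consecutive wqe values, inferred from the log. */
#define ASSUMED_WQE_STRIDE 0x40u

/*
 * Return the index of the first entry whose imm advanced by exactly one
 * while its wqe did not advance by the assumed stride, or -1 if every
 * entry is consistent. Ring wrap-around is deliberately not handled.
 */
static int first_out_of_sequence(const struct rx_log_entry *log, size_t n)
{
    for (size_t i = 1; i < n; i++) {
        uint32_t imm_step = log[i].imm - log[i - 1].imm;
        uint32_t wqe_step = log[i].wqe - log[i - 1].wqe;

        if (imm_step == 1 && wqe_step != ASSUMED_WQE_STRIDE)
            return (int)i;
    }
    return -1;
}

/* The seven completions shown in the log excerpt. */
static const struct rx_log_entry sample_log[] = {
    { 0x3a, 0xe81, 0x8000003a },
    { 0x3b, 0xec1, 0x8000003b },
    { 0x3c, 0xf01, 0x8000003c },
    { 0x3d, 0xf41, 0x8000003d },
    { 0x3e, 0xf81, 0x8000003e },
    { 0x3f, 0xfc1, 0x8000003f },
    { 0x7f, 0xfc1, 0x80000040 },
};
```

Calling first_out_of_sequence(sample_log, 7) flags the last entry, where imm
moves on to 80000040 but wqe stays at fc1.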

Roman


