[ofa-general] IBV_WC_WR_FLUSH_ERR: first WQE only or all pending WQE's?

Scott Guthridge guthridg at us.ibm.com
Fri Sep 21 11:17:05 PDT 2007


My application posts a few sends on a connected RC queue pair, and then
either the application itself modifies the QP state to error or the remote
side goes into error.  The *first* of these posted send WQEs generates a
CQE indicating IBV_WC_WR_FLUSH_ERR [or something like IBV_WC_REM_OP_ERR,
in the remote case], as I would expect.  But the remaining pending WQEs
never seem to generate CQEs.  [The ibv_post_send calls did not return
local errors for these, BTW.]  As a result, my application hangs waiting
for the pending operations to drain.

Sections 9.9.2.3 and 9.9.2.4 of the IB architecture spec seem to suggest
that all pending WQEs behind the failed request (error class B, I think)
should generate CQEs with the FLUSH error.

Questions:

(1) Do I understand the spec correctly?  Should WQEs posted after the one
that fails also complete with FLUSH errors?

(2) Has anyone seen this behavior before?  Is it common?  [I haven't tried
switching hardware -- the card I'm using *may* not be production level.]
If it *is* common behavior, I may need to recode my application to be
defensive: mark all outstanding requests as failed upon receiving the
first error, and then ignore any subsequent errors.  This seems kludgy,
though, and I'd rather not do that if I don't have to.
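
For concreteness, here is a minimal sketch of the defensive bookkeeping I
have in mind, in plain C with no verbs calls.  All names (send_tracker,
tracker_post, tracker_complete, MAX_PENDING) are hypothetical.  The boolean
"ok" is an assumed stand-in for wc.status == IBV_WC_SUCCESS as returned by
ibv_poll_cq, and the sketch assumes wr_id is used as a small slot index.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_PENDING 64  /* hypothetical cap on outstanding sends */

enum req_state { REQ_FREE = 0, REQ_PENDING, REQ_DONE, REQ_FAILED };

struct send_tracker {
    enum req_state state[MAX_PENDING]; /* indexed by wr_id (assumption) */
    size_t pending;                    /* number of REQ_PENDING slots */
    bool qp_failed;                    /* set by the first error completion */
};

static void tracker_init(struct send_tracker *t)
{
    *t = (struct send_tracker){0};     /* all slots REQ_FREE, no failure */
}

/* Record a send about to be posted.  Returns false if the QP has already
 * failed, the slot is out of range, or the slot is already in use. */
static bool tracker_post(struct send_tracker *t, uint64_t wr_id)
{
    if (t->qp_failed || wr_id >= MAX_PENDING)
        return false;
    if (t->state[wr_id] != REQ_FREE)
        return false;
    t->state[wr_id] = REQ_PENDING;
    t->pending++;
    return true;
}

/* Handle one completion.  "ok" stands in for
 * wc.status == IBV_WC_SUCCESS in a real poll loop. */
static void tracker_complete(struct send_tracker *t, uint64_t wr_id, bool ok)
{
    assert(wr_id < MAX_PENDING);

    if (t->qp_failed)
        return;  /* everything was already marked failed; ignore */

    if (ok) {
        if (t->state[wr_id] == REQ_PENDING) {
            t->state[wr_id] = REQ_DONE;
            t->pending--;
        }
        return;
    }

    /* First error: fail every outstanding request in one pass, so the
     * drain loop does not depend on further flush CQEs arriving. */
    t->qp_failed = true;
    for (size_t i = 0; i < MAX_PENDING; i++) {
        if (t->state[i] == REQ_PENDING)
            t->state[i] = REQ_FAILED;
    }
    t->pending = 0;
}
```

With this, the drain loop would wait on t.pending reaching zero rather than
on one CQE per posted WQE, so it terminates whether or not the remaining
flush completions ever show up.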


Thanks,
Scott



