[ofa-general] IBV_WC_WR_FLUSH_ERR: first WQE only or all pending WQE's?
Scott Guthridge
guthridg at us.ibm.com
Fri Sep 21 11:17:05 PDT 2007
I have an application that posts a few sends on a connected RC queue pair.
Shortly afterward, either the application itself moves the QP to the error
state or the remote side goes into error. The *first* of the posted send
WQEs generates a CQE with IBV_WC_WR_FLUSH_ERR [or something like
IBV_WC_REM_OP_ERR in the remote case], as I would expect, but the
remaining pending WQEs never seem to generate CQEs. [ibv_post_send did
not return a local error for any of them, BTW.] As a result, my app hangs
waiting for the pending operations to drain.
Sections 9.9.2.3 and 9.9.2.4 of the IB architecture spec seem to suggest
that all pending WQEs behind the failed request (error class B, I think)
should generate CQEs with the FLUSH error.
Questions:
(1) Do I understand the spec correctly? Should WQEs posted after the one
that is going to fail generate FLUSH errors?
(2) Has anyone seen this behavior before? Is it common? [I haven't tried
switching hardware; the card I'm using *may* not be production level.] If
it *is* common behavior, I may need to recode my app to mark all
outstanding requests as failed upon receiving the first error, and then
ignore any subsequent errors, to be defensive about it. This seems kludgy,
though, and I'd rather not do that if I don't have to.
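
For illustration, the defensive workaround in (2) might look roughly like
the sketch below. It is only a sketch under stated assumptions: the
send_tracker type, the tracker_* functions, and MAX_REQS are hypothetical
names, wr_id is assumed to be a small index chosen by the application, and
the wc_status enum is a stand-in for the ibv_wc_status values from
<infiniband/verbs.h> so that the example compiles without the verbs
library.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Stand-in for ibv_wc_status; a real program would use the values
 * from <infiniband/verbs.h> instead (assumption for this sketch). */
enum wc_status { WC_SUCCESS, WC_WR_FLUSH_ERR, WC_REM_OP_ERR };

/* REQ_FREE must be 0 so that a zeroed tracker starts out empty. */
enum req_state { REQ_FREE = 0, REQ_PENDING, REQ_DONE, REQ_FAILED };

#define MAX_REQS 64 /* hypothetical limit; wr_id is used as an index */

struct send_tracker {
    enum req_state state[MAX_REQS]; /* per-request state, indexed by wr_id */
    size_t pending;                 /* number of requests in REQ_PENDING */
    bool qp_failed;                 /* set by the first error completion */
};

void tracker_init(struct send_tracker *t)
{
    assert(t != NULL);
    memset(t, 0, sizeof(*t));
}

/* Record a send that was posted without a local error. Returns false if
 * wr_id is out of range, already pending, or the QP has already failed. */
bool tracker_posted(struct send_tracker *t, uint64_t wr_id)
{
    if (wr_id >= MAX_REQS || t->qp_failed || t->state[wr_id] == REQ_PENDING)
        return false;
    t->state[wr_id] = REQ_PENDING;
    t->pending++;
    return true;
}

/* Handle one completion taken from the CQ. */
void tracker_complete(struct send_tracker *t, uint64_t wr_id,
                      enum wc_status status)
{
    if (wr_id >= MAX_REQS)
        return;

    if (status == WC_SUCCESS) {
        if (t->state[wr_id] == REQ_PENDING) {
            t->state[wr_id] = REQ_DONE;
            t->pending--;
        }
        return;
    }

    /* Any error after the first one refers to a request that has
     * already been marked failed, so it is ignored. */
    if (t->qp_failed)
        return;

    /* First error: fail every outstanding request right away rather
     * than waiting for flush completions that may never arrive. */
    t->qp_failed = true;
    for (size_t i = 0; i < MAX_REQS; i++) {
        if (t->state[i] == REQ_PENDING)
            t->state[i] = REQ_FAILED;
    }
    t->pending = 0;
}
```

With something like this, the drain loop would wait for the pending count
to reach zero instead of waiting for one CQE per posted WQE, so it finishes
whether or not the remaining flush completions ever show up.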
Thanks,
Scott