[ofa-general] Questions about IPOIB handle last WQE event

Roland Dreier rdreier at cisco.com
Tue Jul 22 21:57:26 PDT 2008


 > Section: 11-5.2.5: 
 > ------------------
 > If the HCA supports SRQ, for RC and UD service, the CI shall generate a
 > Last WQE Reached Affiliated Asynchronous Event on a QP that is in the
 > Error State and is associated with an SRQ when either:
 > • a CQE is generated for the last WQE, or
 > • the QP gets in the Error State and there are no more WQEs on the
 >   RQ.
 > 
 > I thought ehca follows the second case, "the QP gets in the Error
 > State and there are no more WQEs on the RQ", so a CQE is not generated
 > for the last WQE.

I don't follow -- we transition the QP to error.  The HCA generates a
"last WQE reached" event.  Then we post a send request to the QP (which
is now in the error state).  This send request must complete with a
flush error status.

There's nothing tricky about this, and there's nothing related to RQ/SRQ
processing or the last WQE reached event.  If ehca does not complete
send requests posted to a QP in the error state with a flush error, then
there is a bug somewhere in the ehca driver/firmware/hardware.
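
To make the sequence above concrete, here is a rough sketch as a toy model
in plain C. It is not the verbs API and not the IPoIB code: every toy_*
name is hypothetical, the "CQ" is just a small array, and it assumes no
receive WQEs are outstanding, so the last WQE reached event fires as soon
as the QP enters the error state.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: every toy_* name is hypothetical, not the verbs API. */
enum toy_qp_state { TOY_QPS_RTS, TOY_QPS_ERR };
enum toy_wc_status { TOY_WC_SUCCESS, TOY_WC_WR_FLUSH_ERR };

struct toy_wc {
    unsigned long wr_id;
    enum toy_wc_status status;
};

#define TOY_CQ_DEPTH 8

struct toy_qp {
    enum toy_qp_state state;
    bool last_wqe_event_seen;
    struct toy_wc cq[TOY_CQ_DEPTH];  /* completions, oldest first */
    size_t cq_head, cq_tail;
};

void toy_qp_init(struct toy_qp *qp)
{
    qp->state = TOY_QPS_RTS;
    qp->last_wqe_event_seen = false;
    qp->cq_head = qp->cq_tail = 0;
}

/* Assumption: no receive WQEs are outstanding, so the event fires at once. */
void toy_modify_to_error(struct toy_qp *qp)
{
    qp->state = TOY_QPS_ERR;
    qp->last_wqe_event_seen = true;
}

/* A send posted in the error state is not executed; it completes flushed. */
void toy_post_send(struct toy_qp *qp, unsigned long wr_id)
{
    assert(qp->cq_tail < TOY_CQ_DEPTH);
    qp->cq[qp->cq_tail].wr_id = wr_id;
    qp->cq[qp->cq_tail].status = (qp->state == TOY_QPS_ERR)
        ? TOY_WC_WR_FLUSH_ERR : TOY_WC_SUCCESS;
    qp->cq_tail++;
}

bool toy_poll_cq(struct toy_qp *qp, struct toy_wc *wc)
{
    if (qp->cq_head == qp->cq_tail)
        return false;
    *wc = qp->cq[qp->cq_head++];
    return true;
}

/* Returns true once the marker send has completed with a flush error. */
bool toy_drain(struct toy_qp *qp, unsigned long marker_wr_id)
{
    struct toy_wc wc;

    toy_modify_to_error(qp);
    if (!qp->last_wqe_event_seen)
        return false;
    toy_post_send(qp, marker_wr_id);
    while (toy_poll_cq(qp, &wc))
        if (wc.wr_id == marker_wr_id)
            return wc.status == TOY_WC_WR_FLUSH_ERR;
    return false;
}
```

The model only captures the ordering: error transition, event, send
posted to the errored QP, flush-error completion. If toy_drain() returned
false here, that would correspond to the driver/firmware/hardware bug
described above.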

 > In this case, the send-side TX QP is already gone, so the remote side
 > should not execute any more sends to this RX QP, and there is no harm
 > in delivering any outstanding CQEs in the CQ to the consumer. So is it
 > OK not to put the QP into the error state before posting the last WR?
 > Does the IB spec say anywhere that this is a MUST?

I guess it might work, but how do we avoid leaking receive work requests
if we don't transition the QP to error at some point?  There are cases
where we might be garbage collecting an unused connection and race with
an incoming message.  Why wouldn't we want to make everything simple and
transition to the error state?

 > Sorry, my question was not clear. What I meant to ask is whether it is
 > possible to get two last WQE reached events on the same QP: one from
 > the consumer setting the QP to the error state, and one from a
 > driver/FW/HW catastrophic error, if any.

A catastrophic error should be reported as a catastrophic error, not a
last WQE reached event.  Of course it is possible to get multiple last
WQE reached events for a single QP, by transitioning the QP out of the
error state and then back into the error state.  But IPoIB wouldn't do
that.
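
As a rough illustration of the "out of error and back in" case, here is
another toy model in plain C. The sim_* names are hypothetical, the state
machine is collapsed to three states (the intermediate states between
reset and RTS are skipped), and it assumes the event fires on every entry
into the error state.

```c
#include <assert.h>

/* Toy model only: sim_* names are hypothetical, not the verbs API. */
enum sim_qp_state { SIM_QPS_RESET, SIM_QPS_RTS, SIM_QPS_ERR };

struct sim_qp {
    enum sim_qp_state state;
    unsigned int last_wqe_events;  /* events seen over the QP's lifetime */
};

void sim_modify_qp(struct sim_qp *qp, enum sim_qp_state next)
{
    /* In this model the only way out of the error state is via reset. */
    assert(!(qp->state == SIM_QPS_ERR && next == SIM_QPS_RTS));

    /* Assumption: one event per entry into the error state. */
    if (next == SIM_QPS_ERR && qp->state != SIM_QPS_ERR)
        qp->last_wqe_events++;
    qp->state = next;
}
```

Moving a struct sim_qp through RTS, error, reset, RTS, error leaves
last_wqe_events at 2, while a consumer that enters the error state once
and then destroys the QP only ever sees one event.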

 - R.


