[openib-general] [PATCH] add cq error events

Thu Sep 22 16:56:12 PDT 2005

Looking past the specmanship, the ambiguity comes down to
whether the RDMA device checks the consume pointer before
writing a CQE.

Not checking it means that overflow is either undetectable or
detected only after arbitrary, unknown CQEs have been erased.
Once an unknown CQE has been erased, every QP that feeds
the CQ is at risk.

But if the RDMA device checks the consume pointer before
writing, then the only CQE that can be lost is the one that
is being generated. That QP is known, and it is known that
no other QPs have been damaged.
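
To make the difference concrete, here is a minimal sketch of the
two behaviours in C. It is only an illustration, not code from any
driver or from either verbs spec: the struct layout, CQ_DEPTH, and
the names cq_post_unchecked, cq_post_checked and cq_overflow_demo
are all hypothetical, and the "device" is modelled as a plain
function writing into a ring.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define CQ_DEPTH 4u                 /* hypothetical depth, power of two */

struct cqe { uint32_t qp_num; uint32_t wr_id; };

struct cq {
    struct cqe ring[CQ_DEPTH];
    uint32_t produce;               /* free-running, advanced by the device */
    uint32_t consume;               /* free-running, advanced by the Consumer */
};

/* Design 1: the device never looks at the consume pointer.
 * On overflow it silently overwrites the oldest unreaped CQE. */
static void cq_post_unchecked(struct cq *cq, struct cqe e)
{
    cq->ring[cq->produce % CQ_DEPTH] = e;
    cq->produce++;
}

/* Design 2: the device checks the consume pointer first.
 * On overflow nothing is written; the only CQE lost is `e`,
 * so the affected QP (e.qp_num) is known to the caller. */
static bool cq_post_checked(struct cq *cq, struct cqe e)
{
    if (cq->produce - cq->consume >= CQ_DEPTH)
        return false;               /* CQ full: drop only this CQE */
    cq->ring[cq->produce % CQ_DEPTH] = e;
    cq->produce++;
    return true;
}

/* Fill both CQs from QPs 1..4, then let QP 9 post one CQE too many. */
static void cq_overflow_demo(void)
{
    struct cq a = {0}, b = {0};

    for (uint32_t i = 0; i < CQ_DEPTH; i++) {
        struct cqe e = { .qp_num = i + 1, .wr_id = 100 + i };
        cq_post_unchecked(&a, e);
        assert(cq_post_checked(&b, e));
    }

    struct cqe extra = { .qp_num = 9, .wr_id = 999 };

    cq_post_unchecked(&a, extra);
    assert(a.ring[0].qp_num == 9);  /* QP 1's CQE was erased unnoticed */

    assert(!cq_post_checked(&b, extra));
    assert(b.ring[0].qp_num == 1);  /* QP 1's CQE survives; only QP 9 lost one */
}
```

In the unchecked case the victim is whichever QP happened to own
the overwritten slot; in the checked case the victim is always
the QP whose CQE was being generated.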

The two designs reflect different approaches to fault tolerance.
The first states a constraint on the application which, if
followed, prevents CQ overflows. Since any CQ overflow represents
a failure of the Consumer to comply with the contract, the
RDMA device is under no obligation to waste a single
flip-flop or line of code to try to minimize the damage,
except for damage to third parties (hence the RDMAC
constraint that QPs using different CQs are not damaged).

The second views a CQ overflow on the same terms
as a divide by zero or any of the many other errors that
should not happen: you confine the damage and leave as
much of the system running as possible.

Given that both design approaches are valid, it is not
surprising that both the IB and iWARP verb specifications
can be construed to be compatible with either design.