[openib-general] [PATCH ] RFC IB/cm do not track remote QPN in timewait state
Michael S. Tsirkin
mst at mellanox.co.il
Mon Aug 28 06:32:50 PDT 2006
IB spec, section 12.4, says:
CMs shall maintain enough connection state information to detect an attempt
to initiate a connection on a remote QP/EEC that has not been released
from a connection with a local QP/EEC, or that is in the TimeWait
state. Such an event could occur if the remote CM had dropped the connection
and sent DREQ, but the DREQ was not received by the local CM.
If the local CM receives a REQ that includes a QPN (or EECN if
REQ:RDC Exists is not set), that it believes to be connected to a local
QP/EEC, the local CM shall act as defined in section 12.9.8.3.
Note that while the CM must keep QPs in the TimeWait state (to enable
detection of TimeWait packets, as explained in 9.7.1 PACKET SEQUENCE NUMBERS),
such QPs are not connected (they are normally in the reset state).
Thus, even if a local QP was connected to a specific remote QPN, once the
connection enters the TimeWait state the CM must not reject a connection
request merely because it includes that remote QPN.
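To make the distinction concrete, here is a minimal user-space sketch of the
check I have in mind. It is not code from cm.c: the state names, struct conn
and both helpers are hypothetical, and a flat array stands in for whatever
lookup structure a real CM uses. The only point is that an entry in TimeWait
does not count as "connected" when a REQ is matched against remote QPNs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical connection states; not the enum used in cm.c. */
enum conn_state { CONN_ESTABLISHED, CONN_DREQ_SENT, CONN_TIMEWAIT };

/* Hypothetical per-connection record kept by a CM. */
struct conn {
	uint32_t remote_qpn;
	enum conn_state state;
};

/* A TimeWait QP is kept only to catch late packets; it is not connected. */
static bool conn_is_connected(const struct conn *c)
{
	return c->state != CONN_TIMEWAIT;
}

/*
 * Return true if a REQ carrying remote_qpn names a QPN that the local CM
 * still considers connected. Entries in TimeWait are skipped.
 */
static bool req_hits_connected_qpn(const struct conn *tbl, size_t n,
				   uint32_t remote_qpn)
{
	assert(tbl != NULL || n == 0);

	for (size_t i = 0; i < n; i++)
		if (tbl[i].remote_qpn == remote_qpn &&
		    conn_is_connected(&tbl[i]))
			return true;
	return false;
}
```

With a table holding one established entry and one TimeWait entry, only a REQ
naming the established entry's remote QPN would be flagged.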
The behaviour described in 12.9.8.3 is as follows:
12.9.8.3.1 REQ RECEIVED / REP RECEIVED
(RC, UC) A CM may receive a REQ/REP specifying a remote QPN in
"REQ:local QPN"/"REP:local QPN" that the CM already considers connected
to a local QP. A local CM may receive such a REQ/REP if its local
QP has a stale connection, as described in section 12.4.1. When a CM
receives such a REQ/REP it shall abort the connection establishment by
issuing REJ to the REQ/REP. It shall then issue DREQ, with "DREQ:remote
QPN" set to the remote QPN from the REQ/REP, until DREP is received
or Max Retries is exceeded, and place the local QP in the
TimeWait state.
....
If a CM receives a REQ/REP as described above, if the REQ/REP has the
same Local Communication ID and Remote Communication ID as are
present in the existing connection and if the REQ/REP arrives within the
window of time during which the active/passive side could be legally
retransmitting REQ/REP, the CM should treat the REQ/REP as a retry and
not initiate stale connection processing as described above.
Note how none of this makes sense for connections in the timewait state.
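For reference, the retry-versus-stale rule quoted above can be sketched as
follows. This is an illustration only, under the assumption that the lookup
has already found an entry the CM considers connected. The struct layouts,
field names and the millisecond retry window are hypothetical and do not come
from the spec or from cm.c.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

enum req_action { REQ_TREAT_AS_RETRY, REQ_STALE_PROCESSING };

/* Hypothetical record of the connection that already uses the remote QPN. */
struct existing_conn {
	uint32_t local_comm_id;
	uint32_t remote_comm_id;
	uint64_t established_ms;  /* when the original REQ/REP was handled */
	uint64_t retry_window_ms; /* how long the peer may legally retransmit */
};

/* Hypothetical view of the incoming REQ/REP, IDs as carried in the MAD. */
struct incoming_msg {
	uint32_t local_comm_id;
	uint32_t remote_comm_id;
	uint64_t arrival_ms;
};

/*
 * Same communication IDs and still inside the retransmission window:
 * treat as a retry. Anything else: stale connection processing
 * (REJ, then DREQ until DREP or Max Retries, then TimeWait).
 */
static enum req_action classify_duplicate_qpn(const struct existing_conn *c,
					      const struct incoming_msg *m)
{
	assert(m->arrival_ms >= c->established_ms);

	bool same_ids = m->local_comm_id == c->local_comm_id &&
			m->remote_comm_id == c->remote_comm_id;
	bool in_window =
		m->arrival_ms - c->established_ms <= c->retry_window_ms;

	return (same_ids && in_window) ? REQ_TREAT_AS_RETRY
				       : REQ_STALE_PROCESSING;
}
```

Both branches presuppose a live connection entry. A QP that is merely
sitting in timewait has no connection to retry or to declare stale.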
Finally, let me quote the definition of the stale connection:
12.4.1 STALE CONNECTION
A QP/EEC is said to have a stale connection when only one side has connection
information. A stale connection may result if the remote CM had
dropped the connection and sent a DREQ but the DREQ was never received
by the local CM. Alternatively the remote CM may have lost all
record of past connections because its node crashed and rebooted, while
the local CM did not become aware of the remote node's reboot and therefore
did not clean up stale connections.
Note how, again, a connection in the TimeWait state does not match the
definition of a stale connection, since we arrive there after a
graceful DREQ/DREP exchange.
Our CM implementation violates this requirement: even after the connection
has been torn down gracefully and the QP has been moved to timewait,
the CM still rejects connection requests that happen to share the same
remote QPN, until timewait exit.
I actually see a lot of such bogus rejects when QPs are opened and closed at
a high rate. The following patch addresses this issue for me, but it also
seems to trigger crashes under stress; I am still debugging these.
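To illustrate the difference in behaviour, here is a toy user-space model.
It is a sketch under stated assumptions, not the kernel code: struct
qpn_table, the helper names and the drop_tracking flag are all hypothetical,
and a small fixed array stands in for the CM's real lookup structure.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_TRACKED 8

/* Toy stand-in for a CM's remote-QPN lookup table (hypothetical). */
struct qpn_table {
	uint32_t qpn[MAX_TRACKED];
	bool used[MAX_TRACKED];
};

static bool qpn_table_contains(const struct qpn_table *t, uint32_t qpn)
{
	assert(t != NULL);
	for (size_t i = 0; i < MAX_TRACKED; i++)
		if (t->used[i] && t->qpn[i] == qpn)
			return true;
	return false;
}

/* Track qpn; returns false if it is already tracked or the table is full. */
static bool qpn_table_add(struct qpn_table *t, uint32_t qpn)
{
	if (qpn_table_contains(t, qpn))
		return false;
	for (size_t i = 0; i < MAX_TRACKED; i++) {
		if (!t->used[i]) {
			t->used[i] = true;
			t->qpn[i] = qpn;
			return true;
		}
	}
	return false;
}

static void qpn_table_remove(struct qpn_table *t, uint32_t qpn)
{
	assert(t != NULL);
	for (size_t i = 0; i < MAX_TRACKED; i++)
		if (t->used[i] && t->qpn[i] == qpn)
			t->used[i] = false;
}

/*
 * Model of entering timewait. drop_tracking == false mimics the behaviour
 * described above (entry stays until timewait exit); true mimics what the
 * patch is meant to achieve (entry dropped on timewait entry).
 */
static void enter_timewait(struct qpn_table *t, uint32_t remote_qpn,
			   bool drop_tracking)
{
	if (drop_tracking)
		qpn_table_remove(t, remote_qpn);
}

/* Would a new REQ naming remote_qpn be rejected as a duplicate? */
static bool req_rejected(const struct qpn_table *t, uint32_t remote_qpn)
{
	return qpn_table_contains(t, remote_qpn);
}
```

In this model, a REQ that reuses the remote QPN during timewait is rejected
when drop_tracking is false and accepted when it is true.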
Comments appreciated.
---
IB/cm: do not track remote QPN in TimeWait, since QP is not connected
Signed-off-by: Michael S. Tsirkin <mst at mellanox.co.il>
diff --git a/drivers/infiniband/core/cm.c b/drivers/infiniband/core/cm.c
index f85c97f..e270311 100644
--- a/drivers/infiniband/core/cm.c
+++ b/drivers/infiniband/core/cm.c
@@ -679,6 +679,8 @@ static void cm_enter_timewait(struct cm_
{
int wait_time;
+ cm_cleanup_timewait(cm_id_priv->timewait_info);
+
/*
* The cm_id could be destroyed by the user before we exit timewait.
* To protect against this, we search for the cm_id after exiting
--
MST