[openib-general] [PATCH ] RFC IB/cm do not track remote QPN in timewait state

Mon Aug 28 06:32:50 PDT 2006

IB spec, section 12.4, says:

	CMs shall maintain enough connection state information to detect an attempt
	to initiate a connection on a remote QP/EEC that has not been released
	from a connection with a local QP/EEC, or that is in the TimeWait
	state. Such an event could occur if the remote CM had dropped the connection
	and sent DREQ, but the DREQ was not received by the local CM.
	If the local CM receives a REQ that includes a QPN (or EECN if
	REQ:RDC Exists is not set), that it believes to be connected to a local
	QP/EEC, the local CM shall act as defined in section 12.9.8.3.

Note that while the CM must maintain QPs in the TimeWait state (to enable
detection of TimeWait packets, as explained in 9.7.1 PACKET SEQUENCE NUMBERS),
such QPs are not connected (they are normally in the reset state).
Thus, even if a local QP was connected to a specific remote QPN, once the
connection enters the TimeWait state the CM must not reject a connection
request merely because it includes that remote QPN.

The behaviour described in 12.9.8.3 is as follows:

	12.9.8.3.1 REQ RECEIVED / REP RECEIVED
	(RC, UC) A CM may receive a REQ/REP specifying a remote QPN in
	"REQ:local QPN"/"REP:local QPN" that the CM already considers connected
	to a local QP. A local CM may receive such a REQ/REP if its local
	QP has a stale connection, as described in section 12.4.1. When a CM
	receives such a REQ/REP it shall abort the connection establishment by
	issuing REJ to the REQ/REP. It shall then issue DREQ, with "DREQ:remote
	QPN" set to the remote QPN from the REQ/REP, until DREP is received
	or Max Retries is exceeded, and place the local QP in the
	TimeWait state.

	....


	If a CM receives a REQ/REP as described above, if the REQ/REP has the
	same Local Communication ID and Remote Communication ID as are
	present in the existing connection and if the REQ/REP arrives within the
	window of time during which the active/passive side could be legally
	retransmitting REQ/REP, the CM should treat the REQ/REP as a retry and
	not initiate stale connection processing as described above.

Note how none of this makes sense for connections in the TimeWait state.
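
To make the reading above concrete, here is a minimal user-space sketch of
how I interpret the 12.9.8.3.1 decision for an incoming REQ. It is not taken
from cm.c: all type, field and function names are hypothetical, and the
"retry window" is reduced to a boolean the caller is assumed to compute.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical names throughout; this is not the kernel's ib_cm code. */
enum qp_conn_state { QP_CONNECTED, QP_TIMEWAIT };

enum req_action {
	REQ_NEW_CONNECTION,	/* proceed with normal establishment */
	REQ_RETRY,		/* retransmitted REQ, no stale processing */
	REQ_STALE		/* REJ the REQ, then DREQ and enter TimeWait */
};

/* What the local CM remembers about one remote QPN. */
struct tracked_conn {
	uint32_t remote_qpn;
	uint32_t local_comm_id;
	uint32_t remote_comm_id;
	enum qp_conn_state state;
};

/* The fields of a received REQ that matter for this decision. */
struct incoming_req {
	uint32_t remote_qpn;
	uint32_t local_comm_id;
	uint32_t remote_comm_id;
	bool in_retry_window;	/* assumed to be computed by the caller */
};

enum req_action classify_req(const struct tracked_conn *conn,
			     const struct incoming_req *req)
{
	/*
	 * No record, a different QPN, or a QP that is only in TimeWait:
	 * the remote QPN is not "considered connected to a local QP".
	 */
	if (!conn || conn->remote_qpn != req->remote_qpn ||
	    conn->state != QP_CONNECTED)
		return REQ_NEW_CONNECTION;

	/* Same communication IDs inside the retry window: a retransmission. */
	if (conn->local_comm_id == req->local_comm_id &&
	    conn->remote_comm_id == req->remote_comm_id &&
	    req->in_retry_window)
		return REQ_RETRY;

	/* Connected QPN with a REQ that is not a retry: stale connection. */
	return REQ_STALE;
}
```

With this reading, a record whose state is QP_TIMEWAIT never reaches the
retry or stale branches, which is the point being argued here.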

Finally, let me quote the definition of the stale connection:

	12.4.1 STALE CONNECTION
	A QP/EEC is said to have a stale connection when only one side has connection
	information. A stale connection may result if the remote CM had
	dropped the connection and sent a DREQ but the DREQ was never received
	by the local CM. Alternatively the remote CM may have lost all
	record of past connections because its node crashed and rebooted, while
	the local CM did not become aware of the remote node's reboot and therefore
	did not clean up stale connections.

Note how, again, a connection in the TimeWait state does not match the
definition of a stale connection, since we arrive there after a
graceful DREQ/DREP exchange.

Our CM implementation violates this requirement: even after the connection
was torn down gracefully and the QP was moved to TimeWait, the CM still
rejects connection requests that happen to share the same remote QPN,
until TimeWait exit.
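
The effect can be illustrated with a toy user-space model of remote-QPN
tracking. This is only a sketch under my own assumptions: the fixed-size
table, the untrack_on_timewait switch and all function names are made up for
illustration and do not mirror the data structures in cm.c.

```c
#include <stdbool.h>
#include <stdint.h>

#define MAX_TRACKED 8

enum conn_state { CONN_FREE, CONN_ESTABLISHED, CONN_TIMEWAIT };

struct conn {
	uint32_t remote_qpn;
	enum conn_state state;
	bool qpn_tracked;	/* entry is visible to remote-QPN lookups */
};

struct cm_model {
	struct conn conns[MAX_TRACKED];
	/* false: behaviour described above; true: behaviour proposed here */
	bool untrack_on_timewait;
};

/* Returns the slot used, -1 if the remote QPN is rejected, -2 if full. */
int model_connect(struct cm_model *cm, uint32_t remote_qpn)
{
	int free_slot = -1;

	for (int i = 0; i < MAX_TRACKED; i++) {
		const struct conn *c = &cm->conns[i];

		if (c->state == CONN_FREE) {
			if (free_slot < 0)
				free_slot = i;
			continue;
		}
		/* A tracked entry with the same remote QPN causes a reject. */
		if (c->qpn_tracked && c->remote_qpn == remote_qpn)
			return -1;
	}
	if (free_slot < 0)
		return -2;

	cm->conns[free_slot] = (struct conn){ remote_qpn, CONN_ESTABLISHED, true };
	return free_slot;
}

void model_enter_timewait(struct cm_model *cm, int slot)
{
	cm->conns[slot].state = CONN_TIMEWAIT;
	if (cm->untrack_on_timewait)
		cm->conns[slot].qpn_tracked = false;
}

void model_exit_timewait(struct cm_model *cm, int slot)
{
	/* Zero-initialising yields CONN_FREE with no tracked QPN. */
	cm->conns[slot] = (struct conn){ 0 };
}
```

With untrack_on_timewait left false, a second model_connect() for the same
remote QPN fails until model_exit_timewait() runs; with it set to true, the
request is accepted as soon as the first connection enters TimeWait.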

I actually see a lot of such bogus rejects when QPs are opened and closed at
a high rate. The following patch addresses this issue for me, but also seems
to trigger crashes under stress; I am still debugging these.
Comments appreciated.

---

IB/cm: do not track remote QPN in TimeWait, since QP is not connected

Signed-off-by: Michael S. Tsirkin <mst at mellanox.co.il>

diff --git a/drivers/infiniband/core/cm.c b/drivers/infiniband/core/cm.c
index f85c97f..e270311 100644
--- a/drivers/infiniband/core/cm.c
+++ b/drivers/infiniband/core/cm.c
@@ -679,6 +679,8 @@ static void cm_enter_timewait(struct cm_
 {
 	int wait_time;
 
+	cm_cleanup_timewait(cm_id_priv->timewait_info);
+
 	/*
 	 * The cm_id could be destroyed by the user before we exit timewait.
 	 * To protect against this, we search for the cm_id after exiting

-- 
MST