[ofa-general] RE: some questions on stale connection handling at the IB CM

Sean Hefty sean.hefty at intel.com
Mon Dec 17 11:57:59 PST 2007


>Looking at the code, I see that the CM sends a reject message with the reason
>IB_CM_REJ_STALE_CONN when it gets a REQ or REP whose <QPN, CA GUID> pair is
>already present in the remote-qpn rb tree (and in another case which I don't
>fully understand).

IB_CM_REJ_STALE_CONN is sent in the following situations:

* Remote ID in REQ matches a connection that is in timewait.  This is treated as
a duplicate REQ that was processed after the connection had been terminated.

* Remote QPN in REQ or REP matches an existing connection, and REQ/REP was not
detected as a duplicate.
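
The two cases above can be sketched for an incoming REQ roughly as follows. This
is a simplified model, not the kernel CM code: the struct layouts, the flat
table, and the names classify_req, struct conn and struct req are all
hypothetical, and treating an ID match against a live connection as a plain
duplicate is an assumption made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified model of tracked connections (not kernel structs). */
enum conn_state { CONN_ESTABLISHED, CONN_TIMEWAIT };

struct conn {
    uint64_t remote_ca_guid;   /* CA GUID of the peer */
    uint32_t remote_comm_id;   /* peer's communication ID */
    uint32_t remote_qpn;       /* peer's QP number */
    enum conn_state state;
};

/* Fields of an incoming REQ that matter for stale detection. */
struct req {
    uint64_t ca_guid;
    uint32_t comm_id;
    uint32_t qpn;
};

enum req_verdict { REQ_ACCEPT, REQ_DUPLICATE, REQ_REJ_STALE_CONN };

static enum req_verdict classify_req(const struct conn *table, size_t n,
                                     const struct req *req)
{
    size_t i;

    assert(req != NULL && (table != NULL || n == 0));

    /* Case 1: the remote ID matches a known connection.  If that connection
     * is in timewait, the REQ is a late duplicate of a terminated connection
     * and is rejected as stale.  Otherwise it is assumed to be a duplicate. */
    for (i = 0; i < n; i++) {
        if (table[i].remote_ca_guid == req->ca_guid &&
            table[i].remote_comm_id == req->comm_id)
            return table[i].state == CONN_TIMEWAIT ? REQ_REJ_STALE_CONN
                                                   : REQ_DUPLICATE;
    }

    /* Case 2: not a duplicate, but the <QPN, CA GUID> pair is already in use
     * by an existing connection, so that older connection must be stale. */
    for (i = 0; i < n; i++) {
        if (table[i].remote_ca_guid == req->ca_guid &&
            table[i].remote_qpn == req->qpn)
            return REQ_REJ_STALE_CONN;
    }

    return REQ_ACCEPT;
}
```

The ID check runs first so that a retransmitted REQ is never mistaken for a QPN
collision; only a request that is not a duplicate reaches the QPN check.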

>On the other side, when the CM receives a reject message with that reason, the
>local handle (id) is moved to the timewait state, where my understanding is
>that it will sit there for a while, and then a reject/stale-connection callback
>will be delivered to the user and the id will be removed.

Correct.

>What I don't see is the issuing of a "DREQ, with DREQ:remote QPN set to the
>remote QPN from the REQ/REP" as stated in 12.9.8.3.1 (below). Is it really
>missing, or am I reading the code wrong?

This is missing.  But neither the DREQ nor the DREP generated in this case
drives the state machines.  Both messages are simply generated and then consumed
by the CM.  (I don't think it's even clear whether the local and remote IDs in
the DREQ/DREP are relative to the stale connection or to the new connection
request/reply.)

>Also, it's quite clear to me that, from the application's point of view, there
>are stale connection cases which the CM cannot catch, e.g. a client DREQ did not
>arrive at the server-side CM and then the client REQ uses a different QPN, etc.
>My understanding is that in such cases the responsibility to close the stale
>connection/QP is on the server app.

Correct - keep-alive messages are still needed by apps to know whether their
connections are still valid.  IMO, stale connection detection becomes less
useful as the number of systems being connected to increases.
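
As a rough illustration of such an application-level keep-alive, here is a
minimal sketch of the bookkeeping only.  Everything in it is hypothetical
(struct keepalive, keepalive_heard, keepalive_expired, the millisecond clock
passed in by the caller); sending the keep-alive messages and tearing down the
QP are left to the application.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-connection keep-alive state kept by the application. */
struct keepalive {
    uint64_t interval_ms;    /* how often the peer is expected to send */
    unsigned max_missed;     /* missed intervals tolerated before giving up */
    uint64_t last_heard_ms;  /* timestamp of the last message from the peer */
};

/* Record that any message (data or keep-alive) arrived from the peer. */
static void keepalive_heard(struct keepalive *ka, uint64_t now_ms)
{
    assert(ka != NULL);
    ka->last_heard_ms = now_ms;
}

/* Return true once the peer has been silent for more than
 * max_missed intervals; the caller should then close the connection/QP. */
static bool keepalive_expired(const struct keepalive *ka, uint64_t now_ms)
{
    assert(ka != NULL);
    if (now_ms < ka->last_heard_ms)
        return false;        /* clock went backwards; do not expire */
    return now_ms - ka->last_heard_ms > ka->interval_ms * ka->max_missed;
}
```

A server would call keepalive_heard() on every receive completion and poll
keepalive_expired() from a timer, which covers the cases the CM cannot detect.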

- Sean




