[ofa-general] RE: CM goes to timewait state without waiting for disconnect reply

Sean Hefty sean.hefty at intel.com
Thu Apr 17 11:51:36 PDT 2008


>What I see is that a connection request is coming from the client to the
>server, and the server replies with a reject. The reason for the reject is
>that a timewait structure already exists for this QPN. That's because the
>client thinks the connection is closed and reuses the QPN, but the server
>hasn't finished cleaning up the connection.

This is an unavoidable situation.  There's no coordination between the timewait
states on different systems, so it's always possible for one to re-connect
before the other system has exited timewait.

However, in your case, the problem is that the client is trying to re-use the
QPN without knowing whether it has exited the local timewait state.  Instead,
have the client issue a DREQ, and then wait for the local timewait state to
exit before trying to re-use the QPN.

This would then be the sequence:

client		server
sends DREQ
			enters timewait
			sends DREP
enters timewait
exits timewait
destroy cm_id
new connection

Your hope at this point is that the server exits timewait before the client
does, which, while likely, is not guaranteed.
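
To make the timing argument concrete, here is a small toy model of the
sequence above. It is only a sketch and not the real CM API: the Endpoint
class, the disconnect_and_reconnect function, the fixed one-way latency, and
the timewait durations are all hypothetical, and time is in arbitrary units.

```python
from dataclasses import dataclass, field


@dataclass
class Endpoint:
    """Toy model of one side's CM state (hypothetical, not the ib_cm API)."""
    name: str
    timewait_len: float  # assumed timewait duration, arbitrary units
    timewait_until: dict = field(default_factory=dict)  # QPN -> exit time

    def enter_timewait(self, qpn, now):
        self.timewait_until[qpn] = now + self.timewait_len

    def in_timewait(self, qpn, now):
        return now < self.timewait_until.get(qpn, float("-inf"))


def disconnect_and_reconnect(client, server, qpn, latency, wait_for_local_exit):
    """Run one DREQ/DREP exchange, then reconnect with the same QPN.

    Returns (time the new REQ reaches the server, whether it is accepted).
    """
    now = 0.0
    # Client sends DREQ; server receives it, enters timewait, sends DREP.
    now += latency
    server.enter_timewait(qpn, now)
    # DREP reaches the client, which enters its own timewait.
    now += latency
    client.enter_timewait(qpn, now)
    if wait_for_local_exit:
        # Client holds the QPN until its local timewait has exited.
        now = client.timewait_until[qpn]
    # The new REQ travels to the server, which rejects it if a timewait
    # entry for this QPN still exists.
    arrival = now + latency
    accepted = not server.in_timewait(qpn, arrival)
    return arrival, accepted


if __name__ == "__main__":
    cases = [
        ("reuse immediately", 10.0, 10.0, False),
        ("wait for local exit", 10.0, 10.0, True),
        ("wait, but server timewait is longer", 10.0, 20.0, True),
    ]
    for label, client_tw, server_tw, wait in cases:
        client = Endpoint("client", client_tw)
        server = Endpoint("server", server_tw)
        arrival, ok = disconnect_and_reconnect(client, server, 0x12, 1.0, wait)
        print(f"{label}: REQ arrives at t={arrival}, accepted={ok}")
```

With equal timewait durations, waiting for the local exit lets the server
finish first, because it entered timewait before the client did.  The third
case shows why this is still not guaranteed: if the server's timewait is
longer, the reconnect can be rejected even though the client waited.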

- Sean



