[ofa-general] RE: CM goes to timewait state without waiting for disconnect reply
Sean Hefty
sean.hefty at intel.com
Thu Apr 17 11:51:36 PDT 2008
>What I see is that a connection request comes from the client to the server,
>and the server replies with a reject. The reason for the reject is that a
>timewait structure already exists for this QPN. That's because the client
>thinks the connection is closed and reuses the QPN, but the server hasn't
>finished cleaning up the connection.
This is an unavoidable situation. There's no coordination between the timewait
states on different systems, so it's always possible for one to re-connect
before the other system has exited timewait.
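To make the race concrete, here is a rough sketch of one side's timewait
bookkeeping. It is a toy model, not the ib_cm code: the Peer class, its
method names, the QPN value and the durations are all made up for
illustration. It only shows that a request for a QPN is rejected while the
receiving side still holds a timewait entry for it, regardless of what the
sender believes.

```python
from dataclasses import dataclass, field


@dataclass
class Peer:
    """Hypothetical model of one system's timewait table, keyed by remote QPN."""
    timewait: dict = field(default_factory=dict)  # remote_qpn -> expiry time

    def enter_timewait(self, remote_qpn, now, duration):
        # Remember that this remote QPN may not be reused until the expiry time.
        self.timewait[remote_qpn] = now + duration

    def handle_req(self, remote_qpn, now):
        # Reject while a live timewait entry exists for the sender's QPN.
        expiry = self.timewait.get(remote_qpn)
        if expiry is not None and now < expiry:
            return "REJ"
        # Entry has expired (or never existed): drop it and accept.
        self.timewait.pop(remote_qpn, None)
        return "REP"


server = Peer()
server.enter_timewait(remote_qpn=0x11, now=0.0, duration=4.0)
early = server.handle_req(remote_qpn=0x11, now=3.0)  # server still in timewait
late = server.handle_req(remote_qpn=0x11, now=5.0)   # server has exited timewait
```

In this model the early request is rejected and the late one is accepted.
Nothing in the sender's own state can tell it which of the two it will get.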
However, in your case, the problem is that the client is trying to re-use the
QPN without knowing whether it has exited the local timewait state. Instead,
have the client issue a DREQ, and then wait for the timewait state to exit
before trying to re-use the QPN.
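As a rough illustration of that client-side rule, the sketch below gates QPN
re-use on a timewait-exit notification. The ClientConn class, the State names
and the event strings are hypothetical stand-ins, not real CM APIs; in a real
application the notification would presumably be the CM's timewait-exit event
(for example RDMA_CM_EVENT_TIMEWAIT_EXIT if you are using librdmacm).

```python
from enum import Enum, auto


class State(Enum):
    CONNECTED = auto()
    DISCONNECTING = auto()  # DREQ sent; local timewait has not exited yet
    REUSABLE = auto()       # local timewait has exited; QPN may be re-used


class ClientConn:
    """Hypothetical client-side wrapper that tracks when a QPN may be re-used."""

    def __init__(self, qpn):
        self.qpn = qpn
        self.state = State.CONNECTED

    def send_dreq(self):
        if self.state is not State.CONNECTED:
            raise RuntimeError("no active connection to disconnect")
        self.state = State.DISCONNECTING

    def on_event(self, event):
        # Only the timewait-exit notification releases the QPN. Other events,
        # such as the arrival of the DREP, do not.
        if event == "TIMEWAIT_EXIT" and self.state is State.DISCONNECTING:
            self.state = State.REUSABLE

    def reconnect(self):
        if self.state is not State.REUSABLE:
            raise RuntimeError("QPN has not exited local timewait")
        self.state = State.CONNECTED
        return self.qpn
```

With this, calling reconnect() right after send_dreq(), or after only seeing
the DREP, raises an error. It succeeds only once the timewait-exit event has
been delivered.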
This would then be the sequence:
client                  server
sends DREQ
                        enters timewait
                        sends DREP
enters timewait
exits timewait
destroy cm_id
new connection
Your hope at this point is that the server exits timewait before the client
does, which, while likely, is not guaranteed.
- Sean