[ofa-general] RE: How fast to get RDMA_CM_EVENT_DISCONNECTED ?
Roland Dreier
rdreier at cisco.com
Wed Apr 11 20:48:17 PDT 2007
> Yes. Internally in A, if the number of receives exceeds the low-water mark (4), an ACK
> will be sent back. I assume the ACK is not triggered at that moment.
> When A is trying to receive a message from B and the message never
> shows up, A actually sends a heartbeat back to B. However, it takes
> several seconds for this heartbeat to complete with an error (we
> configure a timeout of ~1 sec and a retry count of 7).
>
> Several seconds to detect a connection failure is not acceptable for us,
> so if I use rdmacm, I want to know whether I can detect the connection
> failure faster than the heartbeat message does.
I think there is an internal contradiction in what you're doing here.
If your (ACK timeout) * (retry count) exceeds the time that you
consider acceptable to detect a failure, then you've set your
connection up wrong. It's not even meaningful to talk about a
connection failing faster than this amount of time -- a connection
will recover from a transient network failure that resolves itself
before the last retry fails, and without a time machine it's
impossible to say whether a network failure will or will not be
resolved 7 seconds into the future.
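To put numbers on the (ACK timeout) * (retry count) product: InfiniBand does not take the local ACK timeout in seconds but as an exponent, timeout = 4.096 us * 2^exponent, so the "~1 sec" setting above corresponds to an exponent of about 18. A back-of-the-envelope calculator follows as a C sketch under that assumption; the function names are hypothetical. It counts the original transmission plus each retry as one timeout period, so it comes out one period above the simple product. Real HCAs may round or coarsen the timer, so treat the result as an estimate.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: IB local ACK timeout = 4.096 us * 2^exponent.
 * Exponent 0 means "wait forever" and is rejected here. */
static double ack_timeout_s(unsigned exponent)
{
    assert(exponent >= 1 && exponent <= 31);
    return 4.096e-6 * (double)((uint64_t)1 << exponent);
}

/* Rough upper bound on how long the sender keeps trying before the QP
 * goes to the error state: the original transmission plus retry_cnt
 * retransmissions, each waiting one ACK timeout.
 * retry_cnt is a 3-bit field, so it ranges over 0..7. */
static double worst_case_detect_s(unsigned exponent, unsigned retry_cnt)
{
    assert(retry_cnt <= 7);
    return ack_timeout_s(exponent) * (double)(retry_cnt + 1);
}
```

With exponent 18 and a retry count of 7, worst_case_detect_s(18, 7) comes to roughly 8.6 s, in line with the "several seconds" observed above.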
Certainly if you receive a disconnect request, then you know the
remote side is really and truly gone. But if you've set your
timeouts/retry counts so that connections will take 7 seconds to
fail after an event like a link going down, then there's no way to
detect that failure before it occurs.
It seems to me the solution is to reduce your timeout and/or retry
count so that connections fail within the time scale that you require.
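Going the other way, one can start from the detection budget and work back to the largest timeout exponent that fits it for a given retry count. This is again a hypothetical helper, not part of any library, and it uses the same 4.096 us * 2^exponent assumption. The chosen values would then be applied to the connection, e.g. through the timeout and retry_cnt fields of struct ibv_qp_attr passed to ibv_modify_qp() with IBV_QP_TIMEOUT | IBV_QP_RETRY_CNT when the QP moves to RTS. With the RDMA CM, the retry count goes in the retry_count field of struct rdma_conn_param, and later librdmacm releases also offer rdma_set_option() with RDMA_OPTION_ID_ACK_TIMEOUT.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: IB local ACK timeout = 4.096 us * 2^exponent. */
static double ack_timeout_s(unsigned exponent)
{
    assert(exponent >= 1 && exponent <= 31);
    return 4.096e-6 * (double)((uint64_t)1 << exponent);
}

/* Hypothetical helper: return the largest exponent (1..31) such that
 * (retry_cnt + 1) ACK timeouts fit within budget_s, or 0 if even
 * exponent 1 is too long for the budget. */
static unsigned pick_timeout_exponent(double budget_s, unsigned retry_cnt)
{
    assert(retry_cnt <= 7);
    unsigned best = 0;
    for (unsigned e = 1; e <= 31; e++) {
        if (ack_timeout_s(e) * (double)(retry_cnt + 1) <= budget_s)
            best = e;
        else
            break; /* timeouts grow monotonically with e */
    }
    return best;
}
```

For a 1 s budget with 7 retries this picks exponent 14 (about 67 ms per attempt); dropping to 3 retries allows exponent 15 (about 134 ms per attempt). The trade-off is the one described above: the tighter the budget, the shorter the network hiccup the connection can ride out before it is torn down.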
- R.