[openib-general] [PATCH for-2.6.18] Re: [PATCH] IB/cma: add rdma_establish
Sean Hefty
mshefty at ichips.intel.com
Wed Sep 13 12:40:38 PDT 2006
Michael S. Tsirkin wrote:
> I don't really understand. The fix is a one-liner.
> The problem is observed in practice, under stress.
> Who *wants* systems that fall apart under stress?
My view is: is this worth delaying the release of the kernel? And I don't see
that it is at this point in the 2.6.18 release cycle. This does not fix a
system crash. It only allows a connection to be made if the system is under
heavy stress.
> It seems that with retry of 3, chances of losing
> one out of 3 packets would be close to 100% if loss rate is about 10%.
> Cranking it up to 15, you need loss rate on top of 50% to get close to 100%
> chance of losing connection request.
I'm not quite following the math here.
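
For reference, here is a minimal sketch of how the two readings of "losing packets" differ. It assumes each send is lost independently with the same probability, which may not hold under real congestion, and it treats the retry count as the total number of sends. The function names are made up for illustration and are not part of the CM code.

```python
def p_all_lost(loss_rate, attempts):
    """Probability that every one of `attempts` sends is lost.

    Assumes independent losses at a fixed rate (an illustrative assumption).
    """
    return loss_rate ** attempts


def p_any_lost(loss_rate, attempts):
    """Probability that at least one of `attempts` sends is lost,
    under the same independence assumption."""
    return 1.0 - (1.0 - loss_rate) ** attempts


if __name__ == "__main__":
    # The (loss rate, attempts) pairs mentioned in the quoted text.
    for rate, attempts in [(0.10, 3), (0.50, 15)]:
        print(f"loss={rate:.2f} attempts={attempts} "
              f"all_lost={p_all_lost(rate, attempts):.6f} "
              f"any_lost={p_any_lost(rate, attempts):.6f}")
```

Under these assumptions, a 10% loss rate with 3 sends loses all three about 0.1% of the time and loses at least one about 27% of the time, so neither reading gives a figure near 100%.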
> Losing a DREP is also bad - as it leaves stale connections around
> munching up resources.
Yes - but retrying the DREQ doesn't end up fixing the issue. The side that
sends the DREP often ends up entering and exiting timewait before the DREQ can
be retried. This results in the DREQ being lost. Eventually the DREQ will time
out, and the connection will be torn down. Increasing the number of times that
the DREQ is retried ends up increasing how long the connection stays around.
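
As a rough sketch of that last point: if every DREQ goes unanswered, the worst-case wait before teardown grows linearly with the retry count. The per-attempt timeout value and the function name below are hypothetical, chosen only to show the scaling, and do not reflect the actual CM timer settings.

```python
def worst_case_teardown_seconds(timeout_s, retries):
    """Time until the DREQ finally times out when no DREP is ever received.

    Assumes one initial DREQ plus `retries` resends, each waited on for
    the same `timeout_s` (a simplifying assumption for illustration).
    """
    return timeout_s * (1 + retries)


if __name__ == "__main__":
    # Hypothetical 2-second per-attempt timeout.
    for retries in (3, 15):
        total = worst_case_teardown_seconds(2.0, retries)
        print(f"retries={retries} worst_case={total:.1f}s")
```

With these made-up numbers, going from 3 to 15 retries stretches the worst case from 8 to 32 seconds, during which the connection's resources stay allocated.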
- Sean