[openib-general] [PATCH for-2.6.18] Re: [PATCH] IB/cma: add rdma_establish

Sean Hefty mshefty at ichips.intel.com
Wed Sep 13 12:40:38 PDT 2006


Michael S. Tsirkin wrote:
> I don't really understand. The fix is a one-liner.
> The problem is observed in practice, under stress.
> Who *wants* systems that fall apart under stress?

My view is: is this worth delaying the release of the kernel?  And I don't see 
that it is at this point in the 2.6.18 release cycle.  This does not fix a 
system crash.  It only allows a connection to be made if the system is under 
heavy stress.

> It seems that with retry of 3, chances of losing
> one out of 3 packets would be close to 100% if loss rate is about 10%.
> Ranking it up to 15, you need loss rate on top of 50% to get close to 100%
> chance of losing connection request.

I'm not quite following the math here.
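
For reference, here is a minimal sketch of the two quantities the estimate
above could be referring to. It assumes each send is lost independently with
the same probability, and it treats the retry count as the total number of
sends; both are simplifying assumptions, and the function names are
hypothetical.

```python
def p_all_lost(loss_rate: float, attempts: int) -> float:
    """Probability that every one of `attempts` sends is lost.

    Assumes independent losses at a fixed rate (an assumption, not measured).
    """
    return loss_rate ** attempts


def p_any_lost(loss_rate: float, attempts: int) -> float:
    """Probability that at least one of `attempts` sends is lost."""
    return 1.0 - (1.0 - loss_rate) ** attempts


if __name__ == "__main__":
    # Compare the two settings mentioned in the quoted text.
    for loss_rate, attempts in [(0.10, 3), (0.50, 15)]:
        print(loss_rate, attempts,
              p_all_lost(loss_rate, attempts),
              p_any_lost(loss_rate, attempts))
```

A request only fails outright when every send is lost, so under these
assumptions p_all_lost is the figure that matters for a lost connection
request.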

> Losing a DREP is also bad - as it leaves stale connections around
> munching up resources.

Yes - but retrying the DREQ doesn't end up fixing the issue.  The side that 
sends the DREP often ends up entering and exiting timewait before the DREQ can 
be retried.  This results in the DREQ being lost.  Eventually the DREQ will time 
out, and the connection will be torn down.  Increasing the number of times that 
the DREQ is retried ends up increasing how long the connection stays around.
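
As a rough sketch of that last point, assume a fixed per-attempt timeout and
that the sender gives up only after the final retry times out. The function
name and the millisecond values are hypothetical and are not taken from the
CM code.

```python
def dreq_give_up_time_ms(timeout_ms: float, retries: int) -> float:
    """Time until the DREQ sender gives up and tears the connection down.

    Assumed model: one initial send plus `retries` resends, each waiting
    `timeout_ms` for a DREP that never arrives.
    """
    return timeout_ms * (1 + retries)


if __name__ == "__main__":
    # Hypothetical 1000 ms timeout: more retries means a longer-lived
    # stale connection when the DREP has been lost.
    print(dreq_give_up_time_ms(1000, 3))
    print(dreq_give_up_time_ms(1000, 15))
```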

- Sean
