[openib-general] [PATCH] IB_CM: Limit the MRA timeout

Michael S. Tsirkin mst at mellanox.co.il
Wed Oct 4 13:37:29 PDT 2006


Quoting r. Sean Hefty <mshefty at ichips.intel.com>:
> Subject: Re: [PATCH] IB_CM: Limit the MRA timeout
> 
> Michael S. Tsirkin wrote:
> >>There are several timeout values transferred and used by the cm, most notably the 
> >>remote cm response timeout and packet life time.  Does it make more sense to 
> >>have a single, generic timeout maximum instead?
> > 
> > Hmm. I'm not sure - we are working around an actual broken implementation here -
> > what do you think?
> 
> I wasn't sure either.  The MRA timeout is a combination of the packet life time 
> + service timeout, which made me bring this up.  The patch only handles the 
> service timeout portion, so we end up in the same situation if a large packet 
> life time is ever used.

But that comes from the SA, does it not?

> >>Would it make more sense to 
> >>enable the maximum(s) by default, since we're dependent upon values received 
> >>over the network?
> > 
> > I think it would.
> 
> So do I.
> 
> The CM has checks to bring out of range values into range, but at the maximum, 
> we get a timeout of about 2.5 hours.  Multiply that by 15 retries, and the cm 
> can literally spend all day retrying a request.
> 
> I was considering dropping the default maximum down to around 4-8 seconds, which 
> with retries still gives us about a minute to timeout a request.  The default 
> maximum would apply to local and remote cm timeouts, packet life time, and 
> service timeout, but could be overridden by the user.  (Basically, with Ishai's 
> patch: rename mra_timeout_limit to timeout_limit, set to a default of 20, and 
> replace occurrences of '31' in the code with timeout_limit.)
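
The arithmetic behind these figures can be sketched as follows. InfiniBand carries these timeouts as a 5-bit exponent n, meaning 4.096 us * 2^n, which is where the '31' comes from. The helper names below are hypothetical and are not taken from the cm code or from Ishai's patch; only the name timeout_limit and its default of 20 come from the proposal above.

```c
#include <assert.h>
#include <stdint.h>

/* Largest value a 5-bit IB timeout exponent can hold. */
#define IB_TIMEOUT_EXP_MAX 31u

/* Assumed tunable, mirroring the proposed timeout_limit default of 20. */
static unsigned int timeout_limit = 20;

/* Clamp an exponent received over the network to the configured limit
 * (hypothetical helper, for illustration only). */
static unsigned int clamp_timeout_exp(unsigned int exp)
{
	return exp > timeout_limit ? timeout_limit : exp;
}

/* Convert an exponent to microseconds: 4.096 us * 2^n == (4096 ns << n). */
static uint64_t timeout_exp_to_usec(unsigned int exp)
{
	assert(exp <= IB_TIMEOUT_EXP_MAX);
	return (UINT64_C(4096) << exp) / 1000;
}

/* Worst-case time spent on one request if every retry times out. */
static uint64_t total_retry_usec(unsigned int exp, unsigned int retries)
{
	return timeout_exp_to_usec(exp) * retries;
}
```

With an exponent of 31 a single timeout is roughly 8796 seconds (a bit under 2.5 hours), so 15 retries take well over a day. Clamped to 20, a single timeout is about 4.3 seconds and 15 retries finish in about 64 seconds; a limit of 21 would double both figures.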

For remote cm timeout and service timeout this makes sense - they seem
currently mostly taken out of the blue on implementations I've seen.

But since the packet lifetime comes from the SM, it actually has a chance
to reflect some knowledge about the network topology.
And since we haven't seen any practical issues with packet life time yet -
maybe a different parameter for that, with a higher limit?
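
A rough sketch of what two separate limits could look like, assuming the MRA timeout is the sum of the service timeout and the packet life time as described earlier in the thread. All names here are hypothetical, and the value 24 for the packet life time limit is an arbitrary placeholder rather than a proposed default.

```c
#include <stdint.h>

/* Hypothetical tunables; names and defaults are illustrative only. */
static unsigned int cm_timeout_limit = 20;       /* remote cm and service timeouts */
static unsigned int packet_life_time_limit = 24; /* higher cap for the SM/SA-provided value */

/* Bring an exponent received over the network under the given limit. */
static unsigned int cap_exp(unsigned int exp, unsigned int limit)
{
	return exp > limit ? limit : exp;
}

/* IB timeout encoding: 4.096 us * 2^exp, computed as (4096 ns << exp).
 * Callers pass capped exponents, so the shift stays within 64 bits. */
static uint64_t exp_to_usec(unsigned int exp)
{
	return (UINT64_C(4096) << exp) / 1000;
}

/* MRA timeout modeled as service timeout plus packet life time,
 * with each component capped by its own limit. */
static uint64_t mra_timeout_usec(unsigned int service_timeout_exp,
				 unsigned int packet_life_time_exp)
{
	return exp_to_usec(cap_exp(service_timeout_exp, cm_timeout_limit)) +
	       exp_to_usec(cap_exp(packet_life_time_exp, packet_life_time_limit));
}
```

Keeping the two caps separate lets a packet life time that reflects real topology pass through largely untouched, while a bogus service timeout from a broken peer is still bounded.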

-- 
MST



