[openib-general] [PATCH] IB_CM: Limit the MRA timeout
Michael S. Tsirkin
mst at mellanox.co.il
Wed Oct 4 13:37:29 PDT 2006
Quoting r. Sean Hefty <mshefty at ichips.intel.com>:
> Subject: Re: [PATCH] IB_CM: Limit the MRA timeout
>
> Michael S. Tsirkin wrote:
> >>There are several timeout values transferred and used by the cm, most notably the
> >>remote cm response timeout and packet life time. Does it make more sense to
> >>have a single, generic timeout maximum instead?
> >
> > Hmm. I'm not sure - we are working around an actual broken implementation here -
> > what do you think?
>
> I wasn't sure either. The MRA timeout is a combination of the packet life time
> + service timeout, which made me bring this up. The patch only handles the
> service timeout portion, so we end up in the same situation if a large packet
> life time is ever used.
But that comes from the SA, does it not?
> >>Would it make more sense to
> >>enable the maximum(s) by default, since we're dependent upon values received
> >>over the network?
> >
> > I think it would.
>
> So do I.
>
> The CM has checks to bring out of range values into range, but at the maximum,
> we get a timeout of about 2.5 hours. Multiply that by 15 retries, and the cm
> can literally spend all day retrying a request.
>
> I was considering dropping the default maximum down to around 4-8 seconds, which
> with retries still gives us about a minute to timeout a request. The default
> maximum would apply to local and remote cm timeouts, packet life time, and
> service timeout, but could be overridden by the user. (Basically, with Ishai's
> patch: rename mra_timeout_limit to timeout_limit, set to a default of 20, and
> replace occurrences of '31' in the code with timeout_limit.)
For remote cm timeout and service timeout this makes sense - on the
implementations I've seen, they currently seem to be picked mostly out of the blue.
But since the packet lifetime comes from the SM, it actually has a chance
to reflect some knowledge about the network topology.
And since we haven't seen any practical issues with packet life time yet -
maybe a separate parameter for that, with a higher limit?
--
MST