[openib-general] Re: [PATCH 0/4] SA path record caching
Michael S. Tsirkin
mst at mellanox.co.il
Mon Jan 30 22:38:34 PST 2006
Quoting r. Sean Hefty <mshefty at ichips.intel.com>:
> Subject: Re: [PATCH 0/4] SA path record caching
>
> Michael S. Tsirkin wrote:
> >>The cache is updated every 15 minutes, but this value is user configurable at
> >>module load time. The cache is also updated if a local event occurs, such as
> >>port up/down or SM LID change.
> >
> > Unfortunately, port down on any link in the path has the same effect
> > but won't invalidate the cache.
>
> Yes, but it was more difficult for me to detect remote events. At some point,
> the cache needs to register with the SA for events, but that only works as long
> as the SA is reachable.
I see how this would be nontrivial. Further, registration with SA events
means that the SA has to initiate a storm of notification MADs
on each link failure. See below for some ideas on how to improve on this.
If an RC QP times out, there's no ARP reply, or a CM attempt
does not get a reply, this is a strong hint that the path
has a problem. I guess we could verify by sending getportinfo
or something. Once there's no reply, I think we want to look
for another path, without the cache getting in the way.
> > One way to solve this would be to invalidate the cache, and retry,
> > if an attempt to connect to the remote node fails.
>
> I didn't want to invalidate the cache too quickly. If the SA goes down, or the
> link to it drops, then the cache can still be used to establish connections with
> those nodes that are reachable.
Let's just invalidate the entry that has the problem then.
And I guess we could start an SA query and delay the invalidation
until we get the response.
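
Roughly, the per-entry idea could look like the sketch below. This is
only an illustration in Python, not the patch's code: PathCache,
report_failure and sa_query are made-up names, and the SA query is
modelled as a synchronous call that returns None when the SA does not
answer, whereas a real query would be asynchronous.

```python
class PathCache:
    """Toy per-destination path cache (all names here are hypothetical)."""

    def __init__(self, sa_query):
        # sa_query(dest) returns a fresh path record, or None if the SA
        # did not answer (for example because it is unreachable).
        self._sa_query = sa_query
        self._paths = {}

    def add(self, dest, path):
        self._paths[dest] = path

    def lookup(self, dest):
        return self._paths.get(dest)

    def report_failure(self, dest):
        """Handle a hint such as an RC timeout or an unanswered CM request.

        Only the entry for dest is touched. The old entry stays in place
        until the SA responds, so an unreachable SA does not empty the cache.
        """
        fresh = self._sa_query(dest)
        if fresh is None:
            # No response from the SA: keep using the cached entry.
            return self._paths.get(dest)
        # Response arrived: replace just this entry.
        self._paths[dest] = fresh
        return fresh
```

Entries for other destinations are never looked at, so nodes that are
still reachable keep connecting from the cache.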
I also wonder what happens if the SA goes down at the exact 15 min
interval when the cache is invalidated?
> Do we expect most path failures to be permanent
> or transient?
No idea. Both kinds? What do you think?
> > SA gets trap notices on link failures, doesn't it?
> > So, unlike with the local cache, we don't depend on sweeps.
>
> Traps are optional.
RC is also optional :) Implementations seem to have them.
> >> At some point,
> >>the cache will contain multiple paths, letting the connection be retried along
> >>another path.
> >
> > Is this possible in the current implementation?
>
> Currently, only a single path record is maintained to each remote node.
> Supporting multiple paths would be possible with some additional work. (MPI
> requires multiple paths for its routing algorithms.)
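
For illustration only, retrying along another cached path might look
something like this once several paths are kept per remote node. The
function and parameter names are hypothetical, and try_connect stands
in for whatever connection attempt the caller makes.

```python
def connect_with_retry(paths, try_connect):
    """Try each cached path to one remote node in order.

    paths is the list of cached path records for that node and
    try_connect(path) returns True if the connection succeeded.
    Returns the path that worked, or None if every path failed.
    """
    for path in paths:
        if try_connect(path):
            return path
    return None
```

A None result would be the point at which to go back to the SA for
fresh path records.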
--
Michael S. Tsirkin
Staff Engineer, Mellanox Technologies