[openib-general] Re: [PATCH 0/4] SA path record caching

Michael S. Tsirkin mst at mellanox.co.il
Mon Jan 30 22:38:34 PST 2006


Quoting r. Sean Hefty <mshefty at ichips.intel.com>:
> Subject: Re: [PATCH 0/4] SA path record caching
> 
> Michael S. Tsirkin wrote:
> >>The cache is updated every 15 minutes, but this value is user configurable at 
> >>module load time.  The cache is also updated if a local event occurs, such as 
> >>port up/down or SM LID change.
> > 
> > Unfortunately, a port going down on any link in the path has the same effect
> > but won't invalidate the cache.
> 
> Yes, but it was more difficult for me to detect remote events.  At some point, 
> the cache needs to register with the SA for events, but that only works as long 
> as the SA is reachable.
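The local-event policy described above (a timed refresh, configurable at module load, plus a flush on port up/down or an SM LID change) can be sketched roughly as follows. This is a hypothetical illustration, not the code in the patch set: `PathCache`, `query_sa` and the trimmed-down `PathRecord` are assumptions, and the event strings are only loosely modeled on the kernel's `IB_EVENT_*` names. Note that nothing here reacts to a remote link going down, which is exactly the gap being discussed.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict, Optional


@dataclass
class PathRecord:
    # Hypothetical, trimmed-down path record: only the fields this sketch needs.
    dgid: str
    dlid: int


class PathCache:
    # Hypothetical names throughout; this is not the ib_sa kernel API.
    def __init__(self, query_sa: Callable[[str], Optional[PathRecord]],
                 refresh_secs: float = 15 * 60,
                 clock: Callable[[], float] = time.monotonic):
        self._query_sa = query_sa          # assumed SA lookup by destination GID
        self._refresh_secs = refresh_secs  # "user configurable at module load time"
        self._clock = clock
        self._entries: Dict[str, PathRecord] = {}
        self._last_refresh = clock()

    def lookup(self, dgid: str) -> Optional[PathRecord]:
        # Timed refresh: once the interval has elapsed, drop every entry.
        if self._clock() - self._last_refresh >= self._refresh_secs:
            self.flush()
        rec = self._entries.get(dgid)
        if rec is None:
            # Cache miss: ask the SA and remember the answer.
            rec = self._query_sa(dgid)
            if rec is not None:
                self._entries[dgid] = rec
        return rec

    def flush(self) -> None:
        self._entries.clear()
        self._last_refresh = self._clock()

    def on_local_event(self, event: str) -> None:
        # Only local events (port state, LID or SM changes) flush the cache.
        if event in ("PORT_ACTIVE", "PORT_ERR", "LID_CHANGE", "SM_CHANGE"):
            self.flush()
```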

I see how this would be nontrivial. Further, registering for SA events
means that the SA has to initiate a storm of notification MADs
on each link failure. See below for some ideas on how to improve on this.

If an RC QP times out, there's no ARP reply, or a CM attempt
does not get a reply, this is a strong hint that the path
has a problem. I guess we could verify this by sending a GetPortInfo
or something similar. Once there's no reply, I think we want to look
for another path, without the cache getting in the way.

> > One way to solve this would be to invalidate the cache, and retry,
> > if an attempt to connect to the remote node fails.
> 
> I didn't want to invalidate the cache too quickly.  If the SA goes down, or the
> link to it drops, then the cache can still be used to establish connections with
> those nodes that are reachable.

Let's just invalidate the entry that has the problem, then.
And I guess we could start an SA query and delay the invalidation
until we get the response.
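The per-entry, deferred invalidation proposed here could look something like the sketch below. It is a hypothetical illustration with assumed names (`PerEntryCache`, `report_failure`, `on_sa_response`, `on_sa_timeout`). It also assumes two policies the thread leaves open: if the SA answers with no path the entry is dropped, and if the SA never answers the old entry is kept, so a dead SA does not empty the cache.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Set


@dataclass
class PathRecord:
    # Hypothetical, trimmed-down path record.
    dgid: str
    dlid: int


class PerEntryCache:
    """Hypothetical cache that invalidates only the failing entry, lazily."""

    def __init__(self) -> None:
        self.entries: Dict[str, PathRecord] = {}
        self.pending: Set[str] = set()  # destinations with an SA query in flight

    def report_failure(self, dgid: str) -> bool:
        # Called on a hint that the path is bad (RC QP timeout, no CM reply, ...).
        # Returns True if the caller should send a fresh SA query for this entry.
        if dgid in self.entries and dgid not in self.pending:
            self.pending.add(dgid)
            return True
        return False

    def on_sa_response(self, dgid: str, rec: Optional[PathRecord]) -> None:
        # The SA answered: replace the entry, or drop it if no path exists (assumed).
        self.pending.discard(dgid)
        if rec is None:
            self.entries.pop(dgid, None)
        else:
            self.entries[dgid] = rec

    def on_sa_timeout(self, dgid: str) -> None:
        # The SA is unreachable: keep the old entry (assumed policy).
        self.pending.discard(dgid)

    def lookup(self, dgid: str) -> Optional[PathRecord]:
        # The old entry keeps being served until the SA response arrives.
        return self.entries.get(dgid)
```

Deferring the invalidation means a transient failure costs one SA query rather than a cache miss for every connection attempt in the meantime.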

I also wonder what happens if the SA goes down exactly when the
15-minute interval expires and the cache is invalidated?
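One way to avoid losing the cache in that case is to refresh instead of flush: re-query each cached destination and keep the stale record whenever the SA does not answer. A hypothetical sketch, assuming the record is trimmed to a DLID and that the assumed `query_sa` callable raises `TimeoutError` when the SA is unreachable and returns `None` when it answers with no path:

```python
from typing import Callable, Dict, Optional


def refresh(entries: Dict[str, int],
            query_sa: Callable[[str], Optional[int]]) -> Dict[str, int]:
    """Hypothetical refresh pass over a DGID -> DLID cache."""
    fresh: Dict[str, int] = {}
    for dgid, old_dlid in entries.items():
        try:
            new_dlid = query_sa(dgid)
        except TimeoutError:
            # SA unreachable: keep the stale entry instead of emptying the cache.
            fresh[dgid] = old_dlid
            continue
        if new_dlid is not None:
            # SA answered with a path: take the new record.
            fresh[dgid] = new_dlid
        # SA answered with no path: the entry is dropped.
    return fresh
```

Because the new table is built on the side and swapped in only after the pass, lookups never see an empty cache during the refresh.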

> Do we expect most path failures to be permanent 
> or transient?

No idea. Both kinds? What do you think?

> > SA gets trap notices on link failures, doesn't it?
> > So, unlike with the local cache, we don't depend on sweeps.
> 
> Traps are optional.

RC is also optional :) Implementations seem to have them.

> >> At some point, 
> >>the cache will contain multiple paths, letting the connection be retried along 
> >>another path.
> > 
> > Is this possible in the current implementation?
> 
> Currently, only a single path record is maintained to each remote node. 
> Supporting multiple paths would be possible with some additional work.  (MPI 
> requires multiple paths for its routing algorithms.)
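Keeping several paths per remote node would let a failed connection be retried along the next one. A hypothetical sketch, assuming each path is reduced to its DLID and that `try_connect` is an assumed callable that reports whether a connection over a given DLID succeeded:

```python
from collections import deque
from typing import Callable, Deque, Dict, List, Optional


class MultiPathCache:
    """Hypothetical cache holding several paths (here just DLIDs) per node."""

    def __init__(self) -> None:
        self._paths: Dict[str, Deque[int]] = {}

    def set_paths(self, dgid: str, dlids: List[int]) -> None:
        self._paths[dgid] = deque(dlids)

    def current(self, dgid: str) -> Optional[int]:
        paths = self._paths.get(dgid)
        return paths[0] if paths else None

    def fail_over(self, dgid: str) -> Optional[int]:
        # Demote the failed path to the back of the list and return the next one.
        paths = self._paths.get(dgid)
        if not paths:
            return None
        paths.rotate(-1)
        return paths[0]

    def connect(self, dgid: str,
                try_connect: Callable[[int], bool]) -> Optional[int]:
        # Try each known path once; return the DLID that worked, or None.
        paths = self._paths.get(dgid)
        for _ in range(len(paths) if paths else 0):
            dlid = paths[0]
            if try_connect(dlid):
                return dlid
            paths.rotate(-1)
        return None
```

Rotating rather than deleting keeps a failed path around, so it is retried last instead of being lost if the failure was transient.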


-- 
Michael S. Tsirkin
Staff Engineer, Mellanox Technologies


