[openib-general] Re: [PATCH 0/4] SA path record caching

Sean Hefty mshefty at ichips.intel.com
Mon Jan 30 14:33:17 PST 2006


Michael S. Tsirkin wrote:
>>The cache is updated every 15 minutes, but this value is user configurable at 
>>module load time.  The cache is also updated if a local event occurs, such as 
>>port up/down or SM LID change.
> 
> Unfortunately port down on any link in the path has the same effect
> but won't invalidate the cache.

Yes, but it was more difficult for me to detect remote events.  At some point, 
the cache needs to register with the SA for events, but that only works as long 
as the SA is reachable.
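
For reference, the update policy quoted above (timer expiry or a local event) 
amounts to roughly the check below.  This is only a sketch: the struct, enum, 
and function names are made up for illustration and are not taken from the 
patch, and it is written as plain userspace C rather than kernel code.

```c
#include <stdbool.h>
#include <time.h>

/* Hypothetical names throughout; not the identifiers used in the patch. */
enum cache_event {
    EV_NONE,            /* no local event pending */
    EV_PORT_UP,
    EV_PORT_DOWN,
    EV_SM_LID_CHANGE
};

struct path_cache {
    time_t last_update;       /* when the cache was last refreshed */
    unsigned int interval_s;  /* refresh interval in seconds; 900 (15 minutes)
                                 unless overridden at load time */
};

/* Refresh on any local event, or once the interval has elapsed. */
static bool cache_needs_update(const struct path_cache *c, time_t now,
                               enum cache_event ev)
{
    if (ev != EV_NONE)
        return true;
    return (now - c->last_update) >= (time_t)c->interval_s;
}
```

Note that nothing in this check can fire on a remote link failure, which is 
the gap being discussed.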

> One way to solve this would be to invalidate the cache, and retry,
> if an attempt to connect to the remote node fails.

I didn't want to invalidate the cache too quickly.  If the SA goes down, or the 
link to it drops, then the cache can still be used to establish connections with 
those nodes that are reachable.  Do we expect most path failures to be permanent 
or transient?
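
One possible middle ground, sketched below, is to retry through the SA when a 
connection over the cached path fails, but to leave the cached entry alone if 
the SA itself cannot be reached.  This is an assumption about how the two 
concerns could be combined, not what the patch does; every name here is 
hypothetical, and the connect and SA query steps are passed in as callbacks so 
the logic can be read on its own.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, stripped-down path record: one field is enough here. */
struct path_rec {
    unsigned short dlid;
};

struct cache_entry {
    struct path_rec path;
    bool valid;
};

typedef bool (*connect_fn)(const struct path_rec *path);
typedef bool (*sa_query_fn)(struct path_rec *out);

/*
 * Try the cached path first.  On failure, ask the SA for a fresh record and
 * retry once.  If the SA is unreachable, the cached entry is left untouched
 * so it can still be used later.
 */
static bool connect_with_retry(struct cache_entry *e, connect_fn try_connect,
                               sa_query_fn query_sa)
{
    struct path_rec fresh;

    if (e->valid && try_connect(&e->path))
        return true;

    if (!query_sa(&fresh))
        return false;           /* SA down: do not invalidate the cache */

    e->path = fresh;            /* replace the stale record */
    e->valid = true;
    return try_connect(&e->path);
}

/* Stubs for illustration: only the path with DLID 2 is reachable. */
static bool stub_connect(const struct path_rec *p) { return p->dlid == 2; }
static bool stub_sa_ok(struct path_rec *out) { out->dlid = 2; return true; }
static bool stub_sa_down(struct path_rec *out) { (void)out; return false; }
```

With the stubs above, a stale entry (DLID 1) survives a call made while the SA 
is down, and is replaced by the fresh record once the SA answers.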

> SA gets trap notices on link failures, doesn't it?
> So, unlike with the local cache, we don't depend on sweeps.

Traps are optional.

>> At some point, 
>>the cache will contain multiple paths, letting the connection be retried along 
>>another path.
> 
> Is this possible in the current implementation?

Currently, only a single path record is maintained to each remote node. 
Supporting multiple paths would be possible with some additional work.  (MPI 
requires multiple paths for its routing algorithms.)
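
To give an idea of what that additional work might look like, here is a 
sketch of a cache entry holding several path records, with the connection 
attempted along each in turn.  The names, the fixed array, and the limit of 
four paths are all assumptions for illustration; none of this exists in the 
current patch.

```c
#include <stdbool.h>
#include <stddef.h>

#define MAX_PATHS 4   /* arbitrary limit chosen for the sketch */

/* Hypothetical, stripped-down path record. */
struct path_rec {
    unsigned short dlid;
};

/* A cache entry that keeps several paths to the same remote node. */
struct multi_entry {
    struct path_rec paths[MAX_PATHS];
    size_t npaths;
};

typedef bool (*connect_fn)(const struct path_rec *path);

/*
 * Try each cached path in order.  Returns the index of the first path that
 * connects, or -1 if none of them does.
 */
static int connect_any_path(const struct multi_entry *e, connect_fn try_connect)
{
    for (size_t i = 0; i < e->npaths && i < MAX_PATHS; i++) {
        if (try_connect(&e->paths[i]))
            return (int)i;
    }
    return -1;
}

/* Stub for illustration: only the path with DLID 7 is reachable. */
static bool only_dlid_7_works(const struct path_rec *p)
{
    return p->dlid == 7;
}
```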

- Sean


