[openib-general] Re: [PATCH 0/4] SA path record caching

Sean Hefty mshefty at ichips.intel.com
Tue Jan 31 09:45:45 PST 2006


Michael S. Tsirkin wrote:
> If an RC QP times out, there's no ARP reply, or a CM attempt
> does not get a reply, this is a strong hint that the path
> has a problem. I guess we could verify by sending a getportinfo
> or something. Once there's no reply, I think we want to look
> for another path, without the cache getting in the way.

If the cache contains multiple (or all) paths from a local SGID to some DGID, then it 
can be used to obtain another path.  If the only change is that an error occurred, 
rather than a change in the fabric, then the SA shouldn't know of any paths that the 
local cache doesn't already have.
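To illustrate the failover idea, here is a minimal sketch in Python, assuming a cache that keeps every known path per (SGID, DGID) pair and remembers which ones have been reported as failed. `PathRecord`, `PathCache`, `mark_failed` and `lookup` are hypothetical names for illustration only; this is not the code from the patch set or the kernel ib_sa API.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class PathRecord:
    # Hypothetical, heavily simplified path record (real ones carry much more).
    sgid: str
    dgid: str
    slid: int
    dlid: int


@dataclass
class PathCache:
    # (sgid, dgid) -> every cached path between that GID pair
    paths: dict = field(default_factory=dict)
    # Paths reported as bad, e.g. after an RC QP timeout or no CM reply
    failed: set = field(default_factory=set)

    def add(self, rec: PathRecord) -> None:
        self.paths.setdefault((rec.sgid, rec.dgid), []).append(rec)

    def mark_failed(self, rec: PathRecord) -> None:
        self.failed.add(rec)

    def lookup(self, sgid: str, dgid: str):
        """Return the first cached path not marked failed, or None.

        None means the cache has no alternative left; a caller would
        then have to fall back to querying the SA directly.
        """
        for rec in self.paths.get((sgid, dgid), []):
            if rec not in self.failed:
                return rec
        return None
```

With two cached paths to the same DGID, marking the first one failed makes `lookup` return the second, without any SA round trip.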

> Let's just invalidate the entry that has the problem then.
> And I guess we could start SA query and delay the invalidation
> until we get the response.

Note that the cache submits a single query for all path records, rather than 
updating only a single entry.  The assumption here is that a failure on one path 
may result in failures on other paths as well.
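A rough Python sketch of that "refresh everything at once" behavior follows, assuming a hypothetical `query_all_paths` callable that stands in for one SA query returning every path record for the local port. The class name and record layout are illustrative assumptions, not taken from the patches.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, List, Tuple

# Hypothetical simplified record layout: (sgid, dgid, dlid)
Path = Tuple[str, str, int]


class BulkPathCache:
    def __init__(self, query_all_paths: Callable[[], Iterable[Path]]):
        # Stand-in for a single SA query returning all path records
        # (hypothetical helper, not a real library call).
        self._query_all_paths = query_all_paths
        self._table: Dict[Tuple[str, str], List[Path]] = {}

    def refresh(self) -> None:
        # Build a complete new table from one query result...
        new_table: Dict[Tuple[str, str], List[Path]] = defaultdict(list)
        for path in self._query_all_paths():
            new_table[(path[0], path[1])].append(path)
        # ...then swap it in as a whole, so every entry is replaced
        # together instead of patching only the entry that failed.
        self._table = dict(new_table)

    def lookup(self, sgid: str, dgid: str) -> List[Path]:
        return self._table.get((sgid, dgid), [])
```

Because the whole table is rebuilt, any other paths affected by the same failure are corrected in the same pass, and paths the SA no longer reports simply disappear.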

> I also wonder what happens if the SA goes down exactly at the 15-minute
> interval when the cache is invalidated?

The SA query will time out, and the old data will be used.  A new update attempt 
will be re-scheduled.
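The timeout behavior can be sketched roughly as below, assuming the SA query raises `TimeoutError` when it gets no answer. The 15-minute interval comes from the thread; `RETRY_DELAY_S` and all other names are hypothetical, and the actual retry delay used by the patches is not stated here.

```python
from typing import Callable, Dict, List, Tuple

REFRESH_INTERVAL_S = 15 * 60  # the 15-minute update interval from the thread
RETRY_DELAY_S = 60            # hypothetical delay before retrying a failed update

Table = Dict[Tuple[str, str], List[int]]


class RefreshingPathCache:
    def __init__(self, query_all_paths: Callable[[], Table]):
        # Stand-in for the bulk SA query (hypothetical helper).
        self._query = query_all_paths
        self.table: Table = {}
        self.next_refresh = 0.0

    def maybe_refresh(self, now: float) -> None:
        if now < self.next_refresh:
            return  # not due yet
        try:
            new_table = self._query()
        except TimeoutError:
            # SA did not answer: keep serving the old data and
            # re-schedule another update attempt.
            self.next_refresh = now + RETRY_DELAY_S
            return
        self.table = new_table
        self.next_refresh = now + REFRESH_INTERVAL_S
```

Passing `now` in explicitly keeps the sketch deterministic; a real implementation would drive this from a timer or work queue instead.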

>>Do we expect most path failures to be permanent 
>>or transient?
> 
> No idea. Both kinds? What do you think?

I have no idea either.

- Sean


