[openib-general] Re: [PATCH 0/4] SA path record caching
Sean Hefty
mshefty at ichips.intel.com
Tue Jan 31 09:45:45 PST 2006
Michael S. Tsirkin wrote:
> If an RC QP times out, there's no ARP reply, or a CM attempt
> does not get a reply, this is a strong hint that the path
> has a problem. I guess we could verify by sending getportinfo
> or something. Once there's no reply, I think we want to look
> for another path, without the cache getting in the way.
If the cache contains multiple/all paths from a local SGID to some DGID, then it
can be used to obtain another path. If only an error has occurred, then the SA
shouldn't have any paths that aren't known to the local cache.
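
To illustrate the lookup described above, here is a minimal Python sketch, not the code from the patch. It assumes the cache keeps every known path for an (SGID, DGID) pair and that the caller tracks which paths have failed. The names PathCache, add, and alternate are hypothetical.

```python
class PathCache:
    """Hypothetical cache: maps (sgid, dgid) to all known path records."""

    def __init__(self):
        self._paths = {}

    def add(self, sgid, dgid, path):
        # Store every path for the pair, not only the first one found.
        self._paths.setdefault((sgid, dgid), []).append(path)

    def alternate(self, sgid, dgid, failed):
        """Return the first cached path not in `failed`, or None."""
        for path in self._paths.get((sgid, dgid), []):
            if path not in failed:
                return path
        return None
```

If alternate() returns None, every cached path for the pair has failed and only a fresh SA query could help.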
> Let's just invalidate the entry that has the problem then.
> And I guess we could start SA query and delay the invalidation
> until we get the response.
Note that the cache submits a single query for all path records, rather than
updating only a single entry. The assumption here is that a failure on one path
may result in failures on other paths as well.
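
As a rough sketch of that bulk update (an illustration under assumptions, not the patch itself): one query result holding every path record replaces the whole table, rather than patching a single entry. The function name rebuild_table and the (sgid, dgid, path) tuple layout are hypothetical.

```python
def rebuild_table(records):
    """Build a fresh (sgid, dgid) -> [paths] table from one bulk query result.

    `records` is assumed to be an iterable of (sgid, dgid, path) tuples.
    """
    table = {}
    for sgid, dgid, path in records:
        table.setdefault((sgid, dgid), []).append(path)
    return table
```

Swapping in the rebuilt table drops any path the SA no longer reports, including paths that failed alongside the one that triggered the update.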
> I also wonder what happens if the SA goes down at the exact 15 min
> interval when the cache is invalidated?
The SA query will time out, and the old data will be used. A new update attempt
will be rescheduled.
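
A minimal Python sketch of that fallback, under stated assumptions: query_sa is a hypothetical callable that raises TimeoutError when the SA does not reply, and the 60-second retry delay is an assumed value (the thread only mentions the 15-minute update interval).

```python
class RefreshingCache:
    """Hypothetical cache that keeps serving old data if the SA is down."""

    def __init__(self, query_sa, refresh_interval=15 * 60, retry_interval=60):
        self.query_sa = query_sa              # assumed: raises TimeoutError on no reply
        self.refresh_interval = refresh_interval
        self.retry_interval = retry_interval  # assumed value, not from the thread
        self.records = []
        self.next_update = 0

    def update(self, now):
        """Try one refresh; return True if new data replaced the old."""
        try:
            fresh = self.query_sa()
        except TimeoutError:
            # No reply from the SA: keep the old records and retry later.
            self.next_update = now + self.retry_interval
            return False
        self.records = fresh
        self.next_update = now + self.refresh_interval
        return True
```

The old records are only overwritten after a successful reply, so a timeout never leaves the cache empty.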
>>Do we expect most path failures to be permanent
>>or transient?
>
> No idea. Both kinds? What do you think?
I have no idea either.
- Sean