[ofa-general] Re: IPoIB path caching

Jason Gunthorpe jgunthorpe at obsidianresearch.com
Mon Jul 23 11:41:05 PDT 2007


On Mon, Jul 23, 2007 at 11:10:08AM -0700, Sean Hefty wrote:

> >The design I was thinking to suggest for IPoIB is to almost always use 
> >this API since this policy makes the implementation consistent with the 
> >decisions made by the network stack neighbour cache
> 
> This defeats one of the benefits of caching, which is using a single 
> GetTable query, versus literally hundreds or thousands of Get queries. 
> Consider that constant all-to-all communication using IPoIB between 1024 
> ports, with a 15 minute ARP table timeout would hit the SA with close to 
> 600 queries per second.
> 
> I agree with Michael that it would be better for a ULP to invalidate 
> cache entries.
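
For reference, the quoted figure of close to 600 queries per second can be reproduced with a quick calculation. This is a minimal sketch, assuming one PathRecord Get per unordered pair of ports per ARP timeout; the message does not say how the number was derived, and the function name is hypothetical.

```python
# Back-of-the-envelope SA query rate for constant all-to-all IPoIB traffic.
# Assumption (not stated in the thread): every port pair is re-resolved
# once per ARP timeout, and one query covers both directions of a pair.

def sa_queries_per_second(ports: int, arp_timeout_s: float,
                          per_direction: bool = False) -> float:
    """Average Get queries per second hitting the SA."""
    paths = ports * (ports - 1)      # ordered (source, destination) pairs
    if not per_direction:
        paths //= 2                  # count each unordered pair once
    return paths / arp_timeout_s

rate = sa_queries_per_second(1024, 15 * 60)
print(f"{rate:.0f} queries/s")       # prints "582 queries/s"
```

If each direction needed its own query, the same inputs would give roughly twice that rate.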

Well, in my view, this is exactly the sort of thing you should not
do. ULPs have no better idea what is going on. ARP expiry doesn't give
you any special information about the cached PR.

If kernel caching is used it must be viewed as authoritative and kept
current through some kind of external mechanism. Something like
re-doing the big GetTable query prior to starting a job is a fine
interim way to do this. Ideally updating the kernel SA cache would also
push updated data into the neighbor AH structures as appropriate. Then
there is one single source of PR data, one source of IP -> GID
mappings, etc.
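
The idea of one authoritative cache that is refreshed in bulk and pushes changes to its consumers could be sketched roughly as follows. This is only an illustration in Python, not kernel code: `PathCache`, `replace_table`, and the dictionary-based path records are hypothetical names and simplifications, and the listener stands in for whatever would update the neighbor AH structures.

```python
from typing import Callable, Dict, List, Optional, Tuple

PathKey = Tuple[str, str]      # (source GID, destination GID), hypothetical key
PathRecord = Dict[str, int]    # simplified stand-in for a PR, e.g. {"dlid": 7}
Listener = Callable[[PathKey, Optional[PathRecord]], None]

class PathCache:
    """Single source of PR data, updated only by whole-table refreshes."""

    def __init__(self) -> None:
        self._paths: Dict[PathKey, PathRecord] = {}
        self._listeners: List[Listener] = []

    def subscribe(self, listener: Listener) -> None:
        # A consumer (e.g. the code owning address handles) registers here.
        self._listeners.append(listener)

    def lookup(self, key: PathKey) -> Optional[PathRecord]:
        return self._paths.get(key)

    def replace_table(self, table: Dict[PathKey, PathRecord]) -> List[PathKey]:
        """Install a fresh table (one GetTable-style result) and push
        every added, changed, or removed entry to the listeners."""
        old, self._paths = self._paths, dict(table)
        changed = []
        for key in sorted(old.keys() | self._paths.keys()):
            new = self._paths.get(key)
            if old.get(key) != new:
                changed.append(key)
                for listener in self._listeners:
                    listener(key, new)   # new is None when the path vanished
        return changed

cache = PathCache()
updates = []
cache.subscribe(lambda key, pr: updates.append((key, pr)))
cache.replace_table({("gid-a", "gid-b"): {"dlid": 7}})
cache.replace_table({("gid-a", "gid-b"): {"dlid": 9}})  # e.g. after renumbering
```

After the second refresh, consumers have been told about the new record without any per-entry invalidation by a ULP.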

These problems are just an unavoidable part of trying to use caching -
build the mechanism to support coherent replication and just deal with
these downsides <shrug>. Sean is basically doing non-coherent
replication today with his big GetTable query and that sounds like
what is speeding things up, not caching individual PRs.

> overstating the likelihood of issues occurring in practice.  Even
> the SA

Well, any time you renumber your network you will get burned and have
to restart ipoib on every node with the way things are
today. Something like an SM upgrade or changing to a new vendor SM, or
increasing LMC could do this to you.

> caches the path data -- getting a PR from the SA doesn't provide any
> additional guarantees.

Erm, any SA that returns a PR that is invalid in the network outside
the time the network is being updated is seriously busted, IMHO.

Jason


