[ofa-general] Further 2.6.23 merge plans...

Jason Gunthorpe jgunthorpe at obsidianresearch.com
Wed Jul 18 14:32:43 PDT 2007


On Wed, Jul 18, 2007 at 01:52:25PM -0700, Sean Hefty wrote:

> 1. Resolves IP addresses to DGIDs using ARP.  This results in IPoIB 
> querying the SA and caching 1 PR per DGID.
> 2. Apps query the SA for PRs, with 1 PR query per DGID.  Eventually 
> we'll get back the same set of PRs that IPoIB already had cached.
> 3. Establish the connections.  The IB CM stores the PR information with 
> each connection in order to set the QP attributes properly.
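To make those three steps concrete, here is a toy sketch of the per-DGID PR caching behaviour in steps 1-2 (pure illustration: `PathRecordCache` and the `sa_query` callable are made-up names, not anything in the verbs or CM API):

```python
class PathRecordCache:
    """Toy model: cache one path record (PR) per destination GID (DGID)."""

    def __init__(self, sa_query):
        self._sa_query = sa_query   # callable: DGID -> PR (one SA query)
        self._cache = {}            # DGID -> cached PR

    def lookup(self, dgid):
        # Hit: reuse the PR an earlier query (e.g. IPoIB's ARP-driven
        # resolution in step 1) already fetched from the SA.
        if dgid in self._cache:
            return self._cache[dgid]
        # Miss: exactly one PR query to the SA for this DGID (step 2).
        pr = self._sa_query(dgid)
        self._cache[dgid] = pr
        return pr

    def flush(self):
        # Drop everything, forcing fresh SA queries - roughly what
        # flushing the cache between MPI jobs does.
        self._cache.clear()
```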

So, since you flush the cache for your MPI jobs, the gain you see
comes basically from re-using the data collected by ipoib?

If this is the case, do you get the same first-order benefit by
essentially using the ipoib cache for all PR queries?

> >I personally think a simple in-kernel (small) fast lookup cache merged
> >with the ipoib cache that has a netlink interface to userspace to
> >add/delete/flush entries is a very good solution that will keep being
> >useful in future. netlink would also carry cache miss queries to
> >userspace. In absence of a daemon the kernel could query on its own
> >but cache very conservatively. A userspace version of the very
> >aggressive cache you have now could also be created right away.
> 
> I believe that the PR caching should be done outside of IPoIB.  Other 
> paths may exist that IPoIB does not use.

When I said merged, I was thinking of eliminating the ipoib cache
component and using your new module. There doesn't seem to be much
sense in caching twice, especially since ipoib already lacks anything
to keep the cache coherent with the SA - and that is where the main
work is. One PR record cache in the kernel, and it would be in roughly
the same architectural spot as your local sa module.
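Something like this is the shape I have in mind - one shared cache whose misses can be serviced by a userspace daemon, falling back to a conservative kernel-side query when no daemon is attached (toy sketch only; the netlink plumbing is elided and every name here is invented):

```python
class SharedPRCache:
    """Toy model of a single kernel PR cache with a userspace hook."""

    def __init__(self, kernel_query):
        self._kernel_query = kernel_query  # conservative in-kernel SA query
        self._miss_handler = None          # userspace daemon, if attached
        self._cache = {}                   # DGID -> PR

    def register_miss_handler(self, handler):
        # Analogous to a daemon attaching over netlink to service misses.
        self._miss_handler = handler

    # Userspace management operations (add/delete/flush in the proposal).
    def add(self, dgid, pr):
        self._cache[dgid] = pr

    def delete(self, dgid):
        self._cache.pop(dgid, None)

    def flush(self):
        self._cache.clear()

    def resolve(self, dgid):
        if dgid in self._cache:
            return self._cache[dgid]
        if self._miss_handler is not None:
            pr = self._miss_handler(dgid)   # miss carried to userspace
        else:
            pr = self._kernel_query(dgid)   # no daemon: kernel queries itself
        self._cache[dgid] = pr
        return pr
```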

> >This is because I firmly do not believe in caching as a solution to the
> >scalability problems. It must be solved with some level of replication
> >and distribution of the SA data and algorithms.
> 
> PR caching *is* replication of the SA data.  The local SA works with all 
> existing SAs.  It is not tied to one vendor, nor does it require changes 
>  to the SAs.  Sure, we can define vendor specific protocols to assist 
> with/optimize synchronization, but I don't believe it is necessary in an 
> initial submission.  (In fact I think it's undesirable at this point, 
> since it would require changes to the SA.)

Ok, so I draw a distinction between caching _some_ final end products
(i.e. PRs) without any coherency with the original source data, and
coherently replicating enough data to compute _any_ query on demand.

Some vs All and Pull vs Push.
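To spell out that difference (toy sketch, invented names): a pull cache of end products can silently go stale, while a pushed replica of the source data stays coherent and can answer any query on demand:

```python
class FakeSA:
    """Stand-in for the SA's source data (illustrative only)."""
    def __init__(self):
        self.topology = {"gid1": "lid-10"}   # data PRs are computed from
        self._subs = []
    def subscribe(self, cb):
        self._subs.append(cb)                # "push" channel to replicas
    def update(self, dgid, lid):
        self.topology[dgid] = lid
        for cb in self._subs:
            cb(dgid, lid)
    def compute_pr(self, dgid):
        return ("pr", dgid, self.topology[dgid])

class PullCache:
    """Some + Pull: cache final PRs on demand, no coherency."""
    def __init__(self, sa):
        self._sa, self._cache = sa, {}
    def query(self, dgid):
        if dgid not in self._cache:
            self._cache[dgid] = self._sa.compute_pr(dgid)
        return self._cache[dgid]             # stale once the SA changes

class PushReplica:
    """All + Push: coherent copy of source data, computes any query."""
    def __init__(self, sa):
        self._topology = dict(sa.topology)
        sa.subscribe(lambda k, v: self._topology.__setitem__(k, v))
    def query(self, dgid):
        return ("pr", dgid, self._topology[dgid])
```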

I'm trying to say, I think a simple kernel cache itself is fine, but
there should be only one cache (get rid of ipoib's) and it should have
a really good interface to userspace so that the really hard problems
can be solved through user space code.

I'm not suggesting cache-to-SA vendor-specific hooks or anything like
that, just a well-defined kernel module that lets user space co-opt
path resolution, which needs to include a kernel cache component due
to ipoib.

Jason
