[ofa-general] Further 2.6.23 merge plans...

Wed Jul 18 13:52:25 PDT 2007

> So, my main concern is with the role of kernel caching and especially with
> how control is exported to user space.

The only control currently exported by the local SA is a module 
parameter that allows a user to force a refresh of the entire cache.  I 
do not want to extend this until we can get at least some basic path record (PR)
caching functionality merged.

I want something small that we can build on, and the local_sa patch is 
already 1300 lines of code, with another 1000 lines of code to support 
InformInfo registration.

> Clearly the kernel needs a fast lookup cache for things like ipoib and
> others. I don't think a kernel module needs or wants a full on
> distributed SA.

We are talking about PR caching only at this point, with possible extensions
to support QoS.  No other SA information is cached or needed.

For all-to-all connections, the current code does roughly the following:

1. Resolve IP addresses to DGIDs using ARP.  This causes IPoIB to
query the SA and cache 1 PR per DGID.
2. Apps query the SA for PRs, with 1 PR query per DGID.  Eventually 
we'll get back the same set of PRs that IPoIB already had cached.
3. Establish the connections.  The IB CM stores the PR information with 
each connection in order to set the QP attributes properly.
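A back-of-the-envelope model of these three steps, for one node connecting to N peers, makes the redundancy concrete.  It assumes, purely for illustration, that every PR lookup in steps 1 and 2 is a separate SA query and that IPoIB and the IB CM each hold their own copy of every PR; `all_to_all_cost` is a hypothetical name, not part of any existing code.

```python
def all_to_all_cost(num_peers: int) -> dict:
    """Per-node cost of the current all-to-all flow under a simple model."""
    ipoib_queries = num_peers   # step 1: ARP makes IPoIB query 1 PR per DGID
    app_queries = num_peers     # step 2: the app queries the SA again per DGID
    ipoib_copies = num_peers    # IPoIB keeps each PR in its own cache
    cm_copies = num_peers       # step 3: the IB CM stores a PR per connection
    return {
        "sa_queries": ipoib_queries + app_queries,
        "cached_prs": ipoib_copies + cm_copies,
    }


print(all_to_all_cost(1000))  # {'sa_queries': 2000, 'cached_prs': 2000}
```

Under these assumptions both the SA load and the number of cached PRs are twice what a single shared lookup would need.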

We end up with redundant queries and the PR being cached in multiple 
places.  One optimization is to replace the N PR queries with a single, 
more efficient GetTable query.  A second optimization is to centralize 
the PR caching.  The local SA does the first, and starts us down the 
road of the second.
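A minimal sketch of the two optimizations together, assuming a single centralized cache that is filled by one GetTable-style bulk query and then answers per-DGID lookups locally.  `PathRecord`, `LocalPRCache` and the `get_table` callback are hypothetical stand-ins; only a handful of PathRecord fields are modeled and GIDs are plain strings.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, Optional


@dataclass(frozen=True)
class PathRecord:
    # Illustrative subset of PathRecord fields; real PRs carry many more.
    sgid: str
    dgid: str
    dlid: int
    sl: int


class LocalPRCache:
    """Hypothetical centralized PR cache shared by IPoIB, the CM and apps."""

    def __init__(self, get_table: Callable[[], Iterable[PathRecord]]) -> None:
        # get_table stands in for one SA GetTable PR query (an assumption).
        self._get_table = get_table
        self._paths: Dict[str, PathRecord] = {}
        self.sa_queries = 0

    def refresh(self) -> None:
        # One bulk query replaces N per-DGID PR queries.
        self.sa_queries += 1
        self._paths = {pr.dgid: pr for pr in self._get_table()}

    def lookup(self, dgid: str) -> Optional[PathRecord]:
        # Consumers resolve paths here instead of querying the SA themselves.
        return self._paths.get(dgid)

    def __len__(self) -> int:
        return len(self._paths)


def fake_get_table() -> Iterable[PathRecord]:
    # Simulated SA reply: 1000 paths from one source GID.
    return [PathRecord("fe80::1", f"fe80::{i:x}", i, 0) for i in range(2, 1002)]


cache = LocalPRCache(fake_get_table)
cache.refresh()
print(cache.sa_queries, len(cache), cache.lookup("fe80::2").dlid)  # 1 1000 2
```

Here the SA sees one query for 1000 paths rather than 1000 individual queries, and every consumer shares a single copy of each PR.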

> I personally think a simple in-kernel (small) fast lookup cache merged
> with the ipoib cache that has a netlink interface to userspace to
> add/delete/flush entries is a very good solution that will keep being
> useful in future. netlink would also carry cache miss queries to
> userspace. In the absence of a daemon the kernel could query on its own
> but cache very conservatively. A userspace version of the very
> aggressive cache you have now could also be created right away.
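A toy model of the scheme proposed above, to make the moving parts explicit.  The netlink channel is modeled as plain method calls plus an optional miss callback (no real netlink API is used), and `KernelPathCache`, its method names, and the 5-second conservative lifetime are all hypothetical choices for illustration.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class KernelPathCache:
    """Toy model of a small in-kernel path cache controlled from userspace.

    The "netlink" channel is modeled as method calls plus an optional miss
    callback; nothing here uses a real netlink API.
    """

    def __init__(self, sa_query: Callable[[str], str],
                 conservative_ttl: float = 5.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._sa_query = sa_query          # direct SA query, used only without a daemon
        self._ttl = conservative_ttl       # short lifetime for kernel-fetched entries
        self._clock = clock
        # dgid -> (path, expiry time, or None for daemon-managed entries)
        self._entries: Dict[str, Tuple[str, Optional[float]]] = {}
        self.daemon: Optional[Callable[[str], None]] = None  # miss notifier

    # Operations a userspace daemon would send down to the kernel.
    def add(self, dgid: str, path: str) -> None:
        self._entries[dgid] = (path, None)

    def delete(self, dgid: str) -> None:
        self._entries.pop(dgid, None)

    def flush(self) -> None:
        self._entries.clear()

    def lookup(self, dgid: str) -> Optional[str]:
        hit = self._entries.get(dgid)
        if hit is not None:
            path, expires = hit
            if expires is None or self._clock() < expires:
                return path
            del self._entries[dgid]        # conservative entry has aged out
        if self.daemon is not None:
            # Forward the miss to userspace; in this toy model the daemon
            # answers synchronously by calling add().
            self.daemon(dgid)
            entry = self._entries.get(dgid)
            return entry[0] if entry is not None else None
        # No daemon: query the SA directly but cache only briefly.
        path = self._sa_query(dgid)
        self._entries[dgid] = (path, self._clock() + self._ttl)
        return path


now = [0.0]
cache = KernelPathCache(lambda dgid: "sa-path-" + dgid, clock=lambda: now[0])
print(cache.lookup("gid-a"))   # sa-path-gid-a (queried, cached for 5 s)
now[0] = 10.0                  # past the conservative lifetime
cache.daemon = lambda dgid: cache.add(dgid, "daemon-path-" + dgid)
print(cache.lookup("gid-a"))   # daemon-path-gid-a
```

Entries pushed by the daemon never expire on their own in this sketch, leaving freshness policy to userspace, while entries the kernel fetched itself age out quickly.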

I believe that the PR caching should be done outside of IPoIB.  Other
ULPs may need paths that IPoIB never resolves, so an IPoIB-only cache
would not cover them.

> This is because I firmly do not believe in caching as a solution to the
> scalability problems. It must be solved with some level of replication
> and distribution of the SA data and algorithms.

PR caching *is* replication of the SA data.  The local SA works with all 
existing SAs.  It is not tied to one vendor, nor does it require changes 
to the SAs.  Sure, we can define vendor-specific protocols to assist
with or optimize synchronization, but I don't believe it is necessary in an
initial submission.  (In fact I think it's undesirable at this point, 
since it would require changes to the SA.)

> Maybe you could summarise how the user/kernel interface works?  The
> last I saw was something based on MADs that looked very inefficient
> compared with netlink.

I suggested a MAD interface to the local SA as being the most 
extensible.  It allows interacting with the cache from a local or remote 
node in a native IB fashion.  The local SA is reached through QP1, and any
new protocols can re-use the existing SA MAD format.

For example, the cache could be loaded using a 'SetTable PR' MAD.  It 
doesn't matter if the MAD is sent from a local user space daemon, some 
distributed SA agent, or the master SA.  Paths can be invalidated by 
sending 'Delete PR' MADs.
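A minimal sketch of how a local SA might apply such MADs to its PR cache without caring who sent them.  Method and attribute names are symbolic stand-ins rather than the IBA wire encodings; 'SetTable' is the method proposed here, not a standard SA method; and `SaMad` and `LocalSaAgent` are hypothetical.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, List, Optional


class Method(Enum):
    # Symbolic names only, not the IBA wire encodings.
    GET_TABLE = "GetTable"
    SET_TABLE = "SetTable"   # the proposed method; not a standard SA method
    DELETE = "Delete"


@dataclass
class SaMad:
    method: Method
    attribute: str                                   # e.g. "PathRecord"
    records: List[dict] = field(default_factory=list)
    source: str = "local-daemon"                     # informational only


class LocalSaAgent:
    """Hypothetical local SA that applies SA-format MADs to its PR cache."""

    def __init__(self) -> None:
        self.paths: Dict[str, dict] = {}             # keyed by DGID

    def handle(self, mad: SaMad) -> Optional[List[dict]]:
        # The sender (daemon, distributed agent, master SA) is never checked.
        if mad.attribute != "PathRecord":
            raise ValueError("this sketch only caches PathRecords")
        if mad.method is Method.SET_TABLE:
            for pr in mad.records:                   # load or overwrite paths
                self.paths[pr["dgid"]] = pr
            return None
        if mad.method is Method.DELETE:
            for pr in mad.records:                   # invalidate paths
                self.paths.pop(pr["dgid"], None)
            return None
        if mad.method is Method.GET_TABLE:
            return list(self.paths.values())
        raise ValueError(f"unsupported method: {mad.method}")


agent = LocalSaAgent()
agent.handle(SaMad(Method.SET_TABLE, "PathRecord",
                   [{"dgid": "g1", "dlid": 5}, {"dgid": "g2", "dlid": 6}],
                   source="master-sa"))
agent.handle(SaMad(Method.DELETE, "PathRecord", [{"dgid": "g1"}]))
print(agent.handle(SaMad(Method.GET_TABLE, "PathRecord")))
# [{'dgid': 'g2', 'dlid': 6}]
```

Because loading and invalidation are just SA-format messages, the same handler serves a local daemon, a distributed agent, or the master SA.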

It may also be possible to extend such an interface for QoS purposes.

- Sean