[ofa-general] Re: [RFC] [PATCH 0/3] 2.6.22 or 23 ib: add path record cache

Jason Gunthorpe jgunthorpe at obsidianresearch.com
Mon Apr 23 14:25:10 PDT 2007


On Mon, Apr 23, 2007 at 11:20:59PM +0300, Michael S. Tsirkin wrote:

> I haven't thought this through yet. Basically, I just note that
> caching the path until GID goes out of service isn't right - since
> path parameters such as MTU or rate might change without GID going
> out of service.
> 
> So what to do?

Has anyone thought about using replication rather than caching to
solve this problem? It seems to me it would be a lot faster for some
single process in the network to fetch and keep a copy of the entire
SA route database, format it into a binary format and use RC RDMA to
transfer it to every node each time it changes.

For, say, 10000 nodes you could compact an any-to-any path table into
around 20 megabytes.
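
As a rough sanity check on that figure, here is a back-of-envelope
sketch of how many bits per path a 20 megabyte budget leaves. The
helper name bits_per_path is hypothetical, and this says nothing about
how such a table would actually be encoded.

```python
def bits_per_path(nodes: int, table_bytes: int, ordered: bool = True) -> float:
    """Average number of bits available per path entry in an any-to-any table.

    `ordered=True` counts (src, dst) and (dst, src) as separate paths;
    `ordered=False` assumes paths are symmetric and counts each pair once.
    """
    pairs = nodes * (nodes - 1)          # every node to every other node
    if not ordered:
        pairs //= 2
    return table_bytes * 8 / pairs


if __name__ == "__main__":
    budget = 20 * 10**6                  # "around 20 megabytes"
    print(bits_per_path(10_000, budget))                 # ordered pairs
    print(bits_per_path(10_000, budget, ordered=False))  # unordered pairs
```

This works out to under two bits per ordered pair, so a table of that
size presumably depends on many pairs sharing the same attributes
rather than on storing a full path record for each pair.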

The RDMA transfers would be arranged into a waterfall: the source
transfers to 8 nodes, which then each transfer to 8, etc. Choosing a
connection topology that overlays the switch topology would give this
scheme a huge aggregate bandwidth so the total transfer time would be
short.
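
To illustrate the waterfall, here is a minimal sketch that numbers the
nodes 0..N (0 being the source) and builds an implicit 8-ary tree from
those indices. The function names are hypothetical, and the
index-based layout is an assumption made for simplicity: it ignores
the switch topology, which a real scheme would want to follow.

```python
from typing import List, Optional


def waterfall_rounds(receivers: int, fanout: int = 8) -> int:
    """Number of tree levels needed for one source to reach `receivers` nodes."""
    reached, level_size, rounds = 0, 1, 0
    while reached < receivers:
        level_size *= fanout      # each node in the last level feeds `fanout` more
        reached += level_size
        rounds += 1
    return rounds


def parent_of(index: int, fanout: int = 8) -> Optional[int]:
    """Node that transfers the table to `index`; the source (0) has no parent."""
    return None if index == 0 else (index - 1) // fanout


def children_of(index: int, total: int, fanout: int = 8) -> List[int]:
    """Nodes that `index` must transfer the table to, out of `total` nodes."""
    first = index * fanout + 1
    return list(range(first, min(first + fanout, total)))


if __name__ == "__main__":
    print(waterfall_rounds(10_000))      # levels needed for 10000 receivers
    print(children_of(0, 10_001))        # the source's direct targets
    print(parent_of(9))                  # who feeds node 9
```

With a fan-out of 8, each level multiplies the number of senders by 8,
so 10000 receivers are covered in five levels (8 + 64 + 512 + 4096 +
32768 is well past 10000).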

Unfortunately the SA protocol doesn't seem to have many provisions for
cache-coherence so it seems any form of route caching is going to run
into problems with stale data :< Replication adds a coherence
mechanism and shifts the problem to the replication source, which,
ideally, would ultimately be tightly connected to the SA.

> We could use DR SMPs to do network discovery and at least check that
> paths are valid - it's not too much code (ibnetdiscover is just 800
> lines) and in a sense, that's actually putting an *SA* (not just
> cache) in each node.  Combined with GID IN/OUT notices we could get
> away from querying path records completely.

I don't think you can find/check the SL like this, plus I doubt the
little CPUs in the switches can handle that rate of SMPs. :<

Jason


