[ofa-general] Re: [RFC] [PATCH 0/3] 2.6.22 or 23 ib: add path record cache

Michael S. Tsirkin mst at dev.mellanox.co.il
Mon Apr 23 15:30:05 PDT 2007


> Has anyone thought about using replication rather than caching to
> solve this problem? It seems to me it would be a lot faster for some
> single process in the network to fetch and keep a copy of the entire
> SA route database, format it into a binary format and use RC RDMA to
> transfer it to every node each time it changes.
>
> For say, 10000 nodes you could compact an any-to-any path table into
> around 20 megabytes.

I wonder how you propose to compact the path records that drastically.
The number of records seems to be 10000 * 10000 / 2 = 50 million.
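
To put a number on the gap, here is a rough back-of-the-envelope sketch
of the storage budget per record. It assumes "around 20 megabytes" means
20 * 10^6 bytes and counts unordered node pairs exactly; the function
names are made up for illustration.

```python
def unordered_pairs(nodes: int) -> int:
    """Exact count of unordered node pairs (the n * n / 2 estimate, minus self-pairs)."""
    return nodes * (nodes - 1) // 2


def bytes_per_record(table_bytes: int, records: int) -> float:
    """Average storage available per path record for a given table size."""
    return table_bytes / records


NODES = 10_000
TABLE_BYTES = 20 * 10**6  # assumption: "around 20 megabytes" read as decimal MB

records = unordered_pairs(NODES)
budget = bytes_per_record(TABLE_BYTES, records)

print(f"records: {records}")
print(f"budget:  {budget:.2f} bytes ({budget * 8:.1f} bits) per record")
```

So the proposed table would have well under one byte per pair, which
only works if most of each record is derived rather than stored.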

> The RDMA transfers would be arranged into a waterfall, source
> transfers to 8 nodes, who then each transfer to 8, etc. Choosing a
> connection topology that overlays the switch topology would give this
> scheme a huge aggregate bandwidth so the total transfer time would be
> short.
> 
> Unfortunately the SA protocol doesn't seem to have many provisions for
> cache-coherence so it seems any form of route caching is going to run
> into problems with stale data :< Replication adds a coherence
> mechanism and shifts the problem to the replication source, which,
> ideally, would ultimately be tightly connected to the SA.

Yes, I do agree that replication solves some problems with caching,
at least in theory, and I'd like to see this area explored, too.

Things to cover before we can start implementation would be: the protocol
between the replicas, how the waterfall is set up, and how to handle errors
and replicas going out of service.
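
As a starting point for the waterfall setup question, here is a minimal
sketch of one possible layout. It assumes nodes are simply numbered
0..N-1 with node 0 as the replication source, and arranges them as an
8-ary tree by index. That numbering and the helper names are
hypothetical; it ignores the switch topology overlay suggested above and
says nothing about failure handling.

```python
FANOUT = 8  # each node forwards the table to up to 8 others


def parent(node: int, fanout: int = FANOUT):
    """Node that sends the table to `node`; the source (node 0) has none."""
    return None if node == 0 else (node - 1) // fanout


def children(node: int, total: int, fanout: int = FANOUT):
    """Nodes that `node` forwards the table to, clipped to the cluster size."""
    first = node * fanout + 1
    return list(range(first, min(first + fanout, total)))


def depth(node: int, fanout: int = FANOUT) -> int:
    """Number of transfer hops between the source and `node`."""
    hops = 0
    while node != 0:
        node = (node - 1) // fanout
        hops += 1
    return hops


TOTAL = 10_000
print(children(0, TOTAL))    # first tier fed directly by the source
print(depth(TOTAL - 1))      # hops needed to reach the last node
```

With this layout every node can compute its own parent and children
locally, so when a replica goes out of service its children at least
know whom to wait for; what they do next is the open question.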

> > We could use DR SMPs to do network discovery and at least check that
> > paths are valid - it's not too much code (ibnetdiscover is just 800
> > lines) and in a sense, that's actually putting an *SA* (not just
> > cache) in each node.  Combined with GID IN/OUT notices we could get
> > away from querying path records completely.
> 
> I don't think you can find/check the SL like this,

Well, it seems the SL needs to be configured in some config file
on the SM anyway. So a script would have to be run to copy
that file to all nodes.
Might not be a big deal.

-- 
MST


