[ofa-general] Re: [RFC] [PATCH 0/3] 2.6.22 or 23 ib: add path record cache

Michael S. Tsirkin mst at dev.mellanox.co.il
Sun Apr 22 03:37:04 PDT 2007


> Using a local SA cache we were able to establish all-to-all connections between
> 1024 processes (about 1 million connections) in about 3 seconds.  Without the
> cache, connection time took about a minute, and required a substantial amount of
> tuning of timeout values to achieve this.

Sean, I think there are some good ideas in this patch:
- limiting the number of outstanding SA MADs
- batching multiple path queries in a single table request
I think these are likely to help many workloads, not just MPI all-to-all.
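
As a rough illustration of those two ideas (this is not code from the
patch; every name and constant below is hypothetical), a throttle that caps
the number of requests in flight and folds several path queries into each
request could look something like this:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical sketch only: none of these names come from the patch. */

#define MAX_OUTSTANDING 4  /* assumed cap on SA requests in flight */
#define BATCH_SIZE      8  /* assumed path queries folded into one request */

struct sa_throttle {
	size_t pending;      /* path queries waiting to be sent */
	size_t outstanding;  /* requests currently in flight to the SA */
	size_t sent;         /* total requests issued so far */
};

/* Queue n path queries without sending anything yet. */
static void sa_queue(struct sa_throttle *t, size_t n)
{
	t->pending += n;
}

/*
 * Issue batched requests until the in-flight cap is reached or nothing
 * is pending.  Returns the number of requests issued by this call.
 */
static size_t sa_pump(struct sa_throttle *t)
{
	size_t issued = 0;

	while (t->pending > 0 && t->outstanding < MAX_OUTSTANDING) {
		size_t take = t->pending < BATCH_SIZE ? t->pending : BATCH_SIZE;

		t->pending -= take;   /* these queries share one request */
		t->outstanding++;
		t->sent++;
		issued++;
	}
	assert(t->outstanding <= MAX_OUTSTANDING);
	return issued;
}

/* A response arrived: free one slot, then try to send more. */
static size_t sa_complete(struct sa_throttle *t)
{
	if (t->outstanding > 0)
		t->outstanding--;
	return sa_pump(t);
}
```

With these assumed constants, 100 queued path queries go out as 13
requests, never more than 4 at a time, and no cache is involved.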


I am very time-constrained currently due to the work on OFED 1.2,
so I am responding to the design more than to the code itself.

Questions:

1. What happens on e.g. a heterogeneous network? It seems that the
path to a specific GID might change (e.g. its MTU) without the GID
going in/out of service. How would this be handled?

2. What will happen when a number of changes occur in the network?
Wouldn't the SA need to send a huge number of notices now?
Should we be concerned?

3. Comments indicate that the main win from the patch is
with all-to-all startup times on large MPI clusters. If that is so,
and assuming a small number of MPI jobs is running on each node,
isn't it true that the main win is not from *caching* as such
(since all paths are requested at the beginning and never
used after this), but rather from limiting the number of outstanding MADs
to the SA and from batching multiple path queries in a single request?
Could that be the case?

4. Why do we need yet another API and yet another module to speed up just
RDMA/CM path record queries?  We now get 2 ways to do this (with/without the
cache).  Shouldn't there be just one?

5. How will the user guess the correct value for the paths_per_dest tunable,
short of disabling the cache? I notice it is currently set to a value
of 0x7F. Where does this value come from?


> I've only updated the rdma_cm to use the cache, but similar changes could be
> made to SRP and ipoib (which implements its own path record cache).
> 
> I would like to get feedback on both the notice and local_sa patches for
> inclusion in 2.6.22 or 2.6.23 (if 2.6.22 is not possible).

Since OFED includes a significantly different version of this code
(without notices), and this is the first time the notices code
makes an appearance, I think that targeting .23, and considering
alternative options such as the above, would be more prudent.

-- 
MST


