[openib-general] Local SA caching - why we need it

Michael S. Tsirkin mst at mellanox.co.il
Mon Dec 4 04:43:04 PST 2006


> A well designed SA cache/replica can use the assorted InformInfo notices
> from the SM to detect when GIDs come and go and hence properly update
> the relevant subset of its replica.

OK, but it's not clear that keeping a local SA cache on every node is the answer.
It seems to be good for some workloads and topologies, but the
worst-case numbers do not look good to me:

It seems that (especially at startup time) the number of notifications
each node receives would be linear in the network size N.
Since all N nodes need to be notified, we get O(N^2) notifications in place of
O(N^2) queries, which does not seem to be a win, especially if you consider
that queries can be issued on demand while notifications cannot.
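A back-of-the-envelope sketch of the counting above. The cost model is an assumption for illustration only: every node arrival is reported to all N per-node caches, while on-demand queries cost one per (node, peer actually contacted) pair. The function names are hypothetical.

```python
from typing import Optional


def per_node_cache_notifications(n: int) -> int:
    """Notifications when each of n nodes keeps its own SA cache.

    Assumed model: each of the n node arrivals (e.g. at startup) is
    reported to all n caches, giving n * n messages.
    """
    return n * n


def on_demand_queries(n: int, peers_per_node: Optional[int] = None) -> int:
    """SA queries when nodes only ask about peers they actually contact.

    The worst case is all-to-all traffic (peers_per_node == n); sparser
    communication patterns cost proportionally less.
    """
    if peers_per_node is None:
        peers_per_node = n
    return n * peers_per_node


if __name__ == "__main__":
    n = 1024
    print(per_node_cache_notifications(n))          # 1048576
    print(on_demand_queries(n))                     # 1048576 (all-to-all)
    print(on_demand_queries(n, peers_per_node=8))   # 8192
```

In the all-to-all case the two counts are identical, which is the "not a win" point; only the query count shrinks when the actual communication pattern is sparse.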

For example, a design using SA redirection with O(\sqrt N) SA replicas would need
O(\sqrt N) notifications per node event (one to each replica), so we would get
O(N \sqrt N) notifications and O(N \sqrt N) queries.
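A rough sketch of the replica arithmetic, under stated assumptions: there are ceil(sqrt(N)) replicas, each of the N node arrivals is reported to every replica, and (one plausible accounting of the query side, not necessarily the one intended here) each replica pulls each of the N records from the master SA once. All names are hypothetical.

```python
import math


def replica_count(n: int) -> int:
    """Hypothetical replica count: ceil(sqrt(n))."""
    r = math.isqrt(n)
    return r if r * r == n else r + 1


def replica_notifications(n: int) -> int:
    """Assumed model: each of the n node arrivals is reported to every replica."""
    return n * replica_count(n)


def replica_sync_queries(n: int) -> int:
    """Assumed model: each replica pulls each of the n records from the master SA once."""
    return replica_count(n) * n


if __name__ == "__main__":
    # Compare against the n * n cost of a cache on every node.
    for n in (64, 1024, 10000):
        print(n, replica_count(n), replica_notifications(n), n * n)
```

Under this model both totals grow as N^1.5, so for N = 1024 the notification count is 32 times smaller than the N^2 per-node-cache figure.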
This would also move the code out of the kernel and into a userspace SA.

I also note that the local_sa design from here:
https://openib.org/svn/gen2/trunk/src/linux-kernel/infiniband/core/local_sa.c
does not seem to implement any InformInfo notices, which seems to guarantee
query storms with low cache timeout values and connection timeouts on
topology changes with high cache timeout values.
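A minimal sketch of that trade-off, assuming a cache whose only invalidation mechanism is a timeout. This is not the local_sa.c code; the class name, the toy string "records" (destination GID mapped to a LID), and the explicit `now` clock parameter are all hypothetical, chosen to keep the example deterministic.

```python
from typing import Callable, Dict, Tuple


class TtlPathCache:
    """Hypothetical per-node SA cache invalidated only by a timeout.

    There is no InformInfo subscription, so a topology change is noticed
    only when an entry expires and is re-queried.
    """

    def __init__(self, ttl: float, query_sa: Callable[[str], str]) -> None:
        self.ttl = ttl
        self.query_sa = query_sa          # stands in for a real SA query
        self.entries: Dict[str, Tuple[str, float]] = {}
        self.sa_queries = 0               # how many times we hit the SA

    def lookup(self, dgid: str, now: float) -> str:
        cached = self.entries.get(dgid)
        if cached is not None and now - cached[1] < self.ttl:
            return cached[0]              # may be stale after a topology change
        self.sa_queries += 1
        record = self.query_sa(dgid)
        self.entries[dgid] = (record, now)
        return record


if __name__ == "__main__":
    sa = {"gid-a": "lid-5"}               # toy "SA": destination GID -> LID

    long_ttl = TtlPathCache(60.0, sa.__getitem__)
    long_ttl.lookup("gid-a", 0.0)
    sa["gid-a"] = "lid-9"                 # topology change
    print(long_ttl.lookup("gid-a", 30.0)) # lid-5 (stale record)
    print(long_ttl.lookup("gid-a", 61.0)) # lid-9 (refreshed only after expiry)

    short_ttl = TtlPathCache(1.0, sa.__getitem__)
    for t in range(60):                   # one lookup per second
        short_ttl.lookup("gid-a", float(t))
    print(short_ttl.sa_queries)           # 60: every lookup goes to the SA
```

With a long timeout the cached record stays wrong until it expires (the connection-timeout case); with a short one nearly every lookup reaches the SA, and multiplied across N nodes that becomes the query storm. Event-driven invalidation is what would let a cache avoid both.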

-- 
MST
