[openib-general] SA cache design

Thu Jan 12 14:27:47 PST 2006

> From: Sean Hefty [mailto:mshefty at ichips.intel.com]
> Rimmer, Todd wrote:
> > Each MPI process is independent.  However they all need to get pathrecords
> > for all the other processes/nodes in the system.
> > Hence, each process on a node will make the exact same set of queries.
> 
> That should still only be P queries per node, with P = number of processes
> on a node.  Why doesn't a single query (GET_TABLE) suffice for each process?

Consider a cluster with 1000 nodes and 4 processors per node.
A given MPI run may choose to use a subset, for example 500 processes.
Each process needs path records for the other 499 processes in the run,
but not for the other 3500 CPUs in the cluster.

While each process could do a GET_TABLE for all path records, that
would be rather inefficient: it would return 1,000,000 path records in
the RMPP response, of which only about 500 are of interest.

Even if all 4000 processors were being used in a single run, each
process only needs 3999 path records (999 of which are unique).
In fact a given node will never need more than N of the N^2 path records,
because the remaining ones are for paths in which this node is not involved.
So getting all 1,000,000 path records would be very inefficient.
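
To put rough numbers on this, here is a small Python sketch of the
arithmetic. It assumes one port per node and one path record per ordered
(source, destination) node pair, which is how 1000 nodes gives the
1,000,000 figure. The function names are made up for illustration.

```python
def full_table_records(nodes):
    # Assumption: one port per node and one path record per ordered
    # (src, dst) node pair, so the full table has N^2 entries.
    return nodes * nodes


def records_involving_node(nodes):
    # Paths that start at a given node: one per destination node (N).
    return nodes


def unique_remote_records(nodes):
    # Paths from a given node to every *other* node (N - 1).
    return nodes - 1


nodes = 1000
procs_per_node = 4
total_procs = nodes * procs_per_node

peers_per_process = total_procs - 1           # records each process asks for
table_size = full_table_records(nodes)        # what GET_TABLE would return
unique_needed = unique_remote_records(nodes)  # distinct records actually needed

print("full table:", table_size)
print("peers per process:", peers_per_process)
print("unique records needed per node:", unique_needed)
print("table is", table_size // records_involving_node(nodes),
      "times larger than the node's share")
```

With more than one port per node, or multiple paths per pair, the table
only gets bigger, so these figures are on the low side.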

Then multiply this by the 4 processes per node each making this same set
of queries.  Then multiply again by multiple partitions, SLs, etc. per node,
and it gets very inefficient to simply get the whole table.
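
As a rough illustration of how that multiplies, the sketch below compares
what one node would pull with what it actually needs. The partition and SL
counts are placeholders picked for the example, not measured values, and it
assumes the table grows linearly with each of them.

```python
def records_pulled_per_node(nodes, procs_per_node, partitions=1, sls=1):
    # Assumption: every process issues its own GET_TABLE, and the table
    # holds one record per (src, dst, partition, SL) combination.
    table = nodes * nodes * partitions * sls
    return procs_per_node * table


def records_needed_per_node(nodes, partitions=1, sls=1):
    # Only paths from this node to the other nodes are of interest.
    return (nodes - 1) * partitions * sls


# Hypothetical values: 2 partitions and 2 SLs per node.
pulled = records_pulled_per_node(1000, 4, partitions=2, sls=2)
needed = records_needed_per_node(1000, partitions=2, sls=2)

print("records pulled per node:", pulled)
print("records needed per node:", needed)
print("overhead factor: about", pulled // needed)
```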

Todd Rimmer