[openib-general] SA cache design

Eitan Zahavi eitan at mellanox.co.il
Mon Jan 16 13:49:19 PST 2006


Hi Sean

> Eitan Zahavi wrote:
> > [EZ] The scalability issues we see today are what I most worry
> > about.
> 
> 
> One issue that I see is that the CMA, IB CM, and DAPL APIs support
> only point-to-point connections.  Trying to layer a many-to-many
> connection model over these is leading to inefficiencies.  For
> example, the CMA generates one SA query per connection.  Another
> issue is that even if the number of queries were reduced, the fabric
> will still see O(n^2) connection messages.
[EZ] Having N^2 messages is not a big problem as long as they do not
all go to one target...
The CM is distributed, and that is good. Only the PathRecord portion
of connection establishment goes to a single node (the SA) today, and
you are about to fix that.
During the initial connection setup you will not have anything in the
SA cache, so the SA will need to answer N^2 PathRecord queries. Smart
exponential back-off can mitigate that DoS-like load on the SA at
bring-up.
> 
> Based on the code, the only SA query of interest to most users will
> be a path record query by gids/pkey.  To speed up applications
> written to the current CMA, DAPL, and Intel's MPI (hey, I gotta
> eat), my actual implementation has a very limited path record cache
> in the kernel.  The cache uses an index with O(1) insertion,
> removal, and retrieval.  (I plan on re-using the index to help
> improve the performance of the IB CM as well.)
[EZ] We might need a little more in the key for QoS support (to come).
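To make the point concrete, a cache key for such an index might look
like the sketch below. The sgid/dgid/pkey fields follow the query
described above; the sl and tclass fields are my assumption of what
QoS support would add, and the hash is just one generic O(1) choice,
not the actual kernel implementation.

```c
/*
 * Illustrative sketch (not the actual kernel code) of a path record
 * cache key.  Today's lookups are by sgid/dgid/pkey; a QoS-aware key
 * would need extra fields -- service level and traffic class are
 * shown here as assumptions.
 */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

struct pr_cache_key {
    uint8_t  sgid[16];   /* source GID */
    uint8_t  dgid[16];   /* destination GID */
    uint16_t pkey;       /* partition key */
    uint8_t  sl;         /* service level (assumed QoS field) */
    uint8_t  tclass;     /* traffic class (assumed QoS field) */
};

/*
 * Simple FNV-1a hash over the whole key; any index with O(1) average
 * insertion, removal, and retrieval would do.  Callers should memset
 * the key first so padding bytes (if any) are deterministic.
 */
static uint32_t pr_key_hash(const struct pr_cache_key *k)
{
    const uint8_t *p = (const uint8_t *)k;
    uint32_t h = 2166136261u;
    size_t i;
    for (i = 0; i < sizeof(*k); i++) {
        h ^= p[i];
        h *= 16777619u;
    }
    return h;
}
```

The point is only that widening the key (for QoS) is cheap: the index
stays O(1), it just hashes a few more bytes.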
> 
> I'm still working on ideas to address the many-to-many connection
> model.  One
[EZ] I would try to make sure the connections are not done in a
manner where all nodes try to establish connections to a single node
at the same time. This is an application issue, but it can easily be
resolved. Do the MPI connection setup in a loop like:

for (target = (myRank + 1) % numNodes; target != myRank;
     target = (target + 1) % numNodes) {
	/* establish connection to node target */
}
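The effect of this ordering can be checked with a small standalone
program: at step k, node i targets (i + 1 + k) % numNodes, so the
targets at every step form a permutation of the nodes and no node is
hit by all its peers at once. (NUM_NODES and the helper names below
are just for this demonstration.)

```c
/*
 * Demonstration of the connection ordering above: at each step the
 * set of targets is a permutation of the nodes, so no node receives
 * more than one incoming connection attempt per step.
 */
#include <assert.h>
#include <string.h>

#define NUM_NODES 8  /* example fabric size, chosen arbitrarily */

/* Target of node 'rank' at connection step 'step' (0-based). */
static int target_at(int rank, int step)
{
    return (rank + 1 + step) % NUM_NODES;
}

/* Returns 1 if, at the given step, each node is targeted exactly once. */
static int step_is_balanced(int step)
{
    int hits[NUM_NODES];
    int rank, t;
    memset(hits, 0, sizeof(hits));
    for (rank = 0; rank < NUM_NODES; rank++)
        hits[target_at(rank, step)]++;
    for (t = 0; t < NUM_NODES; t++)
        if (hits[t] != 1)
            return 0;
    return 1;
}
```

Each node starts at a different target (its right neighbor in rank
order) and walks the ring, so the load stays spread across the fabric
for the whole setup phase.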

> idea is to have a centralized connection manager to coordinate the
> connections between the various endpoints.  The drawback is that
> this requires defining a proprietary protocol.  Any implementation
> work in this area will be deferred for now though.
[EZ] I think a centralized CM is only going to make things worse.
> 
> - Sean


