[openib-general] Local SA caching - why we need it

Or Gerlitz ogerlitz at voltaire.com
Mon Dec 4 02:55:23 PST 2006


Woodruff, Robert J wrote:
> This really is not an issue with the Intel MPI connection establishment
> design, rather, any application (or set of applications) that needs to
> establish lots of connections will have the same issue. 

This is an oversimplification: you mentioned that in your testing 
the SA scaled to 15K queries/second; let's set that limit to 10K.

**If** your (anyone's) MPI makes sure it does not impose on the SA a 
load of more than 10K queries per second, it would take 100 seconds for 
the SA to provide 1M paths to a 1K-process job with 1K paths for each rank.
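
To spell out the arithmetic above, here is a minimal sketch in Python. It assumes one SA query per path and 1K = 1000; the function names are made up for illustration. The per-rank figure further assumes the SA budget is split evenly across ranks, which is only an illustrative assumption, not something stated or measured in this thread.

```python
def path_fetch_seconds(ranks, paths_per_rank, sa_queries_per_sec):
    """Seconds for the SA to answer all path queries at a capped rate.

    Assumes exactly one SA query per path (illustrative assumption).
    """
    total_queries = ranks * paths_per_rank
    return total_queries / sa_queries_per_sec


def per_rank_budget(ranks, sa_queries_per_sec):
    """Queries/second each rank may issue if the cap is split evenly."""
    return sa_queries_per_sec / ranks


# 1K ranks, 1K paths per rank, SA capped at 10K queries/second
print(path_fetch_seconds(1000, 1000, 10_000))  # 100.0
print(per_rank_budget(1000, 10_000))           # 10.0
```

Under the even-split assumption, each rank would have to hold itself to about 10 queries per second to keep the job as a whole under the 10K limit.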

This fairly simple design change reduces your job start time from 
today's infinite to 100 seconds plus the time it takes to do all the 
other work (IP2GID resolution && QP create/modify-init-rtr-rts &&
CM exchanges && your-mpi-etcs).
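
One way such a client-side limit could look is sketched below in Python. This is not how any particular MPI does it; `QueryPacer` and `wait_turn` are hypothetical names, and the clock and sleep functions are injectable only so the pacing can be checked without real delays.

```python
import time


class QueryPacer:
    """Spaces out calls so at most `rate` queries are issued per second.

    Hypothetical helper for illustration; not part of any MPI or OFED API.
    """

    def __init__(self, rate, clock=time.monotonic, sleep=time.sleep):
        self.interval = 1.0 / rate   # minimum spacing between queries
        self.clock = clock
        self.sleep = sleep
        self.next_slot = clock()     # earliest time the next query may go out

    def wait_turn(self):
        """Block until the caller is allowed to send its next SA query."""
        now = self.clock()
        if now < self.next_slot:
            self.sleep(self.next_slot - now)
            now = self.next_slot
        self.next_slot = now + self.interval
```

A rank would call `wait_turn()` before each path query it sends to the SA, with `rate` set to its share of the overall budget.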

The local SA shrinks the path-fetching time from 100 seconds to zero, 
and your startup time would reduce to the "other" time.

So now you are either very happy, or you have just hit the next 
roadblock, since the "other" time is not negligible.

At the devcon and elsewhere on this thread I was trying to say this and 
mention some ideas regarding the next roadblock; I am not sure why you 
did not want to hear it.

==================================================

When a closed source product sets requirements on open source software, 
its developers should be willing to discuss some of the actual design and 
implementation of their SW. Specifically, when competing SW products 
(in particular open source ones) are claimed to need the exact or a 
similar set of functionalities, you should be willing to have a 
discussion. You might even get some good advice for free...

The local SA was not developed for Intel MPI needs, but rather in the 
framework of the path-forward project, for future Open MPI usage and/or 
other requirements of the labs (routing algorithms, visualization, etc.).

With this at hand, and given your immediate request to include it in 
OFED 1.2, the group here thinks that a local/distributed SA can be quite 
a good solution for the roadblock you are hitting, and does not object to 
including it in OFED 1.2 in the non-disturbing form you mentioned.

But given what seems to be a trend of more MPIs that are now in, or 
soon to be in, a transition towards using the RDMA CM for their job 
start, I would ask to hold off on a kernel push, to first have the 
local SA solution tested and, moreover, to see what problems other MPIs 
(and yours) run into when attempting to scale over InfiniBand.

Or.




