[openib-general] Local SA caching - why we need it
Or Gerlitz
ogerlitz at voltaire.com
Mon Dec 4 02:55:23 PST 2006
Woodruff, Robert J wrote:
> This really is not an issue with the Intel MPI connection establishment
> design, rather, any application (or set of applications) that needs to
> establish lots of connections will have the same issue.
This is an oversimplification: you mentioned that in your testing
the SA scaled to 15K queries/second; let's set that limit to 10K.
**If** your (anyone's) MPI makes sure it does not impose a load of more
than 10K queries per second on the SA, it would take 100 seconds for
the SA to provide 1M paths to a 1K-process job with 1K paths for each
rank (1K ranks x 1K paths = 1M queries; 1M / 10K per second = 100 seconds).
This fairly simple design change reduces your job start time from
today's "infinite" to 100 seconds plus the time it takes to do all the
other work (IP2GID resolution, QP create and modify to INIT/RTR/RTS,
CM exchanges, and your MPI's own setup).
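The arithmetic above can be checked with a few lines of Python. This is a back-of-the-envelope sketch, assuming each path costs exactly one SA query and the SA answers at a steady rate; the function name `path_fetch_seconds` is hypothetical and not part of any OFED API.

```python
def path_fetch_seconds(ranks: int, paths_per_rank: int, sa_queries_per_sec: float) -> float:
    """Estimate how long a single SA needs to answer every path query of a job.

    Assumes (hypothetically) one SA query per path and a constant SA
    service rate; ignores all the "other" startup work.
    """
    total_queries = ranks * paths_per_rank
    return total_queries / sa_queries_per_sec


# Figures from the message: 1K ranks, 1K paths per rank, SA load capped at 10K queries/s.
print(path_fetch_seconds(1000, 1000, 10_000))  # 100.0
```

With a local SA answering these queries on each node, the SA-bound term drops toward zero and only the "other" startup time remains.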
The local SA shrinks the path-fetching time from 100 seconds to zero,
and your startup time would reduce to the "other" time.
So now you are either very happy, or you have just hit the next roadblock,
since the "other" time is not negligible.
At the devcon and elsewhere on this thread I was trying to say this and
to mention some ideas regarding the next roadblock; I am not sure why you
did not want to hear them.
==================================================
When a closed-source product sets requirements on open-source software,
its vendor should be willing to discuss some of the actual design and
implementation of its software. In particular, when competing software
products (especially open-source ones) are claimed to need the same or a
similar set of functionality, you should be willing to have a discussion.
You might even get some good advice for free...
The local SA was not developed for Intel MPI's needs, but rather in the
framework of the path-forward project, for future open MPI usage and/or
other requirements of the labs (routing algorithms, visualization, etc.).
With this at hand, and given your immediate request to include it in
OFED 1.2, the group here thinks that a local/distributed SA can be quite a
good solution for the roadblock you are hitting and does not object to
including it in OFED 1.2 in the non-disruptive form you mentioned.
However, given what seems to be a trend of more MPIs that are now, or
soon will be, moving to the RDMA CM for their job start, I would ask to
hold off on a kernel push, first to have the local SA solution tested
and, moreover, to see what problems other MPIs (and yours) encounter when
attempting to scale over InfiniBand.
Or.