[ofa-general] [RFC] the never ending search for SA scalability
Or Gerlitz
ogerlitz at voltaire.com
Tue Aug 14 01:27:38 PDT 2007
Roland Dreier wrote:
>> My initial hope is that we can get away with replicating the data, and
>> using the existing SA protocol to keep it in sync. For the example
>> you gave, SA LID X could forward the set to the master SA before
>> recording it locally. SA LID Y would check with the master SA for the
>> data if it weren't found local.
> In any case, I don't see what having a mechanism for manual SA
> redirect buys you. I'm probably missing the point, but obviously
> you're not planning on having sysadmins manually set the SA LID on
> each node, which implies having some automatic agent that can set the
> SA LID. And given the existence of that agent, I don't see the
> difficulty in putting that agent in the real SA and using the existing
> SA redirect protocol to handle setting the SA LID.
Sean,
Knowing how far you have gone to solve the order-n-squared load on the
SA caused by all-to-all path queries at MPI job startup, I really like
your idea of implementing what you have called the "local SA" as a user
space service which, --if-- installed and running, offloads the SA in
that scenario.
As the previous threads showed from many angles, it --really-- makes a
difference whether the "local SA" resides in the kernel or in user
space, and my take is to put it in user space.
Roland,
What this mechanism buys Sean is a solution to the PR (path record)
query problem.
Indeed, sysadmins would have to install/enable the user space rpm that
does the job if they want this feature. Other than that, I don't see
any manual work: a possible design I see here is that, for each port on
which it does PR offload, the "user space SA" would treat the port's SM
LID as the actual SA LID, i.e. the address it uses to
replicate/invalidate/etc PR info and to which it "proxies" non-PR
queries.
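
To make that design a bit more concrete, here is a rough sketch of the
per-port logic in Python. It is only an illustration under my own
assumptions, not a proposal for the actual code: the names LocalSA,
send, query and invalidate are hypothetical, the transport to the real
SA is reduced to a plain callable, and a path record is identified by an
opaque key.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Tuple

# Hypothetical stand-in for "send this SA query to the given LID".
SendFn = Callable[[int, str, Tuple], object]


@dataclass
class LocalSA:
    """Per-port user space SA sketch (all names are illustrative)."""
    sm_lid: int      # the port's SM LID, treated as the actual SA LID
    send: SendFn     # hypothetical transport to the real SA
    pr_cache: Dict[Tuple, object] = field(default_factory=dict)

    def query(self, attr: str, key: Tuple):
        if attr != "PathRecord":
            # Non-PR queries are proxied to the real SA every time.
            return self.send(self.sm_lid, attr, key)
        if key not in self.pr_cache:
            # First request for this path: fetch it from the real SA
            # and keep a local replica for later requests.
            self.pr_cache[key] = self.send(self.sm_lid, attr, key)
        return self.pr_cache[key]

    def invalidate(self, key=None):
        # Drop one replicated path record, or all of them when no key
        # is given (for example after a fabric change).
        if key is None:
            self.pr_cache.clear()
        else:
            self.pr_cache.pop(key, None)
```

With something like this, repeated PR queries for the same path reach
the real SA only once until the entry is invalidated, while everything
else goes to the SM LID as before.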
Indeed, in the end, this package may or may not become the basis for a
real distributed SA for IB clusters with many thousands of ports.
However, assuming it's possible to implement the concept with only a
small change to the kernel ib_sa (i.e. conditioned on a module param
etc), I would not block this approach at this stage, and would let Sean
suggest a concrete design.
Or.