[ofa-general] [RFC] the never ending search for SA scalability

Or Gerlitz ogerlitz at voltaire.com
Tue Aug 14 01:27:38 PDT 2007


Roland Dreier wrote:
>> My initial hope is that we can get away with replicating the data, and
>> using the existing SA protocol to keep it in sync.  For the example
>> you gave, SA LID X could forward the set to the master SA before
>> recording it locally.  SA LID Y would check with the master SA for the
>> data if it weren't found locally.

> In any case, I don't see what having a mechanism for manual SA
> redirect buys you.  I'm probably missing the point, but obviously
> you're not planning on having sysadmins manually set the SA LID on
> each node, which implies having some automatic agent that can set the
> SA LID.  And given the existence of that agent, I don't see the
> difficulty in putting that agent in the real SA and using the existing
> SA redirect protocol to handle setting the SA LID.
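The replication scheme in the doubly quoted text works like this: a replica SA forwards a set to the master SA before recording it locally, and it checks with the master when a lookup misses locally. It can be modeled as a toy sketch. The class and method names are hypothetical. Keys are plain strings, and no MAD encoding or transport is modeled.

```python
"""Toy model of a replica SA that keeps data in sync with a master SA.
All names are hypothetical; records are plain key/value pairs."""


class MasterSA:
    """Authoritative record store (stands in for the master SA)."""

    def __init__(self):
        self.records = {}

    def set(self, key, value):
        self.records[key] = value

    def get(self, key):
        return self.records.get(key)


class ReplicaSA:
    """An SA at another LID that replicates the master's data."""

    def __init__(self, master):
        self.master = master
        self.local = {}

    def set(self, key, value):
        # Forward to the master first so it stays authoritative,
        # then record the same data locally.
        self.master.set(key, value)
        self.local[key] = value

    def get(self, key):
        if key in self.local:
            return self.local[key]
        # Local miss: ask the master and cache the answer if it has one.
        value = self.master.get(key)
        if value is not None:
            self.local[key] = value
        return value
```

With `x = ReplicaSA(master)` and `y = ReplicaSA(master)`, a `x.set(...)` becomes visible through `y.get(...)` via the master on y's first miss. The sketch deliberately has no invalidation. After another replica updates the master, a replica can therefore keep serving a stale local copy, and a real design would need to handle that case.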

Sean,

Knowing how long a road you have traveled to solve the
order-n-squared load on the SA caused by all-to-all path queries at MPI
job startup, I really liked your idea. It allows implementing what you
have called the "local sa" as a user space service that, if installed
and running, offloads the SA in that scenario.
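To put the order-n-squared load in numbers, here is a back-of-the-envelope count under two simplifying, hypothetical assumptions. The first is that without a local SA every MPI process sends the SA one path query per other process. The second is that with a node-local cache, each node resolves the path to each remote node once and shares it among its local processes (one port per node).

```python
def direct_sa_queries(nodes: int, procs_per_node: int) -> int:
    """Every MPI process asks the SA for a path to every other process."""
    n = nodes * procs_per_node
    return n * (n - 1)


def cached_sa_queries(nodes: int) -> int:
    """Each node's local cache resolves each remote node once (assumed)."""
    return nodes * (nodes - 1)


if __name__ == "__main__":
    print(direct_sa_queries(1000, 8))  # 63992000
    print(cached_sa_queries(1000))     # 999000
```

Under these assumptions the load is still quadratic in the number of nodes. Caching removes roughly a factor of procs_per_node squared, which is why the local-sa approach helps at job startup. It does not make the SA scale without bound.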

For many of the reasons discussed in the previous threads, it really
does make a difference whether the "local sa" resides in the kernel or
in user space, and my take is to put it in user space.

Roland,

What this mechanism buys Sean is a solution to the PR (PathRecord)
query problem.

Indeed, sysadmins would have to install/enable the user space rpm that
does the job if they want this feature. Other than that, I don't see
any manual work. One possible design: for each port on which it does PR
offload, the "user space SA" would use the SM LID as the actual SA LID,
that is, the LID to which it replicates/invalidates PR info and to
which it "proxies" non-PR queries.
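The design just described can be sketched as a toy dispatcher. The sketch assumes a node-local PR cache keyed by (source LID, destination LID) and an injected transport function to the real SA. `UserSpaceSA`, `Query`, `send_to_sa`, `PATH_RECORD` and the attribute strings are hypothetical names, not part of ib_sa or any OFED library. A PR query is fetched once from the real SA at the SM LID and then answered locally. Every other query is proxied unchanged.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional, Tuple

PATH_RECORD = "PathRecord"  # hypothetical attribute tag


@dataclass
class Query:
    attribute: str                          # e.g. "PathRecord", "MCMemberRecord"
    key: Optional[Tuple[int, int]] = None   # (src LID, dst LID), PR only


@dataclass
class UserSpaceSA:
    sm_lid: int                                   # LID of the real SA (the SM)
    send_to_sa: Callable[[int, Query], object]    # transport to the real SA
    pr_cache: Dict[Tuple[int, int], object] = field(default_factory=dict)

    def handle(self, query: Query) -> object:
        if query.attribute != PATH_RECORD:
            # Not a PR query: proxy it unchanged to the real SA.
            return self.send_to_sa(self.sm_lid, query)
        if query.key not in self.pr_cache:
            # PR miss: fetch once from the real SA, then serve locally.
            self.pr_cache[query.key] = self.send_to_sa(self.sm_lid, query)
        return self.pr_cache[query.key]

    def invalidate(self, lid: int) -> None:
        # Drop every cached path whose source or destination is this LID.
        for key in [k for k in self.pr_cache if lid in k]:
            del self.pr_cache[key]
```

Injecting `send_to_sa` keeps the dispatch logic independent of how MADs are actually sent, so the offload policy can be exercised with a fake SA. How invalidations would be triggered (for example, on SM traps) is left open here, as it is in the thread.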

Indeed, with the end in mind, this package may or may not become the
basis for a real distributed SA for IB clusters with many thousands of
ports. However, assuming it is possible to implement the concept with
only a small change to the kernel ib_sa module (i.e. conditioned on a
module param, etc.), I would not block this approach at this stage, and
would let Sean suggest a concrete design.

Or.
