[ofa-general] Re: Re: [PATCH RFC] sharing userspace IB objects
Gleb Natapov
glebn at voltaire.com
Tue Jun 26 05:25:39 PDT 2007
On Tue, Jun 26, 2007 at 02:44:02PM +0300, Michael S. Tsirkin wrote:
> > Quoting Gleb Natapov <glebn at voltaire.com>:
> > Subject: Re: Re: [PATCH RFC] sharing userspace IB objects
> >
> > On Tue, Jun 26, 2007 at 12:51:25PM +0300, Michael S. Tsirkin wrote:
> > > > Quoting Gleb Natapov <glebn at voltaire.com>:
> > > > Subject: Re: Re: [PATCH RFC] sharing userspace IB objects
> > > >
> > > > On Tue, Jun 26, 2007 at 10:06:41AM +0300, Michael S. Tsirkin wrote:
> > > > > > Is there really a strong use case for making every type of object
> > > > > > shareable? Can we handle the SRC stuff without going to this
> > > > > > extreme of complexity?
> > > > >
> > > > > This is not directly related to SRC: this is an effort
> > > > > to make it possible to share QPs, CQ etc across processes
> > > > > in the same way as they can be currently shared across threads.
> > > > > So assuming that we want multiple processes to post to
> > > > > the same QP, how can we support this?
> > > >
> > > > Are you absolutely sure you even want to support this?
> > >
> > > Take a look here :)
> > > http://www.quotedb.com/quotes/1007
> > So there is still a chance you'll reconsider :)
>
> Sure, if someone comes up with a better way to improve scalability
> for single-threaded applications.
What good is a solution that no one will use? No solution is better than
a bad one, because it will motivate people to look for a proper solution.
>
> > >
> > > > What is the user case?
> > >
> > > Use case? Scalability. Pls go over Dror's presentation given at Sonoma -
> > > he calls this SSQ.
> >
> > As far as I can tell he is talking about HW supported solution and not
> > half baked SW one.
>
> No, sharing a send queue must be done in software. I don't really see the reason
> for sarcasm: do you see value in sharing resources between multiple threads?
> Why not multiple processes? Some people just don't want to program
> in multithreaded environment.
Yes, I see the value in sharing resources between threads and processes
if it is done right. This proposal is far from being right. There is no
sarcasm in my sentence either. You can't claim that what you propose is as
seamless as it should be.
I have no problem with sharing a send queue. What I want to be able to do
is to attach a CQ from each process to a shared QP. When a send posted by
process A completes, the completion is posted into A's CQ. HW should be
able to multiplex this IMO.
>
> > >
> > > > If multiple processes what to post to the same QP how will you
> > > > ensure that right process will receive right completion event?
> > >
> > > Same as with threads - memory for CQEs and locks will be allocated
> > > in shared memory to make it possible for multiple processes to poll
> > > CQ simultaneously, and they get completions in FCFS order.
> > > What to do with them is up to the user.
> >
> > Are you going to use this API? How? There is no point in discussing user
> > API without specifying HOW user will be using it. You have to ask what
> > user want and design your API accordingly and not other way around.
> > So suppose I want to use proposed API to implement super scalable MPI.
>
> We'd come up with MPI_Send implementation inside libibverbs:). Think layered - I'd
> like to make a minimal possible API change to make scalability improvements
> possible.
They are not really possible with the proposed API (beyond academic papers, that is). You are
welcome to implement MPI_Send inside libibverbs. After all, this is what Myricom did.
>
> > I setup shared QP/CQ/... and each rank start to post into the QP and
> > receive completion from CQ and suppose rank A picked completion that
> > belongs to rank B so I will need to setup out of band channel to pass
> > this completion from A to B. This is not looks good at all to me.
>
> This is not different from multiple threads sharing a CQ, really - and we do
This is very different from multiple threads sharing a CQ. In a
multithreaded scenario I can design my program in such a way that each
thread is able to handle any completion. In the scenario you propose we'll
have to pass completions between processes.
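
To make the problem concrete, here is a toy model in Python. It is not
libibverbs code: the function name, the strict round-robin posting order and
the batch-polling model are all assumptions made for illustration. Each CQE
is tagged with the rank that posted the send (standing in for something
encoded in wr_id), processes take turns draining up to `batch` entries from
the shared CQ in FCFS order, and we count the completions that land in the
wrong process and would have to be forwarded.

```python
from collections import deque


def misdelivered(num_procs, sends_per_proc, batch):
    """Count CQEs polled by a process other than the one that posted the send.

    Toy model only: ranks post to the shared QP in strict round-robin order,
    and ranks then take turns polling up to `batch` entries from the shared CQ.
    """
    # Shared CQ in completion (FCFS) order; each entry is the poster's rank.
    cq = deque(rank for _ in range(sends_per_proc) for rank in range(num_procs))
    foreign = 0
    poller = 0
    while cq:
        for _ in range(min(batch, len(cq))):
            owner = cq.popleft()
            if owner != poller:
                foreign += 1  # must be handed to `owner` out of band
        poller = (poller + 1) % num_procs
    return foreign


if __name__ == "__main__":
    print(misdelivered(num_procs=4, sends_per_proc=2, batch=4))
```

The exact count depends entirely on the assumed timing (with batch=1 and this
posting order nothing is misdelivered), which is the point: a process cannot
know in advance whose completion it will get.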
> support this already. In the part of the message that you have cut out, I
> showed some use cases that avoid this "side channel"
What? RDMA? What about the completion of an RDMA operation? You'll have to
pass it around. I agree that the RDMA situation is much better than the
send/receive one, but there are no RDMAs without a send/recv after them.
> (which could be just shared memory btw).
>
And you introduce another scalability problem here. On a big SMP node you
will have to create a channel between each pair of processes to pass
completions, and each process will have to poll every one of them besides
polling the CQ. There goes your latency. And I am not saying this is not
possible, I am saying it is so bad that it is not worth doing.
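
A back-of-the-envelope sketch of that cost, in Python. The function name is
made up for illustration, and it assumes one bidirectional channel per pair
of processes; with a pair of one-way channels the total simply doubles.

```python
def side_channels(num_procs):
    """Return (total channels on the node, things each process must poll).

    Assumes one bidirectional side channel per pair of processes.
    """
    total = num_procs * (num_procs - 1) // 2
    # Each process polls one channel per peer, plus the shared CQ itself.
    polled_per_proc = (num_procs - 1) + 1
    return total, polled_per_proc


if __name__ == "__main__":
    print(side_channels(16))  # (120, 16)
```

So the number of channels grows quadratically with the number of processes
on the node, and the number of things each process has to poll grows
linearly.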
--
Gleb.