[ofa-general] RE: Re: [PATCH RFC] sharing userspace IB objects

Sun Jul 1 09:27:24 PDT 2007

> -----Original Message-----
> From: general-bounces at lists.openfabrics.org 
> [mailto:general-bounces at lists.openfabrics.org] On Behalf Of 
> Gleb Natapov
> Sent: Sunday, July 01, 2007 3:16 PM
> To: Michael S. Tsirkin
> Cc: Roland Dreier; gdror at lists.openfabrics.org; 
> openib-general at openib.org
> Subject: Re: Re: [PATCH RFC] sharing userspace IB objects
> 
> On Sun, Jul 01, 2007 at 01:05:30AM +0300, Michael S. Tsirkin wrote:
> > > Quoting Roland Dreier <rdreier at cisco.com>:
> > > Subject: Re: [PATCH RFC] sharing userspace IB objects
> > > 
> > > > This is not directly related to SRC: this is an effort to make
> > > > it possible to share QPs, CQ etc across processes in the same way
> > > > as they can be currently shared across threads.
> > > > So assuming that we want multiple processes to post to the
> > > > same QP, how can we support this?
> > > 
> > > This looks like a lot of work for an unknown gain. Who is going to
> > > really use this? ie is it worth the trouble?
> > 
> > I think Dror is the best person to answer this.
> > Dror, could you please explain the need for shared send queue?
> > 
> SSQ is needed for scalability, no need to explain this (by
> the way, RD is needed for the same reason too. What is Mellanox's
> plan to support it?

RD is not supported in hardware today, and implementing RD is extremely
complicated. To solve the scalability issues of MPI-like applications,
we believe that SRC and SSQ are the right solutions: they are much
simpler to implement in both software and hardware. By MPI-like I mean
applications in which the processes of the same application have some
level of trust in each other. RD also has some performance issues, as it
supports only one message in flight at a time. Those performance issues
are solved by design in SRC/SSQ.

> It is part of the spec after all, so why invent new shiny
> stuff when it is still possible to achieve better scalability
> without it).

It's truly about complexity. And as I mentioned at the OFA meeting in
Sonoma, Mellanox is willing to contribute SRC/SSQ to the IB spec as well.

> We are discussing your implementation proposal, and in my
> opinion it doesn't fit application needs. I may be wrong
> here, so if there is somebody who thinks that sending random
> completions to random processes is the best idea ever, and that the
> absence of this "feature" is the only thing that stops him
> from adopting IB, he may chime in here and voice his opinion.

Your input about how to demultiplex send completions on SSQ is
valuable. Unfortunately, this demultiplexing is not supported in the
current HW generation. What I can suggest here (not new on this thread)
is one of the following:
1) All pollers see the same CQ; only the poller that a completion
   belongs to takes it out of the CQ.
2) Only one process polls the CQ. If a completion doesn't belong to
   the poller, the poller puts it in a SW queue for the right process.
   The other processes just poll their SW queues.
3) The SQ reports a "completed WQE index". Everybody can look at it
   and determine how many WQEs have completed. This one has some cons
   because the CQ is not shared here... it needs more baking.
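Option 2 can be sketched roughly as follows. This is a minimal, single-threaded simulation, not libibverbs code: `struct sim_wc` is a hypothetical stand-in for a send completion (only the `wr_id` idea is borrowed from `struct ibv_wc`), and the convention of encoding the owning process in the top 8 bits of `wr_id` is an assumption made for illustration. A real implementation would keep the per-process queues in shared memory with proper cross-process synchronization.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_PROCS 8   /* hypothetical limit on processes sharing the SSQ */
#define SWQ_DEPTH 64  /* hypothetical depth of each software queue */

/* Hypothetical stand-in for a send completion (cf. struct ibv_wc). */
struct sim_wc {
    uint64_t wr_id;
    int status;
};

/* Assumed convention: the top 8 bits of wr_id carry the poster's slot. */
uint64_t make_wr_id(unsigned owner, uint64_t cookie)
{
    return ((uint64_t)owner << 56) | (cookie & 0x00FFFFFFFFFFFFFFULL);
}

unsigned wr_id_owner(uint64_t wr_id)
{
    return (unsigned)(wr_id >> 56);
}

/* One software queue per process: a fixed-size ring buffer. */
struct sw_queue {
    struct sim_wc slots[SWQ_DEPTH];
    size_t head; /* next slot to read */
    size_t tail; /* next slot to write */
};

bool swq_push(struct sw_queue *q, struct sim_wc wc)
{
    if (q->tail - q->head == SWQ_DEPTH)
        return false; /* full */
    q->slots[q->tail % SWQ_DEPTH] = wc;
    q->tail++;
    return true;
}

/* Called by each process on its own queue instead of polling the CQ. */
bool swq_pop(struct sw_queue *q, struct sim_wc *out)
{
    if (q->head == q->tail)
        return false; /* empty */
    *out = q->slots[q->head % SWQ_DEPTH];
    q->head++;
    return true;
}

struct sw_demux {
    struct sw_queue per_proc[MAX_PROCS];
};

/*
 * Run only by the designated poller. `cqes`/`n` stand in for a batch that
 * ibv_poll_cq() would have returned from the shared CQ. Each completion is
 * routed to its owner's queue. Returns how many were consumed. It stops
 * early on a full queue (the caller retries cqes[returned..n) later) or on
 * an unknown owner (an error the caller must handle).
 */
size_t demux_route(struct sw_demux *d, const struct sim_wc *cqes, size_t n)
{
    size_t i;
    for (i = 0; i < n; i++) {
        unsigned owner = wr_id_owner(cqes[i].wr_id);
        if (owner >= MAX_PROCS || !swq_push(&d->per_proc[owner], cqes[i]))
            break;
    }
    return i;
}
```

The designated poller would loop: poll the shared CQ, call `demux_route()`, and retry any leftovers, while every other process only calls `swq_pop()` on its own queue. This keeps a single consumer on the CQ at the cost of an extra copy and hop per completion.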
If we wrap one of these in the right API, then once HW that can do
SSQ CQ demultiplexing is available, it can be used without any API change.
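A wrapper API of this kind might be sketched as a function-pointer backend. Everything below is hypothetical (`ssq_poll_mine()`, `struct ssq_backend`, and the toy software backend are not part of libibverbs): the application calls one function, and the backend behind it could later be swapped from software demultiplexing to hardware demultiplexing without changing callers. The toy backend filters a shared list by owner, roughly in the spirit of option 1.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical completion record; a stand-in for struct ibv_wc. */
struct sim_wc {
    uint64_t wr_id;
    int status;
};

/*
 * Hypothetical backend interface. Applications only ever call
 * ssq_poll_mine(); which demultiplexing strategy sits behind it
 * (software today, hardware later) is chosen at setup time.
 */
struct ssq_backend {
    /* Copy up to `max` completions owned by `self` into `out`; return count. */
    int (*poll_mine)(void *ctx, unsigned self, struct sim_wc *out, int max);
    void *ctx;
};

int ssq_poll_mine(const struct ssq_backend *be, unsigned self,
                  struct sim_wc *out, int max)
{
    return be->poll_mine(be->ctx, self, out, max);
}

/* Toy software backend: a small shared list of completions, each tagged
 * with its owner. A caller takes only its own entries and marks them. */
struct sw_ctx {
    struct sim_wc pending[16];
    unsigned owner[16];
    int taken[16];
    int count;
};

static int sw_poll_mine(void *ctx, unsigned self, struct sim_wc *out, int max)
{
    struct sw_ctx *s = ctx;
    int n = 0;
    for (int i = 0; i < s->count && n < max; i++) {
        if (!s->taken[i] && s->owner[i] == self) {
            out[n++] = s->pending[i];
            s->taken[i] = 1; /* consumed: no other caller will see it */
        }
    }
    return n;
}

struct ssq_backend make_sw_backend(struct sw_ctx *s)
{
    struct ssq_backend be = { sw_poll_mine, s };
    return be;
}
```

A HW-backed implementation would supply a different `poll_mine` that reads from a per-process CQ directly; code calling `ssq_poll_mine()` would stay the same.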

> 
> Looking at Dror's slides, on slide 6 "Scalable Reliable
> Connection" I see that the wire protocol is extended to send the
> DST SRQ as part of a header.
> The receiver side then puts the completion in the appropriate CQ
> according to this field. Does your proposition address this? How?

SRC does include CQ demultiplexing. SSQ currently does not,
unfortunately. But I think that with the right API we can abstract
this, and later on get better performance for it.
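To make the SRC receive path described above concrete, here is a software model of the lookup. In SRC this selection is done by the HCA, so this is only an illustration under assumptions: the `struct src_hdr` layout, the `src_table` mapping, and all function names are hypothetical, and real SRQ numbers are not small array indices.

```c
#include <stddef.h>
#include <stdint.h>

#define MAX_SRQS 32
#define NO_CQ (-1)

/* Hypothetical: the extra header field carrying the destination SRQ. */
struct src_hdr {
    uint32_t dst_srq;
};

/* Hypothetical receive-side table: each SRQ is bound to one CQ, e.g.
 * the CQ of the process that owns that SRQ. */
struct src_table {
    int cq_of_srq[MAX_SRQS];
};

void src_table_init(struct src_table *t)
{
    for (size_t i = 0; i < MAX_SRQS; i++)
        t->cq_of_srq[i] = NO_CQ;
}

/* Bind an SRQ number to a CQ; returns -1 if the SRQ number is out of range. */
int src_bind(struct src_table *t, uint32_t srq, int cq)
{
    if (srq >= MAX_SRQS)
        return -1;
    t->cq_of_srq[srq] = cq;
    return 0;
}

/* Choose the CQ for an incoming message from its DST SRQ field.
 * NO_CQ means the destination is unknown and the message is rejected. */
int src_select_cq(const struct src_table *t, const struct src_hdr *h)
{
    if (h->dst_srq >= MAX_SRQS)
        return NO_CQ;
    return t->cq_of_srq[h->dst_srq];
}
```

Because the destination travels in the packet, the receiver can pick the right CQ with a single table lookup and no cross-process handoff, which is the property SSQ send completions currently lack.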

> Who will put this additional data on the wire (HW, libibverbs,
> or maybe the app)? Also, I don't see this in Dror's slides, but
> completions of local operations should be demultiplexed to the
> appropriate CQ too. The WQE may contain an additional field, for
> instance, that tells where to put the completion. Once
> again, who will do the demux in your proposition (HW, libibverbs,
> or the app)? The right answer is most certainly HW in both cases,
> so will Hermon support this?
> Or maybe you want to demultiplex everything inside
> libibverbs? In that case I want to see a design for it
> (preferably with a performance analysis).

One thing to mention: the way I see it follows the order of the
slides. First get SRC going to improve scalability; then SSQ can be
added to improve scalability further. In other words, I am suggesting
that maybe we can worry about the SSQ deficiencies a bit later :)

> 
> --
> 			Gleb.