[ofa-general] Re: Re: [PATCH RFC] sharing userspace IB objects

Gleb Natapov glebn at voltaire.com
Sun Jul 1 09:36:15 PDT 2007


On Sun, Jul 01, 2007 at 05:08:08PM +0300, Michael S. Tsirkin wrote:
> > Looking at Dror's slides, on slide 6 "Scalable Reliable Connection" I
> > see that the wire protocol is extended to send the DST SRQ as part of a
> > header. The receiver side then puts the completion into the appropriate
> > CQ according to this field. Does your proposal address this? How? Who
> > will put this additional data on the wire (HW, libibverbs, or maybe the app)?
> 
> This is SRC, which is a hardware extension, and is mostly an orthogonal issue.
I don't agree. You don't usually create a QP only for sends. And indeed, if
we look at slide 8 "Shared Send Queue", we see that receive demultiplexing
and the additional header field are there. Also, slide 11 defines the SSQ
API on top of the SRC API, and that makes perfect sense. I don't see anywhere
in these slides that SSQ is mentioned on its own without SRC.

> My proposal only deals with SSQ for now.
> For SRC we'll need to define new "SRC domain" objects and an API to share them
> between apps. I expect that we'll be able to basically use the same API as for
> sharing other objects.
So the lack of HW support for SRC stops you from implementing it, but the lack
of HW support for SSQ doesn't really bother you at all.

> 
> It is true that for best scalability we probably need both SSQ and SRC,
> but let's try to focus on sharing APIs for now.
The sharing API is a small and boring detail. We need to understand the
application's needs and design for them.

> 
> > Also, I don't see this in Dror's slides, but completions of local operations
> > should be demultiplexed to the appropriate CQ too. The WQE may contain an
> > additional field, for instance, that tells where to put the completion. Once
> > again, who will do the demux in your proposal (HW, libibverbs, or app)? The
> > right answer is most certainly HW in both cases, so will Hermon support this?
> > Or maybe you want to demultiplex everything inside libibverbs? In that case I
> > want to see a design for this (preferably with a performance analysis).
> 
> Since hardware cannot do this demultiplexing, I think the right thing
> is to do this inside MPI, encoding the necessary data in the WRID field.
> 
It translates to: "Marketing wants new TLAs implemented fast. We don't
have HW support for that, so we implement something to get rid of the
marketing guys; the rest is not our problem, and you MPI folks go deal
with that mess (you're already used to it anyway)."

--
			Gleb.
