[OFIWG-MPI] [ofiwg] Call today
Howard Pritchard
hppritcha at gmail.com
Tue May 27 07:43:04 PDT 2014
Hi Sean,
Some comments inlined, but first about the idea of the msgq being
instantiated via a method in the fabric class. I was thinking of the most
general case, but I understand the case for having it be instantiated via
a domain class method. If a vendor supported a protocol that could handle
multiple rails internally within the domain, that would solve the problem
of tag/RX buffer matching across multiple rails. Perhaps that's how MXM and
PSM already work? If that's the case, then I'd agree that having the msgq
object instantiated via a domain method would be more appropriate.
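A minimal sketch in C of the domain-scoped variant, assuming hypothetical `domain`, `msgq`, and `endpoint` types and hypothetical `domain_msgq_open()` / `ep_bind_msgq()` calls (none of these are libfabric API). It only shows the msgq being created through its domain and an endpoint from a different domain being refused at bind time, the same single-domain restriction that applies to an ibverbs SRQ.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical objects for illustration; not libfabric API. */
struct domain {
    int id;
};

struct msgq {
    struct domain *domain; /* owning domain: the msgq is its child */
    int nbound;            /* endpoints currently bound to this msgq */
};

struct endpoint {
    struct domain *domain; /* domain the endpoint was opened on */
    struct msgq *mq;       /* NULL until bound */
};

/* Hypothetical: the msgq is created through a domain, not the fabric. */
struct msgq *domain_msgq_open(struct domain *dom)
{
    assert(dom != NULL);
    struct msgq *mq = calloc(1, sizeof(*mq));
    if (mq != NULL)
        mq->domain = dom;
    return mq;
}

/* Hypothetical bind: returns 0 on success, -1 if the endpoint is already
 * bound or belongs to a different domain than the msgq. */
int ep_bind_msgq(struct endpoint *ep, struct msgq *mq)
{
    assert(ep != NULL && mq != NULL);
    if (ep->mq != NULL || ep->domain != mq->domain)
        return -1;
    ep->mq = mq;
    mq->nbound++;
    return 0;
}
```

Because the msgq records its owning domain, the cross-domain check is a single pointer comparison. A fabric-scoped msgq would have no single owner to check against, which is where the buffer-sharing and access-key questions discussed below come in.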
2014-05-22 16:01 GMT-06:00 Hefty, Sean <sean.hefty at intel.com>:
> If I understand your issue, it's not with the send calls being associated
> with an endpoint, only the relationship of a buffer at the target side. I
> would think this applies to all receive buffers and any target buffers of
> an RMA or atomic operation. I didn't follow your asymmetric comment.
>
> RMA target buffers are currently associated with a domain (using libfabric
> terms). Moving that to the fabric level has an implication that a buffer
> may be associated with different access keys. Trying to move receive
> buffers above the domain would have a similar issue.
>
> Attempting to share receive buffers above the domain would likely result
> in synchronization issues between devices and providers in such a way that
> support for zero copy would be compromised. Even SRQ is restricted to a
> single domain.
>
Okay, the SRQ example is a good reason to have the msgq be a "child" of the domain.
>
> From an architectural viewpoint, I guess the question is, does it make
> sense that receive buffers never be directly associated with endpoints?
> This seems to be the general decision point that this discussion is
> leading to.
>
I think this needs more discussion. I'd think users of the SRQ feature in
ibverbs may have some ideas about the pluses and minuses of associating
buffers with endpoints vs. the way RX buffers are posted to SRQs now with
ibverbs.
It might also be interesting to have some PSM and/or MXM gurus provide
insight here. Both of these APIs have a MSGQ-like concept. Judging just from
the available header files and, for example, Open MPI's usage of these APIs,
the MSGQ also seems to provide some kind of communication envelope.
Actually, for the PSM provider in libfabric, it's pretty clear that the MSGQ
is an important concept for that API.
Howard
> - Sean
>
>
> > No, I would not create an RMA class. It seems to me that RMA is a
> > transport method, and as such should be a method of the ep class. What I
> > didn't like about the receive buffer method (with or without tags,
> > actually, since the IB SRQ capability also falls under this umbrella) was
> > that it seemed asymmetric. For a FID_MSG type EP, with the current
> > proposal, there is no ability to allow receive buffers to be posted that
> > could "match" incoming sends from a set of EPs. Here I mean match in the
> > general sense. Only if a vendor supported FID_RDM/FID_DGRAM could this be
> > done as the proposal currently stands.
> >
> > I think I'm restating what Rich said.
> >
> > Howard
> >
> >
> > On Thu, May 22, 2014 at 1:29 PM, Hefty, Sean <sean.hefty at intel.com>
> > wrote:
> >
> >
> > Thanks, Howard, this is helpful.
> >
> > Regarding the 'tag match class' that you mention, would you create an
> > 'rma class' as a peer, with the RMA operations defined in a similar
> > fashion? If not, why not? Would this also extend to all other data
> > transfer operations, i.e., message queue (send/receive) and atomics, plus
> > any others defined in the future?
> >
> >
> > > -----Original Message-----
> > > From: Howard Pritchard [mailto:hppritcha at gmail.com]
> > > Sent: Thursday, May 22, 2014 11:53 AM
> > > To: Richard Graham
> > > Cc: Hefty, Sean; ofiwg at lists.openfabrics.org;
> > > ofiwg-mpi at lists.openfabrics.org
> > > Subject: Re: [ofiwg] Call today
> > >
> > > Hi Folks,
> > >
> > > Here is a diagram of a concept that was discussed in a side conversation
> > > at the last OFA workshop. I'd thought that a msgq (aka tag matcher class)
> > > object should be instantiated via a method of the fabric class.
> > >
> > > Red lines in the diagram indicate that the pointee can be associated with
> > > the class being pointed to by the arrow, using the bind method of the
> > > class being pointed to.
> > >
> > > The search_by_addr method of the msgq is for use with FID_RDM endpoints,
> > > while the search_by_ep method is for when the msgq is associated with
> > > multiple FID_MSG type endpoints.
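The two search methods can be sketched as follows, assuming a msgq that keeps posted receives in posting order and a hypothetical `posted_recv` record carrying both a source address and an endpoint id. The function names mirror the methods in the diagram, but the signatures and fields are illustrative only, not libfabric API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MSGQ_DEPTH 16

/* Hypothetical posted-receive record; field names are illustrative. A real
 * design would populate only the key relevant to its endpoint type. */
struct posted_recv {
    uint64_t src_addr; /* peer address: the key for FID_RDM endpoints */
    int ep_id;         /* receiving endpoint: the key for FID_MSG endpoints */
    uint64_t tag;
};

struct msgq {
    struct posted_recv q[MSGQ_DEPTH]; /* kept in posting order */
    size_t n;
};

/* Append a posted receive; returns -1 if the queue is full. */
int msgq_post(struct msgq *mq, uint64_t src_addr, int ep_id, uint64_t tag)
{
    assert(mq != NULL);
    if (mq->n == MSGQ_DEPTH)
        return -1;
    mq->q[mq->n++] = (struct posted_recv){ src_addr, ep_id, tag };
    return 0;
}

/* FID_RDM: one endpoint hears from many peers, so the sender is identified
 * by its source address. Returns the first match in posting order, or -1. */
long msgq_search_by_addr(const struct msgq *mq, uint64_t src_addr, uint64_t tag)
{
    for (size_t i = 0; i < mq->n; i++)
        if (mq->q[i].src_addr == src_addr && mq->q[i].tag == tag)
            return (long)i;
    return -1;
}

/* FID_MSG: each connected endpoint has a single peer, so the endpoint the
 * message arrived on identifies the sender. */
long msgq_search_by_ep(const struct msgq *mq, int ep_id, uint64_t tag)
{
    for (size_t i = 0; i < mq->n; i++)
        if (mq->q[i].ep_id == ep_id && mq->q[i].tag == tag)
            return (long)i;
    return -1;
}
```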
> > >
> > > Note the slide is a little old, since the EC class has now been divided
> > > into EQ and counter type completion notification mechanisms.
> > >
> > > Hoping this will maybe help a little here.
> > >
> > > Howard
> > >
> > >
> > >
> > > On Thu, May 22, 2014 at 11:59 AM, Richard Graham
> > > <richardg at mellanox.com> wrote:
> > >
> > >
> > > Please see inline
> > >
> > > -----Original Message-----
> > > From: Hefty, Sean [mailto:sean.hefty at intel.com]
> > > Sent: Thursday, May 22, 2014 12:43 PM
> > > To: Richard Graham; ofiwg at lists.openfabrics.org;
> > > ofiwg-mpi at lists.openfabrics.org
> > > Cc: Paul Grun (grun at cray.com); Liran Liss
> > > Subject: RE: Call today
> > >
> > > With permission, copying the mailing list on a side thread that popped
> > > up.
> > >
> > > I understand MPI has wildcard receives. But tagged semantics are useful
> > > even when associated with a generic endpoint concept, or a specific
> > > address. Note the proposed endpoint concept is not necessarily bound to
> > > a specific piece of hardware, though it may be, depending on the
> > > provider implementation. The tagged operations themselves may be
> > > implemented by hardware and are not restricted to being purely a
> > > software construct.
> > > [rich] If the attempt here is to provide a building block that will map
> > > to different use-case scenarios, then we need an architecture that will
> > > map well onto the areas of interest. MPI is just one such upper-level
> > > service, one that has been called out specifically in the context of the
> > > proposal you have been presenting. So, following on this (the precise
> > > definition of endpoint is still rather fuzzy at this stage), in general
> > > there is no one-to-one mapping of an endpoint to an MPI matching
> > > context, but there can be an association of a matching context with one
> > > or more endpoints. What I am suggesting here is that we keep notions
> > > around data transfer orthogonal to what is done with the data (tag
> > > matching, in this case). How the functionality is implemented (hardware
> > > or not) is separate from how the stack is architected.
> > >
> > > Tagged interfaces, as well as other interfaces such as message queues,
> > > may still exist above the endpoint. But that layering of interfaces
> > > seems better suited above the fabric interfaces (e.g. MPI), rather than
> > > included with them. This seems more debatable to me, though, and we
> > > could examine whether a domain or fabric object should have send/receive
> > > capabilities.
> > > [rich] We need to keep separate how data is transferred (perhaps with
> > > functions that we may call send/recv) from the ULP's use of this data
> > > (perhaps also using a similar naming scheme of send/recv).
> > >
> > > - Sean
> > >
> > > > -----Original Message-----
> > > > From: Richard Graham [mailto:richardg at mellanox.com]
> > > > Sent: Wednesday, May 21, 2014 11:09 AM
> > > > To: Hefty, Sean
> > > > Cc: Paul Grun (grun at cray.com); Liran Liss
> > > > Subject: RE: Call today
> > > >
> > > > Tag matching, as it applies to MPI semantics, is not local to a given
> > > > pair of processes; e.g., MPI has a wildcard receive that can take data
> > > > from any source, and therefore the matching context is broader than
> > > > just a single pair of source and destination.
> > > >
> > > > Rich
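A minimal sketch of a matching context in this MPI sense, assuming hypothetical `ANY_SOURCE` / `ANY_TAG` wildcards modeled on MPI_ANY_SOURCE and MPI_ANY_TAG, and hypothetical `ctx_post()` / `ctx_match()` functions. The endpoint a message arrived on is deliberately not part of the match key, so one context can serve any number of endpoints; posted receives are searched in posting order and the first match is consumed.

```c
#include <assert.h>
#include <stddef.h>

#define ANY_SOURCE (-1) /* hypothetical, modeled on MPI_ANY_SOURCE */
#define ANY_TAG    (-1) /* hypothetical, modeled on MPI_ANY_TAG */
#define MAX_POSTED 16

struct recv_req {
    int comm; /* communicator id */
    int src;  /* source rank or ANY_SOURCE */
    int tag;  /* tag or ANY_TAG */
};

/* One matching context per (process, communicator) in MPI terms; it may be
 * fed by messages arriving on any number of endpoints. */
struct match_ctx {
    struct recv_req posted[MAX_POSTED]; /* in posting order */
    size_t n;
};

int ctx_post(struct match_ctx *c, int comm, int src, int tag)
{
    assert(c != NULL);
    if (c->n == MAX_POSTED)
        return -1;
    c->posted[c->n++] = (struct recv_req){ comm, src, tag };
    return 0;
}

/* Match an incoming (comm, src, tag) envelope against posted receives.
 * On a match, the first matching receive in posting order is copied to
 * *out, removed, and 1 is returned; otherwise 0 is returned. */
int ctx_match(struct match_ctx *c, int comm, int src, int tag,
              struct recv_req *out)
{
    for (size_t i = 0; i < c->n; i++) {
        const struct recv_req *r = &c->posted[i];
        if (r->comm == comm &&
            (r->src == ANY_SOURCE || r->src == src) &&
            (r->tag == ANY_TAG || r->tag == tag)) {
            *out = *r;
            /* Shift the remaining entries down to preserve posting order. */
            for (size_t j = i + 1; j < c->n; j++)
                c->posted[j - 1] = c->posted[j];
            c->n--;
            return 1;
        }
    }
    return 0;
}
```

A message that finds no match would normally be held on an unexpected-message queue until a matching receive is posted; that part is omitted from this sketch.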
> > > >
> > > > -----Original Message-----
> > > > From: Hefty, Sean [mailto:sean.hefty at intel.com]
> > > > Sent: Wednesday, May 21, 2014 1:13 PM
> > > > To: Richard Graham
> > > > Cc: Paul Grun (grun at cray.com); Liran Liss
> > > > Subject: RE: Call today
> > > >
> > > > Tag matching, RMA, atomics, and message operations are currently
> > > > associated with an endpoint, but the functions are independent of the
> > > > communication protocol in use. Conceptually, it seems reasonable to
> > > > think of tag matching as a merging of message and RMA write operations.
> > > >
> > > > I agree that an endpoint is associated with the data source/sink.
> > > > There is no implied mapping between a process and an endpoint.
> > > >
> > > >
> > > > > -----Original Message-----
> > > > > From: Richard Graham [mailto:richardg at mellanox.com]
> > > > > Sent: Tuesday, May 20, 2014 9:22 PM
> > > > > To: Hefty, Sean
> > > > > Cc: Paul Grun (grun at cray.com); Liran Liss
> > > > > Subject: RE: Call today
> > > > >
> > > > > I suppose that you could consider tag-matching as part of transport.
> > > > > However, I would argue that such protocols should be independent of
> > > > > whether a reliable or unreliable communication protocol is used (at
> > > > > least when it comes to the tag support needed for MPI). Also, I
> > > > > associate an endpoint with the source and/or the sink of data. In
> > > > > MPI, tag matching is associated with an MPI-level (process,
> > > > > communicator) pair, and therefore the tag-matching context may be
> > > > > associated with many endpoints. I would therefore keep tag-matching
> > > > > as a separate concept.
> > > > >
> > > > > Rich
> > > > >
> > > > > -----Original Message-----
> > > > > From: Hefty, Sean [mailto:sean.hefty at intel.com]
> > > > > Sent: Tuesday, May 20, 2014 1:26 PM
> > > > > To: Richard Graham
> > > > > Cc: Paul Grun (grun at cray.com); Liran Liss
> > > > > Subject: RE: Call today
> > > > >
> > > > > Tag-matching is a transport object (protocol), so I do think it makes
> > > > > sense for it to be associated with a transport level object (i.e.
> > > > > endpoint).
> > > > >
> > > > > I thought you were referring to the SRQ, which may or may not be a
> > > > > transport level object. If the sharing of data buffer(s) among
> > > > > multiple connections is not considered a transport object, then I
> > > > > agree, it may make sense to have it be a separate object with its own
> > > > > interfaces. Alternatively, it could also be a property of endpoints to
> > > > > share receive buffers.
> > > > >
> > > > > When the SRQ appears in the transport object (protocol), it may get
> > > > > more complex.
> > > > >
> > > > > For initial thoughts, sharing receive buffers could be handled by:
> > > > >
> > > > > 1. Creating an explicit SRQ object as a 'peer' to an endpoint. The
> > > > >    SRQ would have the ability to associate receive buffers with it.
> > > > >    Endpoints would need to be associated with an SRQ to make use of
> > > > >    it.
> > > > > 2. Creating an SRQ 'endpoint' object. A send-receive endpoint could
> > > > >    be created from, and inherit, the SRQ interfaces.
> > > > > 3. Adding an endpoint property to allow sharing data buffers. Shared
> > > > >    buffers could be posted to a domain object or, alternatively, to
> > > > >    any endpoint.
> > > > >
> > > > > Ultimately, the question becomes a matter of where the 'post receive
> > > > > buffer' operation resides, and the behavior of any 'post receive
> > > > > buffer' call which may reside elsewhere. E.g., SRQ::PostRecv() versus
> > > > > EP::PostRecv(): what is the behavior of EP::PostRecv() if buffer
> > > > > sharing is enabled?
> > > > >
> > > > > These assume SRQ as a non-transport object, or at least one that is
> > > > > not visible to the application.
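Option 1 above, together with one possible answer to the EP::PostRecv() question, sketched in C. The `srq` and `ep` types and all functions are hypothetical, and the policy of rejecting EP::PostRecv() while sharing is enabled is an assumption chosen for illustration, not something proposed in this thread.

```c
#include <assert.h>
#include <stddef.h>

#define SRQ_DEPTH 8

/* Hypothetical shared receive queue, a 'peer' of the endpoint (option 1). */
struct srq {
    void *bufs[SRQ_DEPTH]; /* ring buffer of posted receive buffers */
    size_t head;           /* index of the oldest posted buffer */
    size_t count;          /* number of posted buffers */
};

/* Hypothetical endpoint; srq != NULL means buffer sharing is enabled. */
struct ep {
    struct srq *srq;
    void *own_buf; /* one private receive slot, to keep the sketch short */
};

int srq_post_recv(struct srq *s, void *buf)
{
    assert(s != NULL && buf != NULL);
    if (s->count == SRQ_DEPTH)
        return -1;
    s->bufs[(s->head + s->count) % SRQ_DEPTH] = buf;
    s->count++;
    return 0;
}

/* Assumed policy: EP::PostRecv() is rejected while sharing is enabled,
 * so each buffer has exactly one owner. */
int ep_post_recv(struct ep *ep, void *buf)
{
    assert(ep != NULL && buf != NULL);
    if (ep->srq != NULL || ep->own_buf != NULL)
        return -1;
    ep->own_buf = buf;
    return 0;
}

/* Choose the buffer for a message arriving on ep, or NULL if none is posted. */
void *ep_take_recv_buf(struct ep *ep)
{
    void *buf;
    if (ep->srq != NULL) {
        struct srq *s = ep->srq;
        if (s->count == 0)
            return NULL;
        buf = s->bufs[s->head];
        s->head = (s->head + 1) % SRQ_DEPTH;
        s->count--;
        return buf;
    }
    buf = ep->own_buf;
    ep->own_buf = NULL;
    return buf;
}
```

Rejecting the per-endpoint post keeps every buffer owned by exactly one queue. An alternative would be to forward the buffer to the SRQ, at the cost of making EP::PostRecv() behave differently depending on an endpoint property.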
> > > > >
> > > > >
> > > > >
> > > > > > Liran mentioned that you wanted me to repeat what I said. My only
> > > > > > comment was that we not couple transport (connection-based
> > > > > > transport) with tag-matching (or any other object supported by the
> > > > > > library). These are two different concepts and should be kept
> > > > > > separate.
> > > > > >
> > > > > >
> > > > > >
> > > > > > Rich
> > >
> > >
> > >
> >
> >
> >
>
>