[ofiwg] Call today

Hefty, Sean sean.hefty at intel.com
Thu May 22 15:01:02 PDT 2014


If I understand your issue, it's not with the send calls being associated with an endpoint, but with how a buffer is associated with objects at the target side.  I would think this applies to all receive buffers as well as any target buffers of an RMA or atomic operation.  I didn't follow your comment about asymmetry.

RMA target buffers are currently associated with a domain (using libfabric terms).  Moving that to the fabric level has an implication that a buffer may be associated with different access keys.  Trying to move receive buffers above the domain would have a similar issue.

Attempting to share receive buffers above the domain would likely cause synchronization issues between devices and providers that would compromise support for zero copy.  Even an SRQ is restricted to a single domain.

From an architectural viewpoint, I guess the question is: does it make sense for receive buffers never to be directly associated with endpoints?  This seems to be the general decision point that this discussion is leading to.
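
To make the decision point concrete, here is a minimal C sketch of the shared-buffer alternative.  It is not the libfabric API: rx_pool, endpoint, ep_bind_pool, pool_post_recv, and ep_take_recv are hypothetical names, and the FIFO consumption order is an assumption made only for illustration.  Receive buffers are posted to a pool owned by one domain, any endpoint bound to that pool may consume them, and binding across domains is refused, mirroring the SRQ restriction mentioned above.

```c
#include <assert.h>
#include <stddef.h>

#define POOL_CAP 8

/* Hypothetical shared receive pool, owned by exactly one domain (SRQ-like). */
struct rx_pool {
    int domain_id;
    void *bufs[POOL_CAP];
    size_t count;
};

/* Hypothetical endpoint; pool stays NULL until the endpoint is bound. */
struct endpoint {
    int domain_id;
    struct rx_pool *pool;
};

/* Bind an endpoint to a pool; refuse if the two live in different domains. */
int ep_bind_pool(struct endpoint *ep, struct rx_pool *pool)
{
    assert(ep != NULL && pool != NULL);
    if (ep->domain_id != pool->domain_id)
        return -1;
    ep->pool = pool;
    return 0;
}

/* Post a receive buffer to the pool rather than to any single endpoint. */
int pool_post_recv(struct rx_pool *pool, void *buf)
{
    if (pool->count == POOL_CAP)
        return -1;
    pool->bufs[pool->count++] = buf;
    return 0;
}

/* An incoming send on any bound endpoint consumes the oldest posted buffer. */
void *ep_take_recv(struct endpoint *ep)
{
    struct rx_pool *pool = ep->pool;
    if (pool == NULL || pool->count == 0)
        return NULL;
    void *buf = pool->bufs[0];
    for (size_t i = 1; i < pool->count; i++)
        pool->bufs[i - 1] = pool->bufs[i];
    pool->count--;
    return buf;
}
```

In this model a buffer posted once can satisfy an incoming send on either of two endpoints in the pool's domain, while an endpoint from a different domain cannot bind at all.  Whether an endpoint should also keep its own post-receive call once sharing is enabled is exactly the open question.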

- Sean


> No I would not create an RMA class.  It seems to me like RMA is a transport
> method, and as
> such should be a method of the ep class.  What I didn't like about the
> receive buffer method
> (with or without tags actually since the IB SRQ capability also falls under
> this umbrella) was
> that it seemed asymmetric.  For FID_MSG type EP, with the current proposal,
> there is
> no ability to allow receive buffers to be posted that could "match"
> incoming sends from
> a set of EP's.  Here I mean match in the general sense.  Only if a vendor
> supported
> FID_RDM/FID_DGRAM could this be done as the proposal currently stands.
> 
> I think I'm restating what Rich said.
> 
> Howard
> 
> 
> On Thu, May 22, 2014 at 1:29 PM, Hefty, Sean <sean.hefty at intel.com> wrote:
> 
> 
> 	Thanks, Howard, this is helpful.
> 
> 	Regarding the 'tag match class' that you mention, would you create an
> 'rma class' as a peer, with the RMA operations defined in a similar
> fashion?  If not, why not?  Would this also extend to all other data
> transfer operations?  I.e. message queue (send/receive) and atomics, plus
> any others defined in the future?
> 
> 
> 	> -----Original Message-----
> 	> From: Howard Pritchard [mailto:hppritcha at gmail.com]
> 	> Sent: Thursday, May 22, 2014 11:53 AM
> 	> To: Richard Graham
> 	> Cc: Hefty, Sean; ofiwg at lists.openfabrics.org; ofiwg-
> 	> mpi at lists.openfabrics.org
> 	> Subject: Re: [ofiwg] Call today
> 	>
> 	> Hi Folks,
> 	>
> 	> here is a diagram of a concept that was discussed in a side
> conversation at
> 	> the last OFA workshop.  I'd thought that a msgq (aka tag matcher
> class)
> 	> object
> 	> should be instantiated via a method of the fabric class.
> 	>
> 	> red lines in the diagram indicated the pointee can be associated
> with the
> 	> class
> 	> being pointed to by the arrow, using the bind method of the class
> being
> 	> pointed
> 	> to.
> 	>
> 	> the search_by_addr method of the msgq is for use with FID_RDM
> endpoints,
> 	> while search_by_ep method is when the msgq is associated with
multiple
> 	> FID_MSG type endpoints.
> 	>
> 	> Note the slide is a little old since the EC class has been divided
> now into
> 	> an EQ and counter type completion notification mechanisms.
> 	>
> 	> Hoping this will maybe help a little here.
> 	>
> 	> Howard
> 	>
> 	>
> 	>
> 	> On Thu, May 22, 2014 at 11:59 AM, Richard Graham
> <richardg at mellanox.com>
> 	> wrote:
> 	>
> 	>
> 	>       Please see inline
> 	>
> 	>       -----Original Message-----
> 	>       From: Hefty, Sean [mailto:sean.hefty at intel.com]
> 	>       Sent: Thursday, May 22, 2014 12:43 PM
> 	>       To: Richard Graham; ofiwg at lists.openfabrics.org; ofiwg-
> 	> mpi at lists.openfabrics.org
> 	>       Cc: Paul Grun (grun at cray.com); Liran Liss
> 	>       Subject: RE: Call today
> 	>
> 	>       With permission, copying mailing list on side thread that
> popped up.
> 	>
> 	>       I understand MPI has wild card receives.  But tagged
> semantics are
> 	> useful even when associated with a generic endpoint concept, or a
> specific
> 	> address.  Note the proposed endpoint concept is not necessarily
> bound to a
> 	> specific piece of hardware, though it may be based on the provider
> 	> implementation.  The tagged operations themselves may be
> implemented by
> 	> hardware and are not restricted to being purely a software
> construct.
> 	>       [rich]  If the attempt here is to provide a building block
> that will
> 	> map to different use-case scenarios, then need to have an
> architecture that
> 	> will map well onto the areas of interest.  MPI is just one such
> upper level
> 	> service, one that has been called out specifically in the context
> of the
> 	> proposal you have been presenting.  So, following on this (the
> precise
> 	> definition of end point is still rather fuzzy at this stage) in
> general,
> 	> there is no such one-to-one mapping of an endpoint to an MPI
> matching
> 	> context, but there can be an association of a matching context with
> one or
> 	> more endpoints.  What I am suggesting here is that we keep data
> notions
> 	> around data transfer orthogonal to what is done with the data (tag
> 	> matching, in this case).  How the functionality is implemented
> (hardware
> 	> or not) is separate from how the stack is architected.
> 	>
> 	>       Tagged interfaces, as well as other interfaces such as
> message
> 	> queues, may still exist above the endpoint.  But that layering of
> 	> interfaces seems better suited above the fabric interfaces (e.g.
> MPI),
> 	> rather than included with it.  This seems more debatable to me
> though, and
> 	> we could examine whether a domain or fabric object should have
> send/receive
> 	> capabilities.
> 	>       [rich] Need to keep separate how data is transferred (perhaps
> with
> 	> functions that we may call send/recv) from the ULP's use of this
> data
> 	> (perhaps also using a similar naming scheme of send/recv).
> 	>
> 	>       - Sean
> 	>
> 	>       > -----Original Message-----
> 	>       > From: Richard Graham [mailto:richardg at mellanox.com]
> 	>       > Sent: Wednesday, May 21, 2014 11:09 AM
> 	>       > To: Hefty, Sean
> 	>       > Cc: Paul Grun (grun at cray.com); Liran Liss
> 	>       > Subject: RE: Call today
> 	>       >
> 	>       > Tag matching as it comes to MPI semantics is not local to a
> given
> 	> pair
> 	>       > of processes, e.g. MPI has a wild card receive that can
> take data
> 	> from
> 	>       > any source, and therefore the matching context is broader
> than just
> 	> a
> 	>       > single pair of source and destination.
> 	>       >
> 	>       > Rich
> 	>       >
> 	>       > -----Original Message-----
> 	>       > From: Hefty, Sean [mailto:sean.hefty at intel.com]
> 	>       > Sent: Wednesday, May 21, 2014 1:13 PM
> 	>       > To: Richard Graham
> 	>       > Cc: Paul Grun (grun at cray.com); Liran Liss
> 	>       > Subject: RE: Call today
> 	>       >
> 	>       > Tag matching, RMA, atomics, and message operations are
> currently
> 	>       > associated with an endpoint, but the functions are
> independent of
> 	> the
> 	>       > communication protocol in use.  Conceptually, it seems
> reasonable
> 	> to
> 	>       > think of tag matching as a merging of message and RMA write
> 	> operations.
> 	>       >
> 	>       > I agree that an endpoint is associated with the data
> source/sink.
> 	>       > There is no implied mapping between a process and an
> endpoint.
> 	>       >
> 	>       >
> 	>       > > -----Original Message-----
> 	>       > > From: Richard Graham [mailto:richardg at mellanox.com]
> 	>       > > Sent: Tuesday, May 20, 2014 9:22 PM
> 	>       > > To: Hefty, Sean
> 	>       > > Cc: Paul Grun (grun at cray.com); Liran Liss
> 	>       > > Subject: RE: Call today
> 	>       > >
> 	>       > > I suppose that you could consider tag-matching as part of
> 	> transport.
> 	>       > > However, I would argue that such protocols should be
> independent
> 	> of
> 	>       > > whether or not a reliable or unreliable communication
> protocol is
> 	>       > > used
> 	>       > (at least
> 	>       > > when it comes to the tag support needed for MPI).
> Also, I
> 	> associate an
> 	>       > > end-point with either the source and/or the sink of data.
> In MPI
> 	>       > > tag matching is associated with mpi-level
> (process,communicator)
> 	>       > > pair, and therefore the tag-matching context may be
> associated
> 	> with
> 	>       > > many end-
> 	>       > points.
> 	>       > > I would therefore keep tag-matching as a separate
> concept.
> 	>       > >
> 	>       > > Rich
> 	>       > >
> 	>       > > -----Original Message-----
> 	>       > > From: Hefty, Sean [mailto:sean.hefty at intel.com]
> 	>       > > Sent: Tuesday, May 20, 2014 1:26 PM
> 	>       > > To: Richard Graham
> 	>       > > Cc: Paul Grun (grun at cray.com); Liran Liss
> 	>       > > Subject: RE: Call today
> 	>       > >
> 	>       > > Tag-matching is a transport object (protocol), so I do
> think it
> 	>       > > makes sense being associated with a transport level
> object (i.e.
> 	> endpoint).
> 	>       > >
> 	>       > > I thought you were referring to the SRQ, which may or may
> not be
> 	> a
> 	>       > > transport level object.  If the sharing of data buffer(s)
> among
> 	>       > > multiple connections is not considered a transport
> object, then I
> 	>       > > agree, it may make sense to have it be a separate object
> with its
> 	>       > > own
> 	>       > interfaces.
> 	>       > > Alternatively, it could also be a property of endpoints
> to share
> 	>       > > receive buffers.
> 	>       > >
> 	>       > > When the SRQ appears in the transport object (protocol),
> it may
> 	> get
> 	>       > > more complex.
> 	>       > >
> 	>       > > For initial thoughts, sharing receive buffers could be
> handled
> 	> by:
> 	>       > >
> 	>       > > 1. Creating an explicit SRQ object as a 'peer' to an
> endpoint.
> 	> SRQ
> 	>       > > would have the ability to associate receive buffers with
> it.
> 	>       > > Endpoints would need to be associated with an SRQ to make
> use of
> 	> it.
> 	>       > > 2. Create an SRQ 'endpoint' object.  A send-receive
> endpoint
> 	> could
> 	>       > > be created from and inherit the SRQ interfaces.
> 	>       > > 3. Add an endpoint property to allow sharing data
> buffers.
> 	> Shared
> 	>       > > buffers could be posted to a domain object, or,
> alternatively,
> 	> any
> 	>       > endpoint.
> 	>       > >
> 	>       > > Ultimately, the question becomes a matter of where the
> 'post
> 	> receive
> 	>       > > buffer' operation resides, and the behavior of any 'post
> receive
> 	> buffer'
> 	>       > > call which may reside elsewhere.  E.g. SRQ::PostRecv()
> versus
> 	>       > > EP::PostRecv(), what is the behavior of EP::PostRecv() if
> buffer
> 	>       > > sharing is enabled?
> 	>       > >
> 	>       > > These assume SRQ as a non-transport object, or at least
> one that
> 	> is
> 	>       > > not visible to the application.
> 	>       > >
> 	>       > >
> 	>       > >
> 	>       > > > Liran mentioned that you wanted me to repeat what I
> said - my
> 	> only
> 	>       > > > comment was that we not couple transport (connection
> based
> 	>       > > > transport) with tag- matching (or any other object
> supported by
> 	>       > > > the
> 	>       > library).
> 	>       > > > These are two different concepts, and should be kept
> separate.
> 	>       > > >
> 	>       > > >
> 	>       > > >
> 	>       > > > Rich
> 	>
> 	>       _______________________________________________
> 	>       ofiwg mailing list
> 	>       ofiwg at lists.openfabrics.org
> 	>       http://lists.openfabrics.org/mailman/listinfo/ofiwg
> 	>
> 	>
> 
> 
> 


