[dat-discussions] [openib-general] [RFC] DAT 2.0 immediate data proposal
Kanevsky, Arkady
Arkady.Kanevsky at netapp.com
Mon Feb 6 16:49:47 PST 2006
I am not clear on what you are proposing.
A transport-specific API?
The current proposal provides, on the sending side,
a single post and a single completion in the error-free case.
This commonality simplifies the ULP.
Arkady
Arkady Kanevsky email: arkady at netapp.com
Network Appliance Inc. phone: 781-768-5395
1601 Trapelo Rd. - Suite 16. Fax: 781-895-1195
Waltham, MA 02451 central phone: 781-768-5300
> -----Original Message-----
> From: Larsen, Roy K [mailto:roy.k.larsen at intel.com]
> Sent: Monday, February 06, 2006 6:50 PM
> To: Kanevsky, Arkady; Caitlin Bestler;
> dat-discussions at yahoogroups.com; Sean Hefty
> Cc: openib-general at openib.org
> Subject: RE: [dat-discussions] [openib-general] [RFC] DAT 2.0
> immediate data proposal
>
>
>
> >From: Kanevsky, Arkady [mailto:Arkady.Kanevsky at netapp.com]
> >Sent: Monday, February 06, 2006 2:27 PM
> >
> >Roy,
> >comments inline.
> >
>
> Mine too....
>
> >>
> >> >From: Kanevsky, Arkady [mailto:Arkady.Kanevsky at netapp.com]
> >> >Roy,
> >> >Can you explain, please?
> >> >
> >> >For IB the operation will be layered directly on the Transport
> >> >primitive. On the Recv side, the completion event DTO will indicate
> >> >that it matches an RDMA Write with Immediate and that the Immediate
> >> >Data is in the event.
> >> >
> >> >For iWARP I expect that, initially, it will be layered on an RDMA
> >> >Write followed by a Send. The Provider can post these more
> >> >efficiently than the Consumer and can guarantee atomicity.
> >> >On the Recv side the Consumer will get a Recv DTO completion in the
> >> >event, with the Immediate Data inline as specified by the Provider
> >> >Attribute.
> >> >
> >> >From the performance point of view, Consumers who program to IB only
> >> >will see no performance degradation at all. But this API also allows
> >> >Consumers to write a ULP that is transport independent with minimal
> >> >penalty: one binary comparison and an extra 4 bytes in the recv buffer.
> >>
> >> If the application could be written transport independently, I would
> >> have no objection at all. Instead, it must be written in a
> >> transport-adaptive way, and to be able to adapt to all possible
> >> implementations, the application could not send arbitrary
> >> "immediate"-sized data as messages because there is no way to
> >> distinguish between them on the receiving side. That is HUGE! It is
> >> my experience that send/receive is generally used for small messages
> >> and to take away particular message sizes or to depend on the so the
> >> application can "adapt" to whatever the immediate size is for a
> >> particular transport, if even needed, is a very weak facility to
> >> offer.
> >
> >But the remote side does post a Recv. Since it anticipates that this
> >Recv will be matched against the RDMA Write with Immediate, it posts a
> >recv buffer that fits. Yes, there is an issue for a Transport-independent
> >ULP in that it does need a buffer.
> >For IB it is possible to post a 0-size buffer. But if this is the case,
> >the Recv-end Consumer DOES know that it will be matched against an RDMA
> >Write, so the ULP DOES know what it will be matched against.
> >So in the worst case the Consumer does have to pay the price of creating
> >an LMR to handle a 4-byte buffer to match the RDMA Write Immediate data.
>
> I think you missed my larger point. The point was that the
> application must be written in such a way that it could
> infer when immediate data arrived for a variety of
> immediate data sizes, and that places a constraint on the
> application with respect to the data it may want to send/receive
> normally. Whereas, if the application embraced the fact that it was
> responsible for sending a message to indicate a write
> completion, it would be free to send whatever amount of data best
> met its needs.
>
> Transports that support true immediate data do not require
> the ULP to perform buffer matching. They can post a series
> of receive buffers that may or may not indicate immediate
> data. The ULP does not have to know ahead of time when
> immediate data will arrive **against other data receives**.
> The fact that an IB-oriented application never needs to back
> a receive request with a buffer if it is only used to
> indicate immediate data is orthogonal.
>
> >
> >>
> >> It also affects interface resource allocation. Send queue sizes will
> >> have to adapt to possibly twice their size.
> >>
> >
> >That is correct. We argued about it at the meeting.
> >One alternative is to have EP and EVD attributes. But this will not be
> >efficient, since it will double the queue size where a smaller increment
> >is possible, given the depth of the outstanding RDMA Write pipeline.
> >
> >> It just dawned on me that the immediate data must be in registered
> >> memory to be sent in a message. This means the API must be amended
> >> to pass an LMR or, even worse, the provider would have to register
> >> memory in the speed path or create and manipulate its own queue of
> >> "immediate" data buffers/LMRs. Of course, LMRs are not needed, and are
> >> an overhead, for transports that provide true immediate data.
> >
> >No registration on the speed path. It is the Consumer's responsibility
> >to provide a Recv Buffer of the right size.
> >Yes, for an IB-only ULP this can be avoided.
> >A ULP can be written to the proposed API to take full advantage of IB
> >performance, but that code will not be transport independent.
>
> I was referring to the sending side. Source data of a
> message send must be from registered memory. For transports
> that will emulate this service with a write/send sequence,
> user specified immediate data will need to be copied to a
> provider managed pool of "immediate" data buffers/LMRs or the
> interface changed to specify an LMR.
>
> >
> >But this API allows one to write transport-independent code, albeit
> >with a certain price attached.
> >
> >>
> >> Oh, and another thing. InfiniBand indicates the size of the RDMA
> >> write in the receive completion. That is something that will have to
> >> be addressed in a "transport independent" way or dropped as part of
> >> the service.
> >
> >Good point. I will augment Spec accordingly.
> >
> >>
> >> The bottom line here is that it is NOT transport independent.
> >
> >The implementation is not transport independent.
> >But the API allows one to write a Transport-specific ULP with full
> >performance, as well as a Transport-independent ULP with better
> >performance than without the proposed API and with "minimal" performance
> >penalty for Transports that provide it.
>
> Of course, you can make the application as transport service
> adaptive as you want, but that is a weak argument and a
> slippery slope. My point is that the operational semantics of
> non-native immediate data transports are identical to
> write/send in all respects. So, embrace this and just give
> the ULP a simple interface that has broader applicability for
> all transports. Provide a thread-atomic combined request
> capability which can be used for write completion
> notification (if not natively supported) or any other purpose
> an application may fancy.
>
> >
> >>
> >> Now, the atomicity argument between write and send has some
> >> credibility.
> >> If an application chooses to "adapt" to an explicit write/send
> >> semantic for write completion notification in environments that can't
> >> provide it natively, this could be addressed by a generalized
> >> combined request API that can guarantee thread-based atomicity to the
> >> send queue. This seems much more straightforward to me since, in
> >> essence, to adapt to non-native immediate data services, they would
> >> have to allocate resources and behave in virtually the same way as if
> >> they did write/send explicitly.
> >>
> >> It is obvious that the proposed service is not one of immediate data
> >> in the sense defined by InfiniBand. Since true immediate data is a
> >> transport specific speed path service, it needs to be implemented as
> >> a transport specific extension. To allow an application to initiate
> >> multiple request sequences that must be queued sequentially to
> >> explicitly create a write completion notification or any other
> >> order-based sequence, a generalized combined request API should be
> >> defined.
> >
> >
> >No disagreement here. We have been debating a generic way to combine
> >multiple DTOs into a single call for some time.
> >But how to define a generic way to do it, and to have a single
> >completion on both ends of the connection in the successful case, was
> >always a problem.
>
> I would think an array of pointers and a count to standard
> work requests would do it. And of course, each work request
> can control whether it solicits a completion, so a write/send
> sequence can generate a single completion event on both ends.
> Use the EVD lock to guard against other threads injecting
> requests on the queue during a combined request operation and
> the ULP has everything it needs.
>
> Roy
>
> >
> >>
> >> >
> >> >Arkady Kanevsky email: arkady at netapp.com
> >> >Network Appliance Inc. phone: 781-768-5395
> >> >1601 Trapelo Rd. - Suite 16. Fax: 781-895-1195
> >> >Waltham, MA 02451 central phone: 781-768-5300
> >> >
> >> >
> >>
>
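
Roy's closing suggestion (an array of pointers to work requests plus a
count, posted while holding a lock so no other thread can interleave
requests) can be sketched in C roughly as follows. This is only an
illustration under stated assumptions: the names work_request, send_queue
and post_combined are hypothetical and are not part of DAT or of any
provider, the fixed-depth array stands in for a real send queue, and a
pthread mutex stands in for the EVD lock.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical types for illustration only; none of these names are DAT. */
enum wr_opcode { WR_RDMA_WRITE, WR_SEND };

struct work_request {
    enum wr_opcode opcode;
    int            signaled;  /* nonzero: this request solicits a completion */
};

#define SQ_DEPTH 64

struct send_queue {
    pthread_mutex_t            lock;            /* stands in for the EVD lock */
    const struct work_request *slots[SQ_DEPTH]; /* posted requests, in order */
    size_t                     count;           /* number of occupied slots */
};

/*
 * Post n work requests as one uninterrupted run on the send queue.
 * Returns 0 on success, or -1 if the queue lacks room for the whole
 * run, in which case nothing is posted.
 */
int post_combined(struct send_queue *sq,
                  const struct work_request *const *wrs, size_t n)
{
    int rc = -1;

    assert(sq != NULL && (wrs != NULL || n == 0));

    /* Hold the lock across the whole run so that requests from other
     * threads cannot land between the write and the send. */
    pthread_mutex_lock(&sq->lock);
    if (n <= SQ_DEPTH - sq->count) {
        for (size_t i = 0; i < n; i++)
            sq->slots[sq->count++] = wrs[i];
        rc = 0;
    }
    pthread_mutex_unlock(&sq->lock);
    return rc;
}
```

A write/send pair would be posted by passing a two-element array whose
first request is an unsignaled WR_RDMA_WRITE and whose second is a
signaled WR_SEND. Under the assumptions above, the sender then sees one
completion (for the send), and the receiver sees one completion for the
matching receive, since the write itself consumes no receive.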