[dat-discussions] [openib-general] [RFC] DAT 2.0 immediate data proposal

Mon Feb 6 13:25:12 PST 2006

>From: Kanevsky, Arkady [mailto:Arkady.Kanevsky at netapp.com]
>Roy,
>Can you explain, please?
>
>For IB the operation will be layered properly on Transport primitive.
>And on Recv side it will indicate in completion event DTO
>that it matches RDMA Write with Immediate and that Immediate Data
>is in event.
>
>For iWARP I expect that, initially, it will be layered on RDMA Write
>followed by Send. The Provider can do the post more efficiently
>than the Consumer and guarantee atomicity.
>On Recv side Consumer will get Recv DTO completion in event
>and Immediate Data inline as specified by Provider Attribute.
>
>From the performance point of view Consumers who program to IB
>only will have no performance degradation at all. But this API also
>allows Consumers to write ULP to be transport independent
>with minimal penalty: one binary comparison and extra 4 bytes in recv
>buffer.
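For reference, here is a minimal C sketch of the receive-side check described above: one comparison to decide whether the immediate value is in the completion event (native) or inline in the receive buffer (emulated). All names (imm_location_t, recv_event_t, get_immediate) are hypothetical and are not DAT 2.0 calls; placing the inline value at offset 0 of the buffer is also an assumption, since the proposal leaves that to a Provider Attribute.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical names for illustration only; NOT the DAT 2.0 API. */
typedef enum {
    IMM_IN_EVENT,   /* native (IB-style): value arrives in the completion event */
    IMM_INLINE      /* emulated (iWARP-style): value arrives in the recv buffer */
} imm_location_t;

typedef struct {
    uint32_t imm_data;           /* meaningful only for IMM_IN_EVENT */
    size_t   transferred_length; /* bytes placed in the receive buffer */
} recv_event_t;

/* Returns 0 and stores the immediate value in *out, or -1 if the
 * completion is too short to carry inline immediate data. The inline
 * case assumes the provider puts the 4 bytes at offset 0. */
int get_immediate(imm_location_t loc, const recv_event_t *ev,
                  const unsigned char *recv_buf, uint32_t *out)
{
    if (loc == IMM_IN_EVENT) {      /* the "one binary comparison" */
        *out = ev->imm_data;
        return 0;
    }
    if (ev->transferred_length < sizeof(uint32_t))
        return -1;
    memcpy(out, recv_buf, sizeof(uint32_t));  /* the "extra 4 bytes" */
    return 0;
}
```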

If the application could be written transport independently, I would
have no objection at all.  Instead, it must be written in a
transport-adaptive way.  To adapt to all possible implementations, the
application could not send arbitrary "immediate"-sized data as ordinary
messages, because the receiving side has no way to distinguish such a
message from emulated immediate data.  That is HUGE!  In my experience,
send/receive is generally used for small messages, so taking particular
message sizes away from the application, just so it can "adapt" to
whatever the immediate size is for a particular transport (if that is
even needed), makes this a very weak facility to offer.
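To make the ambiguity concrete, here is a toy C model, assuming emulation layers the operation as an RDMA Write followed by a Send of the 4-byte value. A plain RDMA Write produces no completion at the receiver, so the only thing the consumer sees is a 4-byte receive completion, which is byte-for-byte identical to an application's own 4-byte message. The types and functions (recv_completion_t, deliver_send, deliver_emulated_write_imm) are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical receive completion as seen by the consumer. */
typedef struct {
    size_t        len;
    unsigned char data[16];
} recv_completion_t;

/* An ordinary Send: the payload lands in the receive buffer. */
recv_completion_t deliver_send(const void *msg, size_t len)
{
    recv_completion_t c = { 0 };
    if (len > sizeof c.data)
        len = sizeof c.data;        /* truncate for this toy model */
    memcpy(c.data, msg, len);
    c.len = len;
    return c;
}

/* Emulated "RDMA Write with Immediate": the RDMA Write itself produces
 * no receive completion, so the only thing the receiver sees is the
 * trailing Send that carries the 4-byte immediate value. */
recv_completion_t deliver_emulated_write_imm(uint32_t imm)
{
    return deliver_send(&imm, sizeof imm);
}

bool completions_identical(const recv_completion_t *a,
                           const recv_completion_t *b)
{
    return a->len == b->len && memcmp(a->data, b->data, a->len) == 0;
}
```

An application message carrying the value 7 and an emulated immediate of 7 produce completions that compare equal, which is why 4-byte application messages would have to be given up.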

It also affects interface resource allocation.  Send queue sizes may
have to double, since each emulated immediate data request consumes both
an RDMA Write and a Send.
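The sizing arithmetic, as a hypothetical helper: it assumes one work request per native write-with-immediate and two (RDMA Write + Send) per emulated one, following the iWARP layering in the quoted text. The function names are illustrative only.

```c
#include <stddef.h>

/* Work requests consumed by one write-with-immediate (assumption:
 * native = 1 WR, emulated = RDMA Write + Send = 2 WRs). */
size_t wrs_per_write_imm(int native_immediate)
{
    return native_immediate ? 1 : 2;
}

/* Send queue depth needed to keep the given number of operations in flight. */
size_t required_sq_depth(size_t max_outstanding_write_imm,
                         size_t max_outstanding_other,
                         int native_immediate)
{
    return max_outstanding_write_imm * wrs_per_write_imm(native_immediate)
         + max_outstanding_other;
}
```

With 64 outstanding write-with-immediate requests, a native transport needs a depth of 64, while an emulating one needs 128.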

It just dawned on me that the immediate data must be in registered
memory to be sent in a message.  This means the API must be amended to
pass an LMR or, even worse, the provider would have to register memory
in the speed path or create and manipulate its own queue of "immediate"
data buffers/LMRs.  Of course, LMRs are unnecessary overhead for
transports that provide true immediate data.
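One way a provider could avoid registering memory in the speed path is to register a single region at setup and hand out 4-byte slots from it. The sketch below models only the slot bookkeeping; the actual registration (the LMR) is not shown, all names are hypothetical, and it assumes send completions are reported in posting order so slots can be released FIFO. The pool also has to be sized to the send queue depth, which adds to the resource cost described above.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical provider-side pool. In a real provider, `slots` would be
 * one region registered once at setup (an LMR); registration is not shown. */
typedef struct {
    uint32_t *slots;     /* one 4-byte immediate value per in-flight Send */
    size_t    capacity;
    size_t    head;      /* next slot to hand out */
    size_t    in_use;    /* slots awaiting Send completion */
} imm_pool_t;

int imm_pool_init(imm_pool_t *p, size_t capacity)
{
    p->slots = calloc(capacity, sizeof *p->slots);
    p->capacity = capacity;
    p->head = 0;
    p->in_use = 0;
    return p->slots ? 0 : -1;
}

/* Copy the immediate value into the next pre-registered slot.
 * Returns the slot index, or -1 if every slot is still in flight. */
long imm_pool_get(imm_pool_t *p, uint32_t imm)
{
    if (p->in_use == p->capacity)
        return -1;
    size_t idx = p->head;
    p->slots[idx] = imm;
    p->head = (p->head + 1) % p->capacity;
    p->in_use++;
    return (long)idx;
}

/* Release the oldest slot when its Send completes (assumes in-order
 * completions on the send queue). */
void imm_pool_put(imm_pool_t *p)
{
    if (p->in_use)
        p->in_use--;
}

void imm_pool_destroy(imm_pool_t *p)
{
    free(p->slots);
    p->slots = NULL;
}
```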

Oh, and another thing.  InfiniBand indicates the size of the RDMA write
in the receive completion.  That is something that will have to be
addressed in a "transport independent" way or dropped as part of the
service.
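If the write length were kept rather than dropped, one hypothetical emulation would have the trailing Send carry both the immediate value and the RDMA Write length. In that case the receive-buffer overhead grows from 4 to 8 bytes. The wire layout below (two big-endian 32-bit fields) and all names are assumptions made for illustration.

```c
#include <stddef.h>
#include <stdint.h>

#define EMU_IMM_WIRE_SIZE 8  /* 4-byte immediate + 4-byte write length */

static void put_be32(unsigned char *p, uint32_t v)
{
    p[0] = (unsigned char)(v >> 24);
    p[1] = (unsigned char)(v >> 16);
    p[2] = (unsigned char)(v >> 8);
    p[3] = (unsigned char)v;
}

static uint32_t get_be32(const unsigned char *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/* Sender side: build the Send payload that follows the RDMA Write. */
void encode_emulated_imm(unsigned char out[EMU_IMM_WIRE_SIZE],
                         uint32_t imm, uint32_t write_len)
{
    put_be32(out, imm);
    put_be32(out + 4, write_len);
}

/* Receiver side: recover both values; -1 if the length does not match. */
int decode_emulated_imm(const unsigned char *buf, size_t len,
                        uint32_t *imm, uint32_t *write_len)
{
    if (len != EMU_IMM_WIRE_SIZE)
        return -1;
    *imm = get_be32(buf);
    *write_len = get_be32(buf + 4);
    return 0;
}
```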

The bottom line here is that it is NOT transport independent. 

Now, the atomicity argument between write and send has some credibility.
If an application chooses to "adapt" to an explicit write/send semantic
for write completion notification in environments that can't provide it
natively, this could be addressed by a generalized combined request API
that can guarantee thread-based atomicity to the send queue.  This seems
much more straightforward to me since, in essence, to adapt to
non-native immediate data services, applications would have to allocate
resources and behave in virtually the same way as if they did the write
and send explicitly.
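A generalized combined request API of the kind suggested here could be sketched as follows: the whole request sequence is queued under the send queue lock, so another thread cannot interleave its own requests between the RDMA Write and the Send. The queue is a simple in-memory model, and post_combined and the other names are hypothetical, not existing DAT calls.

```c
#include <pthread.h>
#include <stddef.h>

typedef enum { OP_RDMA_WRITE, OP_SEND } op_t;

typedef struct { op_t op; unsigned id; } work_request_t;

#define SQ_CAP 256

typedef struct {
    pthread_mutex_t lock;
    work_request_t  entries[SQ_CAP];
    size_t          count;
} send_queue_t;

/* Hypothetical combined-request call: either all n requests are queued
 * back to back, or none are. Holding the lock for the whole sequence
 * keeps other threads from interleaving their own requests. */
int post_combined(send_queue_t *sq, const work_request_t *reqs, size_t n)
{
    int rc = 0;
    pthread_mutex_lock(&sq->lock);
    if (SQ_CAP - sq->count < n) {
        rc = -1;                     /* not enough room: post nothing */
    } else {
        for (size_t i = 0; i < n; i++)
            sq->entries[sq->count++] = reqs[i];
    }
    pthread_mutex_unlock(&sq->lock);
    return rc;
}

typedef struct { send_queue_t *sq; unsigned base_id; size_t pairs; } poster_arg_t;

/* Thread body: post `pairs` RDMA Write + Send sequences, one id per pair. */
void *post_write_send_pairs(void *arg)
{
    poster_arg_t *a = arg;
    for (size_t i = 0; i < a->pairs; i++) {
        unsigned id = a->base_id + (unsigned)i;
        work_request_t seq[2] = { { OP_RDMA_WRITE, id }, { OP_SEND, id } };
        post_combined(a->sq, seq, 2);
    }
    return NULL;
}

/* Verify every RDMA Write is immediately followed by its own Send.
 * Call only after all posting threads have been joined. */
int writes_followed_by_sends(const send_queue_t *sq)
{
    if (sq->count % 2)
        return 0;
    for (size_t i = 0; i < sq->count; i += 2)
        if (sq->entries[i].op != OP_RDMA_WRITE ||
            sq->entries[i + 1].op != OP_SEND ||
            sq->entries[i].id != sq->entries[i + 1].id)
            return 0;
    return 1;
}
```

Two threads posting concurrently through post_combined still leave every write paired with its own send, which is the ordering guarantee an explicit write/send notification needs.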

It is obvious that the proposed service is not one of immediate data in
the sense defined by InfiniBand.  Since true immediate data is a
transport specific speed path service, it needs to be implemented as a
transport specific extension.  To allow an application to initiate
multiple request sequences that must be queued sequentially to
explicitly create a write completion notification or any other
order-based sequence, a generalized combined request API should be
defined.

>
>Arkady Kanevsky                       email: arkady at netapp.com
>Network Appliance Inc.               phone: 781-768-5395
>1601 Trapelo Rd. - Suite 16.        Fax: 781-895-1195
>Waltham, MA 02451                   central phone: 781-768-5300
>
>