[dat-discussions] [openib-general] [RFC] DAT 2.0 immediate data proposal

Caitlin Bestler caitlinb at broadcom.com
Tue Feb 7 12:02:53 PST 2006


openib-general-bounces at openib.org wrote:
> Caitlin Bestler wrote:
>> 
>> Arlin Davis wrote:
>>> Sean Hefty wrote:
>>> 
>>>>> The requirement is to provide an API that supports RDMA writes
>>>>> with immediate data.  A send that follows an RDMA write is not
>>>>> immediate data, and the API should not be constructed around
>>>>> trying to make it so. 
>>>>> 
>>>>> 
>>>> 
>>>> To be clear, I believe that write with immediate should be part of
>>>> the normal APIs, rather than an extension, but should be designed
>>>> around those devices that provide it natively.
>>>> 
>>>> 
>>> I totally agree. A standard RDMA write with immediate API can be
>>> very useful to RDMA applications based on the requirements (native
>>> support) set forth in my earlier email. It is analogous to the new
>>> dat_ep_post_send_with_invalidate() call; a call that supports a
>>> native iWARP transport operation but provides no provisions to help
>>> other transports emulate. So, other transports simply return
>>> NOT_SUPPORTED and add it natively in the future if it makes sense.
>>> 
>>> -arlin
>> 
>> What is proposed is a definition of
>> 'dat_ep_post_rdma_write_with_immediate'
>> that can be implemented over iWARP using the sequence of messages
>> that were intended to support the same purpose (i.e., letting the
>> other side know that an RDMA Write transfer has been fully received).
> 
> No, iWARP *CAN NOT* implement write immediate data any better
> than IB can implement send with invalidate.  Immediate data
> *MUST* be indicated to the ULP unambiguously.  Imposing an
> algorithm on the application to infer immediate data arrival
> is a hack, pure and simple. An application is free to perform a
> write/send if that is the semantic they want.  Why does iWARP
> get transport unique APIs but not IB?  I find this attempt to
> bastardize the IB semantic of immediate data a little curious.
> 

The transports aren't getting anything. Features are there for
applications, especially when the feature can be defined in a
way that makes sense without explaining transport mechanics.

Completing a transaction, including supplying a transaction
response and releasing the advertised STag associated with the
transaction, is something that makes sense in the application
domain and conforms to normal DAT ordering rules.

"Provide information about an RDMA Write to a receive operation"
also meets that definition -- as long as it conforms to the
existing ordering rules. Shifting to an 8-byte message over
iWARP to allow for the write length *and* immediate 'tag'
is certainly doable. We could even consider having the
DAT Provider supply the 'buffer' silently in the DTO itself.

With that definition, the consumer would get a receive completion
that tells them that their peer's RDMA Write has been successfully
placed, how long it is (the length) and which one it is (a tag).
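
As a minimal sketch of the 8-byte message described above: the layout
below (a 32-bit write length followed by a 32-bit tag, both in network
byte order) is an assumption made for illustration, and the names
write_notice, write_notice_pack and write_notice_unpack are
hypothetical. None of this is a DAT or iWARP wire format.

```c
#include <stdint.h>

/* Hypothetical 8-byte notification; NOT a DAT or iWARP wire format. */
typedef struct {
    uint32_t write_len;  /* length of the RDMA Write that was placed */
    uint32_t tag;        /* immediate 'tag' saying which write it was */
} write_notice;

/* Pack both fields big-endian into an 8-byte buffer. */
static void write_notice_pack(const write_notice *n, uint8_t out[8])
{
    for (int i = 0; i < 4; i++) {
        out[i]     = (uint8_t)(n->write_len >> (24 - 8 * i));
        out[4 + i] = (uint8_t)(n->tag >> (24 - 8 * i));
    }
}

/* Recover the length and tag from the 8 bytes placed in a receive buffer. */
static write_notice write_notice_unpack(const uint8_t in[8])
{
    write_notice n = { 0, 0 };
    for (int i = 0; i < 4; i++) {
        n.write_len = (n.write_len << 8) | in[i];
        n.tag       = (n.tag << 8) | in[4 + i];
    }
    return n;
}
```

On a transport with native immediate data the same two values could be
reported in the completion itself, so only the emulating side would
need this buffer.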

I think that is of value. iWARP can implement it as two work
requests and maintain the overall semantics.
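
The two-work-request idea can be sketched roughly as follows. This is
a toy model, not provider code: the endpoint and work_request types,
the native_write_imm flag and post_write_with_immediate() are all
hypothetical names, and the send queue is just an array. The point is
only that the caller makes the same call either way, while the
emulating transport queues an RDMA Write followed by a Send carrying
the length and tag.

```c
#include <stddef.h>
#include <stdint.h>

typedef enum { WR_RDMA_WRITE, WR_SEND, WR_RDMA_WRITE_IMM } wr_kind;

typedef struct {
    wr_kind  kind;
    uint32_t write_len;  /* length of the RDMA Write (carried in the Send when emulated) */
    uint32_t tag;        /* immediate tag; unused for a plain RDMA Write */
} work_request;

/* Hypothetical endpoint: a flag for native support plus a small send queue. */
typedef struct {
    int          native_write_imm;  /* 1: one WR suffices, 0: emulate with two */
    work_request sq[16];
    size_t       sq_depth;
} endpoint;

/* Returns the number of work requests queued, or -1 if the queue is full. */
static int post_write_with_immediate(endpoint *ep, uint32_t len, uint32_t tag)
{
    size_t need = ep->native_write_imm ? 1 : 2;

    if (ep->sq_depth + need > sizeof ep->sq / sizeof ep->sq[0])
        return -1;

    if (ep->native_write_imm) {
        ep->sq[ep->sq_depth++] = (work_request){ WR_RDMA_WRITE_IMM, len, tag };
    } else {
        /* Queued in order, so the Send follows the Write it reports on. */
        ep->sq[ep->sq_depth++] = (work_request){ WR_RDMA_WRITE, len, 0 };
        ep->sq[ep->sq_depth++] = (work_request){ WR_SEND, len, tag };
    }
    return (int)need;
}
```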

Are you arguing that iWARP should NOT provide this service
until it can do it in a single work request? It seems to
me that allowing an extra work request and completion is
a fairly simple accommodation as opposed to using an alternate
algorithm in the main transaction processing of the application.

If we enable the application to query, outside of the transaction
loop, how a remote write with immediate will complete, then we can
allow the application to have *no* overhead inside the main
transaction loop, and *identical* logic on the sending side.
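
One possible reading of that query, sketched under assumptions: the
provider reports once, at setup, how many work requests a write with
immediate consumes, and the application sizes its send queue
accordingly. The provider_attr type, its wrs_per_write_imm field and
required_sq_depth() are hypothetical and do not exist in DAT.

```c
/* Hypothetical provider attribute; not an existing DAT field. */
typedef struct {
    unsigned wrs_per_write_imm;  /* 1 if native, 2 if emulated as write + send */
} provider_attr;

/* Called once at setup, outside the transaction loop. */
static unsigned required_sq_depth(const provider_attr *attr,
                                  unsigned max_outstanding)
{
    return attr->wrs_per_write_imm * max_outstanding;
}
```

With the queue sized this way up front, the posting code inside the
loop would not need to branch on the transport.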

And IB *could* implement send with invalidate by simply agreeing
on how the RKey to be invalidated is communicated between the
IB providers (perhaps as an immediate).

But more to the point, I don't see how the more flexible
definition of write with immediate negatively impacts the
IB implementation of the feature. IB providers do not need
to allow for the extra work requests. They are not being 
asked to place the immediate data into the receive buffer,
or to do any extra work at all.



