[dat-discussions] [openib-general] [RFC] DAT 2.0 immediate data proposal

Caitlin Bestler caitlinb at broadcom.com
Wed Feb 8 22:07:41 PST 2006


openib-general-bounces at openib.org wrote:
>     Arlin> A very latency sensitive application that requires
>     Arlin> immediate notification of RDMA write completion on the
>     Arlin> remote node without ANY latency penalties associated with
>     Arlin> combining operations, HCA priority rules across QPs, wire
>     Arlin> congestion, etc. An application that has no requirement for
>     Arlin> messaging outside of remote rdma write completion
>     Arlin> notifications. The application would not have to register
>     Arlin> and manage additional message buffers on either side, we
>     Arlin> can just size the queues accordingly and post zero byte
>     Arlin> messages. We need something that would be equivalent to
>     Arlin> sitting there polling on the last byte of inbound
>     Arlin> data. But, since data ordering within an operation is not
>     Arlin> guaranteed that is not an option. So, rdma with immediate
>     Arlin> data is the most optimal and simplistic method for
>     Arlin> indication of RDMA-write completion that we have available
>     Arlin> today. In fact, I would like to see it increased in size to
>     Arlin> make it even more useful.
> 
> Hmm.  Can you put a number on how much better RDMA write with
> immediate is on current HCA hardware?  How does using the
> underlying OpenIB verbs ability to post a list of work
> requests compare (ie posting an RDMA write followed by a send
> in one verbs call)?
> Maybe "post multiple" is a better direction for DAT.
> 

"Write and Send" differs from "post multiple" in that it
maintains a very simple one-to-one correspondence with the
post_recv at the data sink.

I also do not see how the *application* keeping the "write and send"
semantics can have a negative performance implication if we allow
InfiniBand Providers to encode it as an RDMA Write with Immediate.
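To make the one-to-one correspondence concrete, here is a toy model
of the data sink. It is only a sketch: DataSink, on_wire, and
write_and_send are hypothetical names, not DAT or verbs calls, and
the opcode strings merely stand in for the wire encodings. It assumes
that any operation which notifies the sink consumes exactly one
posted receive.

```python
from collections import deque


class DataSink:
    """Toy data sink: a memory region plus a queue of posted receives."""

    def __init__(self, size):
        self.memory = bytearray(size)
        self.recv_queue = deque()  # receive ids, consumed in posting order
        self.completions = []      # (recv_id, message) seen by the application
        self.wire_ops = 0          # operations that arrived on the wire

    def post_recv(self, recv_id):
        self.recv_queue.append(recv_id)

    def on_wire(self, opcode, offset=0, data=b"", message=None):
        self.wire_ops += 1
        if opcode in ("WRITE", "WRITE_IMM"):
            # Data placement alone never notifies the application.
            self.memory[offset:offset + len(data)] = data
        if opcode in ("SEND", "WRITE_IMM"):
            # Any notification consumes exactly one posted receive.
            recv_id = self.recv_queue.popleft()
            self.completions.append((recv_id, message))


def write_and_send(sink, offset, data, message, use_immediate):
    """One application request; the provider picks the wire encoding."""
    if use_immediate:
        sink.on_wire("WRITE_IMM", offset, data, message)
    else:
        sink.on_wire("WRITE", offset, data)
        sink.on_wire("SEND", message=message)


def demo(use_immediate):
    sink = DataSink(16)
    sink.post_recv("recv-0")
    write_and_send(sink, 4, b"abcd", 0x2A, use_immediate)
    return sink
```

In this model the application at the sink sees the same memory
contents and the same single completion under either encoding. Only
the number of wire operations differs, which is the provider's
business.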

If the Data Source needs to communicate to the Data Sink that
a specific RDMA Write transfer is done, then it is sending a
message. Information transfer and synchronization are occurring.

I fail to see the value, let alone the optimization, of layering
on an extra bit of information disguised as an opcode and using
a specific transport's encoding methods as the model for a
transport-neutral API. That holds particularly at the DAT layer.
The verb layer is a different issue: there we do not want to hide
any hardware capabilities, even while encouraging safe-harbor,
transport-neutral practices.

If distinguishing between 32-bit messages and 32-bit immediates
that can arrive in indeterminate order is really that important
to your application, then maybe you really needed a 33-bit message
to begin with. Encoding application-layer information via your
choice of carrier pigeon is not a very robust strategy.
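One way to read the "33-bit message" remark is to carry the
distinguishing bit in the payload itself, so that it survives
whatever encoding the transport uses. The sketch below is purely
illustrative: the kind constants, the five-byte layout, and the
encode/decode helpers are assumptions, not part of any DAT proposal.
It spends a whole byte on the kind because payloads are byte
granular.

```python
import struct

# Hypothetical message kinds; the values are arbitrary.
KIND_MESSAGE = 0
KIND_WRITE_DONE = 1

_FORMAT = "!BI"  # network byte order: 1-byte kind, 32-bit value


def encode(kind, value):
    """Pack the kind and the 32-bit value into one message payload."""
    return struct.pack(_FORMAT, kind, value)


def decode(payload):
    """Recover (kind, value) from a payload produced by encode()."""
    kind, value = struct.unpack(_FORMAT, payload)
    return kind, value
```

The receiver then tells the two cases apart by inspecting the
payload, not by inspecting which opcode delivered it.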



