[dat-discussions] [openib-general] [RFC] DAT 2.0 immediate data proposal
Caitlin Bestler
caitlinb at broadcom.com
Wed Feb 8 22:07:41 PST 2006
openib-general-bounces at openib.org wrote:
> Arlin> A very latency sensitive application that requires
> Arlin> immediate notification of RDMA write completion on the
> Arlin> remote node without ANY latency penalties associated with
> Arlin> combining operations, HCA priority rules across QPs, wire
> Arlin> congestion, etc. An application that has no requirement for
> Arlin> messaging outside of remote rdma write completion
> Arlin> notifications. The application would not have to register
> Arlin> and manage additional message buffers on either side, we
> Arlin> can just size the queues accordingly and post zero byte
> Arlin> messages. We need something that would be equivalent to
> Arlin> setting there polling on the last byte of inbound
> Arlin> data. But, since data ordering within an operation is not
> Arlin> guaranteed that is not an option. So, rdma with immediate
> Arlin> data is the most optimal and simplistic method for
> Arlin> indication of RDMA-write completion that we have available
> Arlin> today. In fact, I would like to see it increased in size to
> Arlin> make it even more useful.
>
> Hmm. Can you put a number on how much better RDMA write with
> immediate is on current HCA hardware? How does using the
> underlying OpenIB verbs ability to post a list of work
> requests compare (ie posting an RDMA write followed by a send
> in one verbs call)?
> Maybe "post multiple" is a better direction for DAT.
>
What distinguishes "Write and Send" from "post multiple" is that
it maintains a very simple one-to-one correspondence with the
post_recv at the data sink.

I also do not see how the *application* keeping the "write and send"
semantics can have a negative performance implication if we allow
InfiniBand Providers to encode it as an RDMA Write with Immediate.
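
To make the one-to-one point concrete, here is a toy model in Python. It is only a sketch: DataSink, write_and_send and use_immediate are hypothetical names invented for illustration, not DAT or verbs calls, and it assumes that every notification consumes exactly one posted receive at the sink regardless of how the provider encodes it on the wire.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class DataSink:
    """Toy data sink: a memory region plus a queue of posted receives."""
    memory: bytearray
    recv_queue: deque = field(default_factory=deque)
    completions: list = field(default_factory=list)

    def post_recv(self, cookie):
        # The application posts one receive per expected notification.
        self.recv_queue.append(cookie)

    def _complete_one_recv(self, message):
        # Assumption: any notification consumes exactly one posted receive.
        cookie = self.recv_queue.popleft()
        self.completions.append((cookie, message))

    def rdma_write(self, offset, data):
        # A plain RDMA Write places data and raises no completion at the sink.
        self.memory[offset:offset + len(data)] = data

    def send(self, message):
        self._complete_one_recv(message)

    def rdma_write_with_immediate(self, offset, data, immediate):
        self.rdma_write(offset, data)
        self._complete_one_recv(immediate)


def write_and_send(sink, offset, data, message, use_immediate):
    """Hypothetical transport-neutral call; the provider picks the encoding."""
    if use_immediate:
        # e.g. an InfiniBand provider folding both steps into one operation
        sink.rdma_write_with_immediate(offset, data, message)
    else:
        # e.g. a provider without immediate data: write, then a small send
        sink.rdma_write(offset, data)
        sink.send(message)


def run(use_immediate):
    sink = DataSink(memory=bytearray(8))
    sink.post_recv("recv-0")
    write_and_send(sink, 0, b"payload!", 0x2A, use_immediate)
    return bytes(sink.memory), sink.completions, len(sink.recv_queue)


# The application sees the same result under either encoding.
assert run(True) == run(False)
```

In this model the application at the sink cannot tell which encoding was used: one posted receive is consumed per write_and_send call either way. The model deliberately ignores ordering and latency, which are the points under debate in this thread.
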

If the Data Source needs to communicate to the Data Sink that
a specific RDMA Write transfer is done then it is sending a
message. Information transfer and synchronization are occurring.

I fail to see the value, let alone the optimization, of layering
on an extra bit of information disguised as an opcode, or of using
a specific transport's encoding methods as the model for a
transport-neutral API, particularly one at the DAT layer. The verb
layer is a different issue: there we do not want to hide any
hardware capabilities, even while encouraging safe-harbor,
transport-neutral practices.

If distinguishing between 32-bit messages and 32-bit immediates
that can arrive in indeterminate order is really that important
to your application, then maybe you really needed a 33-bit message
to begin with. Encoding application-layer information via your
choice of carrier pigeon is not a very robust strategy.