[openib-general] basic IB doubt
krause at cup.hp.com
Mon Aug 28 20:18:19 PDT 2006
At 10:14 AM 8/23/2006, Ralph Campbell wrote:
>On Wed, 2006-08-23 at 09:47 -0700, Caitlin Bestler wrote:
> > openib-general-bounces at openib.org wrote:
> > > Quoting r. john t <johnt1johnt2 at gmail.com>:
> > >> Subject: basic IB doubt
> > >>
> > >> Hi
> > >>
> > >> I have a very basic doubt. Suppose Host A is doing RDMA write (say 8
> > >> MB) to Host B. When data is copied into Host B's local
> > >> buffer, is it guaranteed that data will be copied starting
> > >> from the first location (first buffer address) to the last
> > >> location (last buffer address)? or it could be in any order?
> > >
> > > Once B gets a completion (e.g. of a subsequent send), data in
> > > its buffer matches that of A, byte for byte.
> > An excellent and concise answer. That is exactly what the application
> > should rely upon, and nothing else. With iWARP this is very explicit,
> > because portions of the message not only MAY be placed out of
> > order, they SHOULD be when packets have been re-ordered by the
> > network. But for *any* RDMA adapter there is no guarantee on
> > what order the adapter flushes things to host memory or particularly
> > when old contents that may be cached are invalidated or updated.
> > The role of the completion is to limit the frequency with which
> > the RDMA adapter MUST guarantee coherency with application visible
> > buffers. The completion not only indicates that the entire message
> > was received, but that it has been entirely delivered to host memory.
>Actually, A knows the data is in B's memory when A gets the completion
This is incorrect for both iWARP and IB. A completion at A means only that
the receiving HCA / RNIC has the data and has generated an
acknowledgement. It does not indicate that B has flushed the data to host
memory. Hence the fault zone remains the HCA / RNIC: A may free the
associated buffer for other use, but it should not rely upon the data
having been delivered to host memory on B. This is one of the fault
scenarios I raised during the initial RDS transparent recovery assertions.
If A were to issue an RDMA Read to B targeting the memory location of the
associated RDMA Write, then it could know the data has been placed in B's
memory.
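
To make the distinction concrete, here is a minimal toy model in Python. It
is only a sketch of the argument above, not real verbs code: ToyAdapter,
rdma_write, rdma_read and flush are hypothetical names, and the model
simply assumes that the adapter places earlier writes before it answers a
read, as described above.

```python
class ToyAdapter:
    """Toy stand-in for the HCA / RNIC on host B (hypothetical, not a verbs API)."""

    def __init__(self, size):
        self.host_memory = bytearray(size)  # what B's CPU would observe
        self.pending = []                   # acknowledged, not yet in host memory

    def rdma_write(self, offset, data):
        # The adapter holds the data and acknowledges it; A gets its completion
        # here, although host memory on B has not changed yet.
        self.pending.append((offset, bytes(data)))
        return "ack"

    def flush(self):
        # Move every acknowledged write from the adapter into host memory.
        for offset, data in self.pending:
            self.host_memory[offset:offset + len(data)] = data
        self.pending.clear()

    def rdma_read(self, offset, length):
        # Assumption of this model: earlier writes are placed before the
        # read response is generated.
        self.flush()
        return bytes(self.host_memory[offset:offset + length])


adapter = ToyAdapter(16)
ack = adapter.rdma_write(0, b"hello")
before_read = bytes(adapter.host_memory[:5])  # still zeroes: inside the fault zone
readback = adapter.rdma_read(0, 5)            # forces placement in this model
after_read = bytes(adapter.host_memory[:5])
```

In this model before_read is still all zero bytes even though A already
holds "ack"; only after the read returns does host memory contain the data.
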
> B can't rely on anything unless A uses the RDMA write with
>immediate which puts a completion event in B's CQ.
>Most applications on B ignore this requirement and test for the last
>memory location being modified which usually works but doesn't
>guarantee that all the data is in memory.
B cannot rely on anything until it sees a completion, generated either by
an RDMA Write with immediate or by a subsequent Send. It is not wise to
rely upon IHV-specific behaviors when designing an application: an IHV can
change its behavior over time or because of interoperability requirements,
and the application may then stop working as desired, which is a customer
complaint that many would like to avoid.
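
A small Python sketch of why testing the last memory location is unsafe.
This is a toy model under stated assumptions, not adapter behavior taken
from any specification: deliver() is a hypothetical helper, the placement
order is chosen by hand to show the bad case, and the "completion" flag is
raised only after every segment has been placed.

```python
def deliver(message, segment_size, order):
    """Place segments into a zeroed buffer in the given order.

    Yields (buffer_snapshot, completion) after each placement. The
    completion flag is True only once every segment has been placed.
    """
    buf = bytearray(len(message))
    segments = [(off, message[off:off + segment_size])
                for off in range(0, len(message), segment_size)]
    for placed, index in enumerate(order, start=1):
        off, data = segments[index]
        buf[off:off + len(data)] = data
        yield bytes(buf), placed == len(segments)


message = bytes(range(1, 17))                 # 16 nonzero bytes, 4 segments of 4
states = list(deliver(message, 4, order=[3, 0, 2, 1]))

first_buf, first_completion = states[0]
last_byte_changed = first_buf[-1] != 0        # what a polling application tests
buffer_is_complete = first_buf == message     # what it actually needs
final_buf, final_completion = states[-1]
```

After the first placement the last byte has already changed while three
quarters of the buffer is still stale; only the final state, the one that
carries the completion, matches the message byte for byte.
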
BTW, immediate data is 4 bytes long because that is what VIA defined. Many
within the IBTA wanted to get rid of immediate data, but because of the
requirement to support legacy VIA applications the immediate value was
left in place. The need to support a larger value was not apparent. One
needs to keep in mind where the immediate resides within the wire protocol
and what its usage model is. The past usage was to signal a PID or some
other unique identifier that could be used to determine which thread of
execution should be informed of a particular completion event. Four bytes
is sufficient to communicate such information without significantly
complicating the wire protocol or making it too inefficient.
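
As an illustration of that usage model, a 32-bit identifier fits the
immediate value exactly. The sketch below uses only Python's struct module;
encode_immediate and decode_immediate are hypothetical helper names, and
carrying the value in network byte order is an assumption of the example.

```python
import struct


def encode_immediate(identifier):
    # Pack an unsigned 32-bit identifier into exactly 4 bytes, network byte
    # order. struct.error is raised if the value does not fit in 32 bits.
    return struct.pack("!I", identifier)


def decode_immediate(raw):
    # Recover the identifier so the receiver can pick the thread to notify.
    (identifier,) = struct.unpack("!I", raw)
    return identifier


wire_value = encode_immediate(4242)      # e.g. a PID or per-thread tag
recovered = decode_immediate(wire_value)
```
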