[openib-general] basic IB doubt

Mon Aug 28 20:18:19 PDT 2006

At 10:14 AM 8/23/2006, Ralph Campbell wrote:
>On Wed, 2006-08-23 at 09:47 -0700, Caitlin Bestler wrote:
> > openib-general-bounces at openib.org wrote:
> > > Quoting r. john t <johnt1johnt2 at gmail.com>:
> > >> Subject: basic IB doubt
> > >>
> > >> Hi
> > >>
> > >> I have a very basic doubt. Suppose Host A is doing RDMA write (say 8
> > >> MB) to Host B. When data is copied into Host B's local buffer, is it
> > >> guaranteed that data will be copied starting from the first location
> > >> (first buffer address) to the last location (last buffer address)? or
> > >> it could be in any order?
> > >
> > > Once B gets a completion (e.g. of a subsequent send), data in
> > > its buffer matches that of A, byte for byte.
> >
> > An excellent and concise answer. That is exactly what the application
> > should rely upon, and nothing else. With iWARP this is very explicit,
> > because portions of the message not only MAY be placed out of
> > order, they SHOULD be when packets have been re-ordered by the
> > network. But for *any* RDMA adapter there is no guarantee on
> > what order the adapter flushes things to host memory or particularly
> > when old contents that may be cached are invalidated or updated.
> > The role of the completion is to limit the frequency with which
> > the RDMA adapter MUST guarantee coherency with application visible
> > buffers. The completion not only indicates that the entire message
> > was received, but that it has been entirely delivered to host memory.
>
>Actually, A knows the data is in B's memory when A gets the completion
>notice.

This is incorrect for both iWARP and IB.  A completion at A means only that
the receiving HCA / RNIC has the data and has generated an
acknowledgement.  It does not indicate that the HCA / RNIC on B has flushed
the data to host memory.  Hence, the fault zone remains the HCA / RNIC: A
may free the associated buffer for other use, but it should not rely upon
the data having been delivered to host memory on B.  This is one of the
fault scenarios I raised during the initial RDS transparent recovery
assertions.  If A were to issue an RDMA Read to B targeting the memory
location of the associated RDMA Write, then once that Read completes A can
know the data has been placed in B's memory.

>  B can't rely on anything unless A uses the RDMA write with
>immediate which puts a completion event in B's CQ.
>Most applications on B ignore this requirement and test for the last
>memory location being modified which usually works but doesn't
>guarantee that all the data is in memory.

B cannot rely on anything until it sees a completion, either from an RDMA
Write with immediate or from a subsequent Send.  It is not wise to rely upon
IHV-specific behaviors when designing an application.  Even a single IHV can
change things over time, and interoperability requirements may mean that
things do not work as desired, which is a customer complaint that many
would like to avoid.
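
To make the rule for B concrete, here is a minimal sketch in C.  It is only
a model of the reasoning above, not a verbs API: the enum, its values, and
the function name are all hypothetical and invented for illustration.

```c
#include <stdbool.h>

/* Hypothetical model: what B has observed so far for one RDMA Write. */
enum b_observation {
    B_NOTHING_SEEN,          /* no completion, no visible change        */
    B_LAST_BYTE_CHANGED,     /* B polled memory and saw the last byte   */
    B_CQE_WRITE_WITH_IMM,    /* completion for RDMA Write with immediate */
    B_CQE_SUBSEQUENT_SEND    /* completion for a Send posted after it    */
};

/*
 * Returns true only when B has seen a completion.  Seeing the last
 * memory location change is deliberately NOT sufficient: it usually
 * works on a given adapter, but nothing guarantees the rest of the
 * data has reached host memory.
 */
bool b_may_read_buffer(enum b_observation seen)
{
    return seen == B_CQE_WRITE_WITH_IMM || seen == B_CQE_SUBSEQUENT_SEND;
}
```

In a real application the two "CQE" cases would correspond to work
completions polled from B's CQ; the polling itself is left out here.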

BTW, the reason immediate data is 4 bytes in length is that this is what was
defined in VIA.  Many within the IBTA wanted to get rid of immediate data,
but due to the requirement to support legacy VIA applications, the
immediate value was left in place.  The need to support a larger value was
not apparent.  One needs to keep in mind where the immediate resides within
the wire protocol and its usage model.  The past usage was to signal a PID
or some other unique identifier that could be used to determine which
thread of execution should be informed of a particular completion
event.  Four bytes is sufficient to communicate such information without
significantly complicating the wire protocol or making it too inefficient.
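
As a rough illustration of that usage model, the sketch below packs a small
consumer identifier into a 4-byte immediate value and recovers it on the
receiving side.  The helper names and the MAX_CONSUMERS limit are
hypothetical, and carrying the value in network byte order is an assumption
made so that hosts of different endianness agree on it.

```c
#include <arpa/inet.h>   /* htonl, ntohl (POSIX) */
#include <assert.h>
#include <stdint.h>

#define MAX_CONSUMERS 64u   /* hypothetical size of the dispatch table */

/* Sender side: turn a consumer (thread) id into a 4-byte immediate. */
uint32_t imm_encode_consumer(uint32_t consumer_id)
{
    assert(consumer_id < MAX_CONSUMERS);   /* local programming error */
    return htonl(consumer_id);             /* network byte order */
}

/*
 * Receiver side: recover the consumer id from the immediate carried by
 * a completion.  Returns -1 if the value does not name a known consumer,
 * since data from the wire should be validated rather than asserted.
 */
int imm_decode_consumer(uint32_t imm_wire)
{
    uint32_t consumer_id = ntohl(imm_wire);
    if (consumer_id >= MAX_CONSUMERS)
        return -1;
    return (int)consumer_id;
}
```

The receiver would use the decoded id to pick which thread of execution to
wake for that completion event.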

Mike  