[ofa-general] uDAPL DTO completion question.
Caitlin Bestler
caitlin.bestler at gmail.com
Fri May 1 09:08:13 PDT 2009
On Fri, May 1, 2009 at 4:24 AM, arkady kanevsky
<arkady.kanevsky at gmail.com> wrote:
> Jie,
> it sounds to me that either the variable is not volatile or compiler
> optimization causes some problem. I would check for these first.
> Arkady
>
Agreed, it is definitely a caching issue.
Atomics are InfiniBand-specific, and there are some fairly complex rules that
govern how much caching the HCA can do. The gotcha is that those rules
basically provide some cache coherency guarantees within the context of a
connection, but not much between connections or with respect to local
applications.
That said, it would be rare for HCA caching to be the cause of anything worse
than some unexpected ordering. Adapters cache when they have to, but would
really rather not allocate or track a lot of resources. Updating to real
physical memory ASAP is much simpler.
Compilers, on the other hand, *love* optimizing. The key thing to understand
is that the HCA is another processor, one that is at least as distant as any
other CPU core. Any and all techniques used when sharing memory with another
processor apply.
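To make that concrete, here is a minimal C11 sketch in which an ordinary
thread stands in for the HCA. It is not uDAPL or verbs code, and every name in
it (payload, ready, fake_hca_write, poll_for_data) is made up for
illustration. It only covers the compiler and CPU side of the problem: the
polled flag is an atomic read with acquire ordering, so the compiler cannot
hoist the load out of the loop and the payload read cannot be moved ahead of
it. A plain volatile flag would stop the hoisting but would not by itself
order the payload read. What a real adapter guarantees about the order of its
own writes is a separate question that this sketch does not answer.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical buffer that the "remote" side fills in. */
static uint32_t payload;

/* Flag the consumer polls. Being atomic, it must be re-read on every
 * iteration; the compiler may not cache it in a register. */
static atomic_int ready;

/* Stand-in for the HCA: write the data first, then publish the flag. */
void *fake_hca_write(void *arg)
{
    (void)arg;
    payload = 42;
    /* Release: the payload write is visible before the flag is. */
    atomic_store_explicit(&ready, 1, memory_order_release);
    return NULL;
}

/* Poll memory directly, with no completion to lean on.
 * Returns the payload once the flag is seen, or 0 if the
 * stand-in thread could not be started. */
uint32_t poll_for_data(void)
{
    pthread_t writer;

    payload = 0;
    atomic_store_explicit(&ready, 0, memory_order_relaxed);

    if (pthread_create(&writer, NULL, fake_hca_write, NULL) != 0)
        return 0;

    /* Acquire: pairs with the release store in fake_hca_write(). */
    while (atomic_load_explicit(&ready, memory_order_acquire) == 0)
        ; /* spin */

    uint32_t seen = payload;   /* ordered after the flag read */
    pthread_join(writer, NULL);
    return seen;
}
```

Calling poll_for_data() from main() is enough to try it. If the flag were a
plain int instead, an optimizing compiler would be free to read it once and
spin forever.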
Completions hide all that from the application, just promising that specific
things are coherent when the user invokes the verbs to reap a completion. So
whenever you do without completions, you are dealing with an arbitrary
multi-processor memory coherence problem.
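For contrast, a rough sketch of the completion model, again with a thread
standing in for the adapter. The struct fake_cq, fake_adapter(),
reap_completion() and receive_with_completion() are all hypothetical; in real
code the reaping role is played by calls such as dat_evd_wait() in uDAPL or
ibv_poll_cq() in verbs. The point is only that the synchronization lives
inside the reap call, so the application can read the buffer as plain memory
afterwards. Here a mutex and condition variable provide that guarantee; how an
actual provider does it is not shown.

```c
#include <pthread.h>
#include <string.h>

/* Hypothetical single-slot "completion queue". */
struct fake_cq {
    pthread_mutex_t lock;
    pthread_cond_t  posted;
    int             has_completion;
};

static struct fake_cq cq = {
    PTHREAD_MUTEX_INITIALIZER, PTHREAD_COND_INITIALIZER, 0
};

/* Plain memory: no volatile, no atomics. */
static char recv_buf[16];

/* Stand-in for the adapter: place the data, then post a completion. */
void *fake_adapter(void *arg)
{
    (void)arg;
    strcpy(recv_buf, "hello");

    pthread_mutex_lock(&cq.lock);
    cq.has_completion = 1;
    pthread_cond_signal(&cq.posted);
    pthread_mutex_unlock(&cq.lock);
    return NULL;
}

/* Stand-in for reaping a completion: blocks until one is posted.
 * The lock hand-off is what makes recv_buf safe to read afterwards. */
void reap_completion(struct fake_cq *q)
{
    pthread_mutex_lock(&q->lock);
    while (!q->has_completion)
        pthread_cond_wait(&q->posted, &q->lock);
    q->has_completion = 0;
    pthread_mutex_unlock(&q->lock);
}

/* Returns 1 if the buffer holds the expected data after the
 * completion is reaped, 0 otherwise. */
int receive_with_completion(void)
{
    pthread_t adapter;

    if (pthread_create(&adapter, NULL, fake_adapter, NULL) != 0)
        return 0;

    reap_completion(&cq);
    int ok = (strcmp(recv_buf, "hello") == 0);

    pthread_join(adapter, NULL);
    return ok;
}
```

Note that the application code in receive_with_completion() never touches an
atomic or a barrier itself. Take reap_completion() away and you are back to
solving the coherence problem by hand.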