[ofiwg] the problems with atomics

Tue Jul 7 16:01:40 PDT 2015

On Tue, Jul 07, 2015 at 09:45:52PM +0000, Hefty, Sean wrote:

> > There is no implicit conversion when doing a RDMA READ/WRITE or SEND,
> > so the app is going to be exposed to all of the differences if it
> > wants to run homogeneously.
> 
> RDMA and sends only deal with bytes.  They don't attempt to
> interpret data at the remote side.

As far as I see it, there are only three reasonable choices for
atomics:
 1) They work on data in remote memory in a highly specific
    format (i.e. 32 bit unsigned integer in little endian)
 2) They work on data in remote memory the same way as the local app
    on local memory
 3) They work on data in remote memory in the same way as the remote
    app on its local memory

#1 is the basic building block of the other two options.

#2 can be done automatically by having libfabric detect the type set
of the local app and map things like FI_UINT64 to FI_LITTLE_UINT64
(for instance).
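
A minimal sketch of the #2 mapping, assuming the fixed-format codes are
plain enum values. The names fi_fixed_type, host_is_little_endian and
fi_uint64_native are made up for illustration and are not existing
libfabric definitions; a real implementation would more likely resolve
this at compile time.

```c
#include <stdint.h>

/* Hypothetical fixed-format type codes in the style of option #1.
 * Illustrative only, not existing libfabric definitions. */
enum fi_fixed_type {
    FI_LITTLE_UINT64,
    FI_BIG_UINT64
};

/* Detect the host byte order by looking at the first byte of a known value. */
static int host_is_little_endian(void)
{
    const uint16_t probe = 1;
    return *(const unsigned char *)&probe == 1;
}

/* Option #2: resolve the "native" 64 bit unsigned type to the fixed
 * format that matches the local app's in-memory layout. */
static enum fi_fixed_type fi_uint64_native(void)
{
    return host_is_little_endian() ? FI_LITTLE_UINT64 : FI_BIG_UINT64;
}
```

The app keeps writing the C type name and the library substitutes the
matching fixed format underneath.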

#3 can be done by the app by exchanging information outside of
libfabric, e.g. a specification may say that memory X is laid out using
64 bit big endian integers, so when you RDMA READ/WRITE or ATOMIC it
you always use BE. Or, e.g., the app exchanges info about the peer at
run time.

Then on top of that, you have to answer the question: what is placed
in the local buffer.

#1 would have definitions like
 FI_LITTLE_UINT64
 FI_BIG_UINT64
 FI_LITTLE_FLOAT32_IEEE754
 FI_LONG_DOUBLE_80BITS_ALIGNED_128
[Be aware, there is an insane amount of variation for floating point
 binary representation, ARM has some exciting wackiness even for 32
 bits as well]

#2 would have definitions like
 FI_UNSIGNED_LONG
 FI_UNSIGNED_LONG_LONG
 FI_UINT64
 FI_FLOAT
 FI_LONG_DOUBLE

Apps are simple: when you use an atomic and want #2 semantics, you use
the C type name.

You may need funky macros:

 #define FI_LONG_DOUBLE (sizeof(long double) == XX?FI_LONG_DOUBLE_80BITS_ALIGNED_128:FI_LONG_DOUBLE_80BITS_ALIGNED_96)

To cover off compiler flag variations.
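
One possible way to spell that selection out, as a sketch only. The
sizes 16 and 12 are my assumption for the two x87 storage layouts
(e.g. x86-64 vs i386 defaults), the LDBL_MANT_DIG check is an extra
guard I added, and fi_ld_format, FI_LONG_DOUBLE_UNKNOWN and
fi_long_double_native are hypothetical names.

```c
#include <float.h>

/* Hypothetical fixed-format codes. The first two names come from the
 * discussion above; none are existing libfabric definitions. */
enum fi_ld_format {
    FI_LONG_DOUBLE_80BITS_ALIGNED_96,
    FI_LONG_DOUBLE_80BITS_ALIGNED_128,
    FI_LONG_DOUBLE_UNKNOWN
};

/* Pick the fixed format that matches this compiler's long double. */
static enum fi_ld_format fi_long_double_native(void)
{
    /* x87 extended precision has a 64 bit significand; anything else
     * is not an 80 bit format and is left unclassified here. */
    if (LDBL_MANT_DIG != 64)
        return FI_LONG_DOUBLE_UNKNOWN;

    /* Assumed storage sizes: 16 bytes or 12 bytes, depending on
     * target and compiler flags. */
    if (sizeof(long double) == 16)
        return FI_LONG_DOUBLE_80BITS_ALIGNED_128;
    if (sizeof(long double) == 12)
        return FI_LONG_DOUBLE_80BITS_ALIGNED_96;
    return FI_LONG_DOUBLE_UNKNOWN;
}
```

The explicit unknown case is there so an unexpected layout fails loudly
instead of silently picking the wrong format.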

#3 is harder because you also want to implicitly do type conversions
along the way, eg: the remote may be FI_LITTLE_UINT64, but the local
may be FI_BIG_UINT64. So, an app would naturally want to say:

  FI_ATOMIC_READ(remote=FI_LITTLE_UINT64,local=FI_BIG_UINT64)

And have the two NICs do the swapping in hardware.
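
To make the conversion concrete, here is a software sketch of what the
hardware would be doing for the 64 bit integer case. Everything here
(fi_fixed_type, load_u64, store_u64, atomic_read_convert) is a
hypothetical helper for illustration, not a libfabric API, and it only
models the format conversion, not the atomicity or the transport.

```c
#include <stdint.h>

/* Hypothetical fixed-format type codes, illustrative only. */
enum fi_fixed_type {
    FI_LITTLE_UINT64,
    FI_BIG_UINT64
};

/* Decode 8 bytes stored in the given fixed format into a value. */
static uint64_t load_u64(const unsigned char *p, enum fi_fixed_type fmt)
{
    uint64_t v = 0;
    for (int i = 0; i < 8; i++) {
        /* Walk from the most significant byte to the least. */
        int idx = (fmt == FI_BIG_UINT64) ? i : 7 - i;
        v = (v << 8) | p[idx];
    }
    return v;
}

/* Encode a value into 8 bytes in the given fixed format. */
static void store_u64(unsigned char *p, enum fi_fixed_type fmt, uint64_t v)
{
    for (int i = 0; i < 8; i++) {
        /* Byte i of the value, counting from the least significant. */
        int idx = (fmt == FI_LITTLE_UINT64) ? i : 7 - i;
        p[idx] = (unsigned char)(v >> (8 * i));
    }
}

/* Model of FI_ATOMIC_READ(remote=..., local=...): the bytes read from
 * remote memory are re-encoded in the format the local side asked for. */
static void atomic_read_convert(const unsigned char *remote,
                                enum fi_fixed_type remote_fmt,
                                unsigned char *local,
                                enum fi_fixed_type local_fmt)
{
    store_u64(local, local_fmt, load_u64(remote, remote_fmt));
}
```

When both formats are the same this degenerates to a plain copy, which
is the case RDMA READ already covers.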

Take the same idea and extend it to floats:

#3 becomes:
  FI_ATOMIC_READ(remote=FI_LONG_DOUBLE_WEIRD_PPC_ALIGNED_128,local=FI_LONG_DOUBLE)

#2 becomes:
  long double buf;
  FI_ATOMIC_READ(FI_LONG_DOUBLE,&buf);

#1 is fairly unusable, but becomes:
  ??? buf
  FI_ATOMIC_READ(FI_LONG_DOUBLE_WEIRD_PPC_ALIGNED_128,&buf);
  long double usable = convert_ppc_to_local(&buf);

And now you have a sane, self-consistent API.

Jason