[ofiwg] the problems with atomics
Jason Gunthorpe
jgunthorpe at obsidianresearch.com
Tue Jul 7 16:01:40 PDT 2015
On Tue, Jul 07, 2015 at 09:45:52PM +0000, Hefty, Sean wrote:
> > There is no implicit conversion when doing a RDMA READ/WRITE or SEND,
> > so the app is going to be exposed to all of the differences if it
> > wants to run homogeneously.
>
> RDMA and sends only deal with bytes. They don't attempt to
> interpret data at the remote side.
As far as I can see, there are only three reasonable choices for
atomics:
1) They work on data in remote memory in a highly specific
format (i.e. a 32-bit unsigned integer stored little-endian)
2) They work on data in remote memory the same way as the local app
on local memory
3) They work on data in remote memory in the same way as the remote
app on its local memory
#1 is the basic building block of the other two options.
#2 can be done automatically by having libfabric detect the type
representations used by the local app and map things like FI_UINT64 to
FI_LITTLE_UINT64 (for instance).
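A minimal sketch of that #2 mapping, assuming hypothetical explicit formats (the names WIRE_LITTLE_UINT64, WIRE_BIG_UINT64 and map_native_uint64() are illustrative only and are not libfabric API): the library probes the host byte order and translates "native uint64" into whichever explicit format matches it.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical explicit wire formats (choice #1); not libfabric names. */
enum wire_fmt {
    WIRE_LITTLE_UINT64,
    WIRE_BIG_UINT64
};

/* Returns 1 if the host stores the least significant byte first. */
static int host_is_little_endian(void)
{
    uint16_t probe = 1;
    unsigned char first;
    memcpy(&first, &probe, 1); /* look at the lowest-addressed byte */
    return first == 1;
}

/* Hypothetical mapping for choice #2: a native uint64_t becomes the
 * explicit format that matches this host's byte order. */
static enum wire_fmt map_native_uint64(void)
{
    return host_is_little_endian() ? WIRE_LITTLE_UINT64 : WIRE_BIG_UINT64;
}
```

The app keeps writing the native type name; only the library needs to know which explicit format that means on this host.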
#3 can be done by the app by exchanging information outside of
libfabric, e.g. a specification may say that memory X is laid out as
64-bit big-endian integers, so whenever you RDMA READ/WRITE or ATOMIC it
you always use BE. Or, e.g., the app exchanges information about the
peer at run time.
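With the layout fixed by an out-of-band convention, the app alone produces and consumes that format. A minimal sketch, assuming the "memory X is 64-bit big-endian" rule from the example (put_be64/get_be64 are hypothetical helper names): building and parsing the bytes with shifts makes the same source correct on either host byte order.

```c
#include <stdint.h>

/* Encode v as 8 big-endian bytes, independent of host byte order. */
static void put_be64(unsigned char out[8], uint64_t v)
{
    for (int i = 7; i >= 0; i--) {
        out[i] = (unsigned char)(v & 0xff); /* lowest byte goes last */
        v >>= 8;
    }
}

/* Decode 8 big-endian bytes back into a host uint64_t. */
static uint64_t get_be64(const unsigned char in[8])
{
    uint64_t v = 0;
    for (int i = 0; i < 8; i++)
        v = (v << 8) | in[i]; /* most significant byte first */
    return v;
}
```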
Then on top of that, you have to answer the question: what is placed
in the local buffer.
#1 would have definitions like
FI_LITTLE_UINT64
FI_BIG_UINT64
FI_LITTLE_FLOAT32_IEEE754
FI_LONG_DOUBLE_80BITS_ALIGNED_128
[Be aware: there is an insane amount of variation in floating-point
binary representations; ARM has some exciting wackiness even for 32
bits.]
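Because of that variation, a #1 float format can only be decoded safely when the local representation is checked. A minimal sketch of decoding a little-endian IEEE 754 binary32 value, assuming the local float is binary32 (enforced at compile time) and that floats share the host's integer byte order; get_le_float32() is a hypothetical name.

```c
#include <float.h>
#include <stdint.h>
#include <string.h>

/* Assumption: the local float is IEEE 754 binary32. Refuse to build
 * otherwise rather than silently misinterpret the bits. */
_Static_assert(sizeof(float) == 4 && FLT_RADIX == 2 && FLT_MANT_DIG == 24 &&
               FLT_MAX_EXP == 128, "local float is not IEEE 754 binary32");

/* Decode a little-endian IEEE 754 binary32 value from 4 raw bytes.
 * Assumes float and uint32_t use the same byte order on this host. */
static float get_le_float32(const unsigned char in[4])
{
    uint32_t bits = (uint32_t)in[0] | ((uint32_t)in[1] << 8) |
                    ((uint32_t)in[2] << 16) | ((uint32_t)in[3] << 24);
    float f;
    memcpy(&f, &bits, sizeof f); /* reinterpret the bit pattern */
    return f;
}
```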
#2 would have definitions like
FI_UNSIGNED_LONG
FI_UNSIGNED_LONG_LONG
FI_UINT64
FI_FLOAT
FI_LONG_DOUBLE
Apps are simple: when you use an atomic and want #2 semantics, you
use the C type name.
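One way an app could get from the C type to the type tag without spelling it twice is C11 `_Generic`. A minimal sketch, assuming hypothetical tag names (T_*, NATIVE_TYPE_OF) that deliberately differ from libfabric's own datatype enum:

```c
/* Hypothetical type tags for choice #2 (native C types). */
enum native_type {
    T_UNSIGNED_LONG,
    T_UNSIGNED_LONG_LONG,
    T_FLOAT,
    T_DOUBLE,
    T_LONG_DOUBLE
};

/* _Generic selects the tag from the static type of the argument.
 * On common ABIs uint64_t is a typedef for one of the unsigned integer
 * types listed here, so it needs no entry of its own (adding one would
 * be a duplicate association and fail to compile). */
#define NATIVE_TYPE_OF(x) _Generic((x),          \
    unsigned long:      T_UNSIGNED_LONG,         \
    unsigned long long: T_UNSIGNED_LONG_LONG,    \
    float:              T_FLOAT,                 \
    double:             T_DOUBLE,                \
    long double:        T_LONG_DOUBLE)
```

This also shows why FI_UINT64 and FI_UNSIGNED_LONG can end up naming the same thing on a given platform.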
You may need funky macros to cover off compiler flag variations:
#define FI_LONG_DOUBLE (sizeof(long double) == XX?FI_LONG_DOUBLE_80BITS_ALIGNED_128:FI_LONG_DOUBLE_80BITS_ALIGNED_96)
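Since the preprocessor cannot evaluate `sizeof`, such a macro only works as an ordinary C expression, so the choice can just as well live in a function. A minimal sketch, assuming hypothetical LD_* names that mirror the proposed FI_LONG_DOUBLE_* formats: LDBL_MANT_DIG identifies the arithmetic format and `sizeof` gives the storage size.

```c
#include <float.h>

/* Hypothetical explicit long double formats. */
enum ld_fmt {
    LD_SAME_AS_DOUBLE,         /* 53-bit mantissa, e.g. MSVC          */
    LD_X87_80BITS_ALIGNED_96,  /* x87 extended stored in 12 bytes     */
    LD_X87_80BITS_ALIGNED_128, /* x87 extended stored in 16 bytes     */
    LD_IBM_DOUBLE_DOUBLE_128,  /* PowerPC pair of doubles             */
    LD_IEEE_BINARY128,         /* IEEE 754 quad precision             */
    LD_UNKNOWN
};

/* Pick the explicit format from what the compiler actually produced.
 * On x86, GCC flags such as -m96bit-long-double and
 * -m128bit-long-double change only the storage size. */
static enum ld_fmt local_long_double_fmt(void)
{
    switch (LDBL_MANT_DIG) {
    case 53:
        return LD_SAME_AS_DOUBLE;
    case 64:
        if (sizeof(long double) == 12) return LD_X87_80BITS_ALIGNED_96;
        if (sizeof(long double) == 16) return LD_X87_80BITS_ALIGNED_128;
        return LD_UNKNOWN;
    case 106:
        return LD_IBM_DOUBLE_DOUBLE_128;
    case 113:
        return LD_IEEE_BINARY128;
    default:
        return LD_UNKNOWN;
    }
}
```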
#3 is harder because you also want to implicitly do type conversions
along the way, e.g. the remote may be FI_LITTLE_UINT64 but the local
may be FI_BIG_UINT64. So an app would naturally want to say:
FI_ATOMIC_READ(remote=FI_LITTLE_UINT64,local=FI_BIG_UINT64)
And have the two NICs do the swapping in hardware.
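For fixed-width integers, the work the NICs would be asked to do is small. A minimal software sketch, assuming hypothetical FMT_* names standing in for FI_LITTLE_UINT64/FI_BIG_UINT64: converting between two explicit integer formats is either a no-op or a full byte reversal. Floats are harder, since their formats can differ in more than byte order.

```c
#include <stddef.h>

/* Hypothetical explicit integer formats, as in choice #1. */
enum int_fmt { FMT_LITTLE_UINT64, FMT_BIG_UINT64 };

/* Convert an 8-byte value in place from one explicit format to another.
 * The two formats differ only in byte order, so reverse the bytes when
 * they do not match. */
static void convert_uint64(unsigned char buf[8], enum int_fmt from,
                           enum int_fmt to)
{
    if (from == to)
        return;
    for (size_t i = 0; i < 4; i++) {
        unsigned char tmp = buf[i];
        buf[i] = buf[7 - i];
        buf[7 - i] = tmp;
    }
}
```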
Take the same idea and extend it to floats.
#3 becomes:
FI_ATOMIC_READ(remote=FI_LONG_DOUBLE_WEIRD_PPC_ALIGNED_128,local=FI_LONG_DOUBLE)
#2 becomes:
long double buf;
FI_ATOMIC_READ(FI_LONG_DOUBLE,&buf);
#1 is fairly unusable, but becomes:
??? buf
FI_ATOMIC_READ(FI_LONG_DOUBLE_WEIRD_PPC_ALIGNED_128,&buf);
long double usable = convert_ppc_to_local(&buf);
And now you have a sane, self-consistent API.
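Under #1 the convert_ppc_to_local() step is left to the app. A minimal sketch of what it might look like, assuming the remote buffer holds an IBM double-double as written by a big-endian PowerPC host (high double first, each stored big-endian) and that the local double is IEEE 754 binary64 with the same byte order as uint64_t; get_be_double() is a hypothetical helper.

```c
#include <float.h>
#include <stdint.h>
#include <string.h>

/* Assumption: the local double is IEEE 754 binary64. */
_Static_assert(sizeof(double) == 8 && FLT_RADIX == 2 && DBL_MANT_DIG == 53,
               "local double is not IEEE 754 binary64");

/* Read one big-endian IEEE 754 binary64 value from 8 raw bytes. */
static double get_be_double(const unsigned char in[8])
{
    uint64_t bits = 0;
    for (int i = 0; i < 8; i++)
        bits = (bits << 8) | in[i];
    double d;
    memcpy(&d, &bits, sizeof d); /* reinterpret the bit pattern */
    return d;
}

/* Hypothetical conversion: a double-double's value is hi + lo. If the
 * local long double has fewer than the 106 mantissa bits a
 * double-double can carry, the sum is rounded. */
static long double convert_ppc_to_local(const unsigned char buf[16])
{
    double hi = get_be_double(buf);
    double lo = get_be_double(buf + 8);
    return (long double)hi + (long double)lo;
}
```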
Jason