[ofiwg] the problems with atomics
Hefty, Sean
sean.hefty at intel.com
Tue Jul 7 16:06:22 PDT 2015
> As far as I see it, there are only three reasonable choices for
> atomics:
> 1) They work on data in remote memory in a highly specific
> format (ie 32 bit unsigned integer in little endian)
> 2) They work on data in remote memory the same way as the local app
> on local memory
> 3) They work on data in remote memory in the same way as the remote
> app on its local memory
>
> #1 is the basic building block of the other two options.
>
> #2 can be done automatically by having libfabric detect the type set
> of the local app and map things like FI_UINT64 to FI_LITTLE_UINT64
> (for instance)
>
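
A rough sketch of the mapping described for #2, under stated assumptions: the enum tags below follow the naming in the quoted proposal and are hypothetical, not taken from libfabric's headers, and the byte-order probe is done at run time purely for illustration.

```c
#include <stdint.h>

/* Hypothetical datatype tags modelled on the names in the quoted text;
 * these are illustration only and do not exist in libfabric. */
enum fi_sketch_datatype {
    FI_SKETCH_LITTLE_UINT64,
    FI_SKETCH_BIG_UINT64
};

/* Probe host byte order by inspecting the first byte of a known value. */
static int host_is_little_endian(void)
{
    const uint16_t probe = 1;
    return *(const unsigned char *)&probe == 1;
}

/* Option #2: the app asks for "a uint64 as I see it locally" and the
 * library resolves that to the explicit wire format. */
static enum fi_sketch_datatype fi_sketch_native_uint64(void)
{
    return host_is_little_endian() ? FI_SKETCH_LITTLE_UINT64
                                   : FI_SKETCH_BIG_UINT64;
}
```

A real implementation would more likely resolve this at compile time, but the idea is the same: the native name is just an alias for one of the explicit #1 formats.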
> #3 can be done by the app by exchanging information outside of
> libfabric, eg a specification may say that memory X is laid out using
> 64 bit big endian integers, when you RDMA READ/WRITE or ATOMIC it then
> you always use BE. Or, eg, the app exchanges info about the peer at
> run time.
>
> Then on top of that, you have to answer the question: what is placed
> in the local buffer.
>
> #1 would have definitions like
> FI_LITTLE_UINT64
> FI_BIG_UINT64
> FI_LITTLE_FLOAT32_IEEE754
> FI_LONG_DOUBLE_80BITS_ALIGNED_128
> [Be aware, there is an insane amount of variation for floating point
> binary representation, ARM has some exciting wackiness even for 32
> bits as well]
>
> #2 would have definitions like
> FI_UNSIGNED_LONG
> FI_UNSIGNED_LONG_LONG
> FI_UINT64
> FI_FLOAT
> FI_LONG_DOUBLE
>
> Apps are simple: when you use an atomic and want #2 semantics, you use
> the C type name.
>
> You may need funky macros:
>
> #define FI_LONG_DOUBLE (sizeof(long double) == XX ? \
>     FI_LONG_DOUBLE_80BITS_ALIGNED_128 : FI_LONG_DOUBLE_80BITS_ALIGNED_96)
>
> To cover off compiler flag variations.
>
> #3 is harder because you also want to implicitly do type conversions
> along the way, eg: the remote may be FI_LITTLE_UINT64, but the local
> may be FI_BIG_UINT64. So, an app would naturally want to say:
>
> FI_ATOMIC_READ(remote=FI_LITTLE_UINT64,local=FI_BIG_UINT64)
>
> And have the two NICs do the swapping in hardware.
>
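
For reference, a software sketch of what the remote=/local= conversion amounts to for 64-bit integers. The quoted text has the NICs doing this in hardware; the names here (sketch_order, sketch_atomic_read) are made up for illustration, and the "remote" memory is just a local byte array.

```c
#include <stdint.h>

enum sketch_order { SKETCH_LITTLE, SKETCH_BIG };

/* Assemble a 64-bit value from 8 bytes stored in the given byte order. */
static uint64_t load_u64(const unsigned char *p, enum sketch_order o)
{
    uint64_t v = 0;
    for (int i = 0; i < 8; i++) {
        /* Walk from the most significant byte to the least. */
        int idx = (o == SKETCH_BIG) ? i : 7 - i;
        v = (v << 8) | p[idx];
    }
    return v;
}

/* Store a 64-bit value as 8 bytes in the given byte order. */
static void store_u64(unsigned char *p, uint64_t v, enum sketch_order o)
{
    for (int i = 0; i < 8; i++) {
        /* Byte i of the value, counting from the least significant. */
        int idx = (o == SKETCH_BIG) ? 7 - i : i;
        p[idx] = (unsigned char)(v >> (8 * i));
    }
}

/* Stand-in for FI_ATOMIC_READ(remote=..., local=...): read in the remote
 * format, deliver in the local format. No atomicity is modelled here. */
static void sketch_atomic_read(const unsigned char *remote_mem,
                               enum sketch_order remote_order,
                               unsigned char *local_buf,
                               enum sketch_order local_order)
{
    store_u64(local_buf, load_u64(remote_mem, remote_order), local_order);
}
```

When both orders match this reduces to a plain copy, which is the #2 case between like hosts.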
> Take the same idea and extend it to floats.
>
> #3 becomes:
>
> FI_ATOMIC_READ(remote=FI_LONG_DOUBLE_WEIRD_PPC_ALIGNED_128,
>                local=FI_LONG_DOUBLE)
>
> #2 becomes:
> long double buf;
> FI_ATOMIC_READ(FI_LONG_DOUBLE,&buf);
>
> #1 is fairly unusable, but becomes:
> ??? buf
> FI_ATOMIC_READ(FI_LONG_DOUBLE_WEIRD_PPC_ALIGNED_128,&buf);
> long double usable = convert_ppc_to_local(&buf);
>
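
To make the #1 case concrete, here is one guess at what a convert_ppc_to_local() could look like. It assumes the "weird PPC" format is the IBM double-double layout (two IEEE 754 doubles whose sum is the value), stored big-endian, and that the local double is IEEE 754 binary64. None of that is stated in the quoted text, and the function name is a hypothetical stand-in.

```c
#include <stdint.h>
#include <string.h>

/* Read 8 big-endian bytes and reinterpret the bits as a double.
 * Assumes the local double is IEEE 754 binary64. */
static double load_be_double(const unsigned char *p)
{
    uint64_t bits = 0;
    double d;
    for (int i = 0; i < 8; i++)
        bits = (bits << 8) | p[i];
    memcpy(&d, &bits, sizeof d);
    return d;
}

/* Hypothetical conversion: a double-double value is the sum of its high
 * and low parts. Precision beyond the local long double is lost. */
static long double sketch_convert_ppc_to_local(const unsigned char raw[16])
{
    return (long double)load_be_double(raw) +
           (long double)load_be_double(raw + 8);
}
```

Every app doing this by hand for every format is the part that makes #1 hard to use directly.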
> And now you have a sane, self-consistent API.
Sane? :)