[ofiwg] the problems with atomics
Jason Gunthorpe
jgunthorpe at obsidianresearch.com
Tue Jul 7 13:27:49 PDT 2015
On Tue, Jul 07, 2015 at 07:41:20PM +0000, Hefty, Sean wrote:
> Someone feel free to chime in here with a brilliant solution for
> handling this that doesn't end up with this sort of datatype
> explosion:
>
> FI_LONG_DOUBLE_80BITS_ALIGNED_96,
> FI_LONG_DOUBLE_80BITS_ALIGNED_128,
> FI_LONG_DOUBLE_96BITS_ALIGNED_96,
> FI_LONG_DOUBLE_96BITS_ALIGNED_128,
> FI_LONG_DOUBLE_128BITS_ALIGNED_128,
> FI_LONG_DOUBLE_WEIRD_PPC_ALIGNED_128

You expect app writers to provide this value? That sounds very hard.

I would say, 'FI_LONG_DOUBLE' means whatever it means for the local
machine, and libfabric doesn't do implicit conversions of floating
point types while executing atomics.
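
To make "whatever it means for the local machine" concrete, here is a minimal sketch in plain C11 of what an application could compare between peers before issuing long double atomics. The struct and function names are hypothetical and are not part of libfabric; how the description gets exchanged between peers is left out.

```c
#include <assert.h>
#include <float.h>
#include <stdalign.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical description of the local long double; not a libfabric type. */
struct long_double_layout {
    size_t size_bytes;   /* storage size, including any padding bytes */
    size_t align_bytes;  /* alignment the local ABI requires */
    int mantissa_digits; /* LDBL_MANT_DIG, counted in FLT_RADIX digits */
};

/* Describe long double as the local compiler and ABI define it. */
static struct long_double_layout local_long_double_layout(void)
{
    struct long_double_layout l;

    l.size_bytes = sizeof(long double);
    l.align_bytes = alignof(long double);
    l.mantissa_digits = LDBL_MANT_DIG;
    /* C requires the size of a type to be a multiple of its alignment. */
    assert(l.size_bytes % l.align_bytes == 0);
    return l;
}

/* Two peers can share long double data unconverted only if all fields match. */
static int layouts_match(const struct long_double_layout *a,
                         const struct long_double_layout *b)
{
    return a->size_bytes == b->size_bytes &&
           a->align_bytes == b->align_bytes &&
           a->mantissa_digits == b->mantissa_digits;
}

static void print_layout(const struct long_double_layout *l)
{
    printf("long double: %zu bytes, %zu-byte aligned, %d mantissa digits\n",
           l->size_bytes, l->align_bytes, l->mantissa_digits);
}
```

The printed values differ between ABIs, which is the point: the app already has to cope with that for plain READ/WRITE/SEND, so the atomic datatype does not need to encode it.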

There is no implicit conversion when doing an RDMA READ/WRITE or SEND,
so the app is going to be exposed to all of the differences if it
wants to run homogeneously.

I'd probably also say that the integer atomic types should have the
same endianness as the local machine (why should an ATOMIC X and an
RDMA READ of the same memory return different answers?).
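
As a rough illustration of the endianness point, the sketch below (plain C, hypothetical helper names, no libfabric calls) shows that the raw bytes of an integer decode back to the same value only when they are read in the local byte order. A raw byte copy stands in for what an RDMA READ would return.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Returns 1 if the least significant byte is stored first in memory. */
static int local_is_little_endian(void)
{
    const uint32_t probe = 1;
    unsigned char first;

    memcpy(&first, &probe, 1);
    return first == 1;
}

/* Decode 8 raw bytes as a uint64_t in the stated byte order. */
static uint64_t decode_u64(const unsigned char bytes[8], int little_endian)
{
    uint64_t v = 0;

    /* Walk from the most significant byte down to the least significant. */
    for (int i = 0; i < 8; i++) {
        int idx = little_endian ? 7 - i : i;
        v = (v << 8) | bytes[idx];
    }
    return v;
}

/* A raw copy of local memory, decoded in local order, gives the value back. */
static void check_raw_read_matches_value(uint64_t value)
{
    unsigned char raw[8];

    memcpy(raw, &value, sizeof raw);
    assert(decode_u64(raw, local_is_little_endian()) == value);
}
```

If an atomic were defined to use some fixed wire endianness instead, the same memory would decode to a byte-swapped value on half of the machines, which is the mismatch described above.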

Generally speaking, changing the alignment and size breaks all memory
layouts; I'd be amazed if an MPI or PGAS app could run across hardware
with different FPU and different integer representations.

Jason