[ofiwg] the problems with atomics

Jason Gunthorpe jgunthorpe at obsidianresearch.com
Tue Jul 7 13:27:49 PDT 2015


On Tue, Jul 07, 2015 at 07:41:20PM +0000, Hefty, Sean wrote:


> Someone feel free to chime in here with a brilliant solution for
> handling this that doesn't end up with this sort of datatype
> explosion:
> 
> FI_LONG_DOUBLE_80BITS_ALIGNED_96,
> FI_LONG_DOUBLE_80BITS_ALIGNED_128,
> FI_LONG_DOUBLE_96BITS_ALIGNED_96,
> FI_LONG_DOUBLE_96BITS_ALIGNED_128,
> FI_LONG_DOUBLE_128BITS_ALIGNED_128,
> FI_LONG_DOUBLE_WEIRD_PPC_ALIGNED_128

You expect app writers to provide this value? That sounds very hard.

I would say, 'FI_LONG_DOUBLE' means whatever it means for the local
machine, and libfabric doesn't do implicit conversions of floating
point types while executing atomics.
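
To make "whatever it means for the local machine" concrete, here is a
rough C11 sketch of what each side could report about its own long
double. The struct and both helper names are hypothetical and not part
of libfabric; the only real facts used are sizeof, _Alignof and
LDBL_MANT_DIG from <float.h>.

```c
#include <float.h>
#include <stddef.h>

/* Hypothetical descriptor (not a libfabric type): the local layout of
 * long double, i.e. what a locally defined FI_LONG_DOUBLE would mean. */
struct ld_layout {
    size_t size;     /* sizeof(long double), in bytes */
    size_t align;    /* required alignment, in bytes */
    int mant_digits; /* LDBL_MANT_DIG, significand digits in FLT_RADIX */
};

/* Describe long double as the local compiler and ABI define it. */
static struct ld_layout local_ld_layout(void)
{
    struct ld_layout l;
    l.size = sizeof(long double);
    l.align = _Alignof(long double);
    l.mant_digits = LDBL_MANT_DIG;
    return l;
}

/* Two peers could run long double atomics against each other without
 * any conversion only if every field matches. Matching fields are a
 * necessary check, not a proof that the bit formats are identical. */
static int ld_layout_compatible(const struct ld_layout *a,
                                const struct ld_layout *b)
{
    return a->size == b->size &&
           a->align == b->align &&
           a->mant_digits == b->mant_digits;
}
```

An app that cares could exchange this descriptor out of band at setup
and refuse long double atomics on a mismatch, rather than naming the
format in every call.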

There is no implicit conversion when doing an RDMA READ/WRITE or SEND,
so the app is already exposed to all of these differences if it
wants to run heterogeneously.

I'd probably also say that the integer atomic types should have the
same endianness as the local machine (why should ATOMIC X vs RDMA READ
return different answers?)
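
A small sketch of the point about matching answers, in C. Both
functions are hypothetical illustrations, not libfabric calls, and the
fetch-add below is NOT actually atomic; it only models an operation
carried out in host byte order on target memory.

```c
#include <stdint.h>
#include <string.h>

/* Returns 1 on a little-endian host, 0 on a big-endian one. */
static int host_is_little_endian(void)
{
    const uint32_t probe = 1;
    unsigned char first;
    memcpy(&first, &probe, 1); /* look at the lowest-addressed byte */
    return first == 1;
}

/* Hypothetical stand-in for a fetch-add done in host byte order.
 * Not atomic: it only shows that the operand is interpreted with the
 * same byte order a plain read of the same memory would use. */
static uint64_t host_order_fetch_add(unsigned char *mem, uint64_t operand)
{
    uint64_t old, sum;
    memcpy(&old, mem, sizeof old);
    sum = old + operand;
    memcpy(mem, &sum, sizeof sum);
    return old;
}
```

If the atomic interprets memory in host order, a later plain read of
the same bytes (standing in for an RDMA READ) yields the value the
atomic produced, with no byte swap on either path.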

Generally speaking, changing the alignment and size breaks all memory
layouts. I'd be amazed if an MPI or PGAS app could run across hardware
with different FPUs and different integer representations.

Jason
