[ofiwg] the problems with atomics
sayantan.sur at intel.com
Mon Jul 13 13:20:35 PDT 2015
If we do punt the issue to an upper layer, such as MPI, then should we
export the fi_datatype_size() method so that apps can find out what
libfabric was compiled with?
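
As a rough illustration of what "what it was compiled with" could mean for long double, here is a minimal sketch in standard C. It reports the storage size, alignment, and mantissa width the compiler chose, so that two components could compare them. The names ld_layout, ld_layout_query, and ld_layout_equal are hypothetical and are not part of libfabric. This does not show how fi_datatype_size() behaves.

```c
#include <float.h>
#include <stddef.h>

/* Hypothetical descriptor of how this compiler lays out long double.
 * Not a libfabric type; for illustration only. */
struct ld_layout {
    size_t size;        /* sizeof(long double): bytes of storage, padding included */
    size_t align;       /* alignment the compiler requires, in bytes */
    int    mant_digits; /* LDBL_MANT_DIG: commonly 53, 64, 106, or 113 */
};

/* Helper used only to measure alignment portably (works in C99). */
struct ld_align_probe {
    char        c;
    long double ld;
};

/* Capture the layout chosen by the compiler and options used for this file. */
struct ld_layout ld_layout_query(void)
{
    struct ld_layout l;

    l.size = sizeof(long double);
    l.align = offsetof(struct ld_align_probe, ld);
    l.mant_digits = LDBL_MANT_DIG;
    return l;
}

/* Two builds are "compatible" in this sketch only if all three fields match. */
int ld_layout_equal(const struct ld_layout *a, const struct ld_layout *b)
{
    return a->size == b->size &&
           a->align == b->align &&
           a->mant_digits == b->mant_digits;
}
```

An upper layer could call ld_layout_query() in its own build and compare the result against values reported by the library or by a peer. Comparing the mantissa width as well as the size helps tell apart an 80-bit x87 value padded to 96 or 128 bits from a true 128-bit format.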
On 7/8/15, 6:10 AM, "ofiwg-bounces at lists.openfabrics.org on behalf of
Underwood, Keith D" <ofiwg-bounces at lists.openfabrics.org on behalf of
keith.d.underwood at intel.com> wrote:
>Technically, MPI implementations are supposed to detect the heterogeneity
>and deal with it (e.g. for the endianness problem).
>For LONG_DOUBLE, MPI essentially makes the assumption that the app and
>MPI are compiled with "compatible" compilers and options regarding
>LONG_DOUBLE formats. Then again, the same is kind of true for the
>definition of things like MPI_INT and MPI_LONG_INT, right? Have we all
>forgotten the pain of the transition to 64 bit?
>When we tackled this problem for Portals, the ints were easy: we assumed
>endianness matched (we didn't care about heterogeneous systems). We
>assumed that MPI/SHMEM/whatever would have to figure out what "int" meant
>and could cast to either int32_t or int64_t, as appropriate for the
>compiler. Floats and doubles were obvious (only one size of each).
>LONG_DOUBLE was... intractable... so we punted and said that the Portals
>library and MPI library would have to be compiled with compatible
>compilers and compiler options if anybody used that. It wasn't a great
>solution, but the list Sean has below was unpleasant, and LONG_DOUBLE use
>is, um, rare.
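
The "figure out what int means" step described above can be sketched in standard C as follows. The upper layer looks at the size its own compiler gives a type and picks the matching fixed-width type. The enum and function names are hypothetical and do not come from Portals, MPI, or libfabric.

```c
#include <limits.h>
#include <stddef.h>

/* Hypothetical tags for fixed-width integer types an atomic op could use. */
enum fixed_int_type {
    FIXED_INT8,
    FIXED_INT16,
    FIXED_INT32,
    FIXED_INT64,
    FIXED_INT_UNSUPPORTED
};

/* Map a storage size in bytes (e.g. sizeof(int) or sizeof(long)) to the
 * fixed-width type with the same number of bits. */
enum fixed_int_type fixed_type_for_size(size_t bytes)
{
    switch (bytes * CHAR_BIT) {
    case 8:
        return FIXED_INT8;
    case 16:
        return FIXED_INT16;
    case 32:
        return FIXED_INT32;
    case 64:
        return FIXED_INT64;
    default:
        return FIXED_INT_UNSUPPORTED;
    }
}
```

For example, fixed_type_for_size(sizeof(long)) yields FIXED_INT64 on LP64 systems and FIXED_INT32 on ILP32 or LLP64 systems, which is the 32-bit versus 64-bit ambiguity mentioned above.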
>> -----Original Message-----
>> From: ofiwg-bounces at lists.openfabrics.org [mailto:ofiwg-
>> bounces at lists.openfabrics.org] On Behalf Of Hefty, Sean
>> Sent: Tuesday, July 07, 2015 5:46 PM
>> To: Jason Gunthorpe
>> Cc: ofiwg at lists.openfabrics.org
>> Subject: Re: [ofiwg] the problems with atomics
>> > > FI_LONG_DOUBLE_80BITS_ALIGNED_96,
>> > > FI_LONG_DOUBLE_80BITS_ALIGNED_128,
>> > > FI_LONG_DOUBLE_96BITS_ALIGNED_96,
>> > > FI_LONG_DOUBLE_96BITS_ALIGNED_128,
>> > > FI_LONG_DOUBLE_128BITS_ALIGNED_128,
>> > > FI_LONG_DOUBLE_WEIRD_PPC_ALIGNED_128
>> > You expect app writers to provide this value? That sounds very hard.
>> Long double seems ridiculous. But from what I can tell, 2 apps on the
>> system could in theory have different alignments based on how they were
>> compiled. I don't want what I listed above as a solution at all, but I
>> need something. At the moment, what I need to fix is the lack of any
>> definition for 'FI_LONG_DOUBLE', which was added under the assumption
>> that it had a sane IEEE definition...
>> Maybe the best option is to just define the equivalent of:
>> And map DOUBLE -> FLOAT64, and LONG_DOUBLE -> FLOAT128
>> Figuring out if FLOAT96 is implemented using 80 bits would be left as an
>> exercise to the app.
>> > There is no implicit conversion when doing a RDMA READ/WRITE or SEND,
>> > so the app is going to be exposed to all of the differences if it
>> > wants to run homogeneously.
>> RDMA and sends only deal with bytes. They don't attempt to interpret
>> at the remote side.
>> > I'd probably also say that the integer atomic types should have the
>> > same endianness as the local machine (why should ATOMIC X vs RDMA
>> > return different answers?)
>> Even with RDMA, an app should care about endianness. E.g. rsockets
>> exchanges endianness format as part of its protocol, so that RDMA writes
>> end up in a format that the remote side can read. I'm guessing most
>> ignore this.
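
A minimal sketch of the kind of endianness exchange described above, in standard C: each side detects its own byte order, learns the peer's during connection setup, and swaps 64-bit values only when the two differ. This does not reproduce the rsockets wire protocol. All names here are hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical byte-order tag that peers would exchange at setup time. */
enum byte_order {
    ORDER_LITTLE = 1,
    ORDER_BIG = 2
};

/* Detect the host byte order by inspecting the first byte of a known value. */
enum byte_order host_byte_order(void)
{
    const uint16_t probe = 0x0102;
    unsigned char first;

    memcpy(&first, &probe, 1);
    return first == 0x02 ? ORDER_LITTLE : ORDER_BIG;
}

/* Reverse the byte order of a 64-bit value. */
uint64_t swap64(uint64_t v)
{
    v = ((v & 0x00000000FFFFFFFFULL) << 32) | (v >> 32);
    v = ((v & 0x0000FFFF0000FFFFULL) << 16) | ((v >> 16) & 0x0000FFFF0000FFFFULL);
    v = ((v & 0x00FF00FF00FF00FFULL) << 8) | ((v >> 8) & 0x00FF00FF00FF00FFULL);
    return v;
}

/* Convert a value to the peer's byte order before an RDMA write, so the
 * remote side can read it without further conversion. */
uint64_t to_peer_order(uint64_t v, enum byte_order peer)
{
    return peer == host_byte_order() ? v : swap64(v);
}
```

The swap is done by the writer here. Doing it on the reader side instead would work equally well, as long as both sides agree on who converts.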
>> > Generally speaking, changing the alignment and size breaks all memory
>> > layouts, I'd be amazed if a MPI or PGAS app could run across hardware
>> > with different FPU and different integer representations.
>> I agree, but that doesn't mean that I think the libfabric API should be
>> Apps should be able to run between different CPU architectures, though
>> it will come with a cost.
>> - Sean
>> ofiwg mailing list
>> ofiwg at lists.openfabrics.org