[ewg] [OMPI devel] [BUG?] OpenMPI with openib on SPARC64: Signal: Bus error (10)

Lukas Razik linux at razik.name
Wed Nov 23 17:02:33 PST 2011


TERRY DONTJE <terry.dontje at oracle.com> wrote:
>>>Can you build OMPI as a 32 bit library and see if that works any better?
>>So you mean I should leave the whole OFED stack as 64 bit and build only openmpi as 32 bit?
>I believe the OFED user libraries will need to be 32 bit also or the 32 bit MPI libraries will not be able to use them.
>
>>How must I configure openmpi so that it is definitely built as 32 bit?
>You need to change the CFLAGS, CXXFLAGS, FFLAGS and FCFLAGS in the configure line: replace "-m64" with "-m32", or just add "-m32" if "-m64" is not there.
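A minimal sketch of that flag substitution as a shell helper. The function name to_m32 is hypothetical, and the commented configure line assumes the install prefix that appears in the mpirun path later in this message; only the four flag variables come from Terry's advice.

```shell
# Hypothetical helper: turn a compiler flag string into its 32-bit form.
# Replaces -m64 with -m32, leaves an existing -m32 alone, and appends
# -m32 when neither flag is present.
to_m32() {
    case " $1 " in
        *" -m64 "*) printf '%s\n' "$1" | sed 's/-m64/-m32/g' ;;
        *" -m32 "*) printf '%s\n' "$1" ;;
        *)          printf '%s\n' "${1:+$1 }-m32" ;;
    esac
}

to_m32 "-O2 -m64"   # prints: -O2 -m32

# Possible use (assumed prefix, not run here):
# ./configure --prefix=/usr/mpi/gcc/openmpi-1.4.4 \
#     CFLAGS="$(to_m32 "$CFLAGS")" CXXFLAGS="$(to_m32 "$CXXFLAGS")" \
#     FFLAGS="$(to_m32 "$FFLAGS")" FCFLAGS="$(to_m32 "$FCFLAGS")"
```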


This may be of interest to the OFED developers:
To get OFED's 'install.pl' working with '--build32' on sparc64 I had to add the following lines (marked with +):
...
elsif ($arch eq "ppc64") {
    $target_cpu32 = 'ppc';
}
+elsif ($arch eq "sparc64") {
+    $target_cpu32 = 'sparc';
+}
...
After that, the selected OFED libraries were built in both 32-bit and 64-bit versions.


Hello Terry,

I was able to build 32-bit versions of
- openmpi-1.4.4
- osu_benchmarks-3.1.1
and link them against the required 32-bit OFED libraries.

However, the problem is still the same. Thanks anyway for the tip to try the 32-bit build!

This is the error message I get:
# /usr/mpi/gcc/openmpi-1.4.4/bin/mpirun -np 2 -host ib1,ib2 ~/razik/src/OFED-1.5.4-rc4/SRPMS/mpitests-3.2/osu_benchmarks-3.1.1/osu_latency
# OSU MPI Latency Test v3.1.1
# Size            Latency (us)
[cluster1:61532] *** Process received signal ***
[cluster1:61532] Signal: Bus error (10)
[cluster1:61532] Signal code: Invalid address alignment (1)
[cluster1:61532] Failing at address: 0x898a53
[cluster1:61532] [ 0] /usr/mpi/gcc/openmpi-1.4.4/lib/openmpi/mca_pml_ob1.so(+0x50e0) [0xf72090e0]
[cluster1:61532] [ 1] /usr/mpi/gcc/openmpi-1.4.4/lib/openmpi/mca_coll_tuned.so(+0x1750) [0xf6fe9750]
[cluster1:61532] [ 2] /usr/mpi/gcc/openmpi-1.4.4/lib/openmpi/mca_coll_tuned.so(+0x8e5c) [0xf6ff0e5c]
[cluster1:61532] [ 3] /usr/mpi/gcc/openmpi-1.4.4/lib/libmpi.so.0(PMPI_Barrier+0xc0) [0xf77b718c]
[cluster1:61532] [ 4] /root/razik/src/OFED-1.5.4-rc4/SRPMS/mpitests-3.2/osu_benchmarks-3.1.1/osu_latency(main+0x2c8) [0x10cb0]
[cluster1:61532] [ 5] /lib/libc.so.6(__libc_start_main+0x10c) [0xf73e464c]
[cluster1:61532] [ 6] /root/razik/src/OFED-1.5.4-rc4/SRPMS/mpitests-3.2/osu_benchmarks-3.1.1/osu_latency(_start+0x2c) [0x1090c]
[cluster1:61532] *** End of error message ***
[cluster2:07039] *** Process received signal ***
[cluster2:07039] Signal: Bus error (10)
[cluster2:07039] Signal code: Invalid address alignment (1)
[cluster2:07039] Failing at address: 0x898a53
--------------------------------------------------------------------------
mpirun noticed that process rank 0 with PID 61532 on node cluster1 exited on signal 10 (Bus error).
--------------------------------------------------------------------------
[cluster2:07039] [ 0] /usr/mpi/gcc/openmpi-1.4.4/lib/openmpi/mca_pml_ob1.so(+0x50e0) [0xf77750e0]
[cluster2:07039] [ 1] /usr/mpi/gcc/openmpi-1.4.4/lib/openmpi/mca_coll_tuned.so(+0x1750) [0xf7555750]
[cluster2:07039] [ 2] /usr/mpi/gcc/openmpi-1.4.4/lib/openmpi/mca_coll_tuned.so(+0x8e5c) [0xf755ce5c]
[cluster2:07039] [ 3] /usr/mpi/gcc/openmpi-1.4.4/lib/libmpi.so.0(PMPI_Barrier+0xc0) [0xf7d3318c]
[cluster2:07039] [ 4] /root/razik/src/OFED-1.5.4-rc4/SRPMS/mpitests-3.2/osu_benchmarks-3.1.1/osu_latency(main+0x2c8) [0x10cb0]
[cluster2:07039] [ 5] /lib/libc.so.6(__libc_start_main+0x10c) [0xf796464c]
[cluster2:07039] [ 6] /root/razik/src/OFED-1.5.4-rc4/SRPMS/mpitests-3.2/osu_benchmarks-3.1.1/osu_latency(_start+0x2c) [0x1090c]
[cluster2:07039] *** End of error message ***
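
The signal code above reports an invalid address alignment, and the failing address 0x898a53 is odd. As a quick check, assuming the faulting access was a 2-, 4- or 8-byte load or store (the trace does not say which), the remainder of the address modulo the access size shows how far off it is. The helper name misalignment is made up for illustration.

```shell
# Hypothetical helper: remainder of an address modulo an access size.
# A result of 0 would mean the address is aligned for that size.
misalignment() {
    echo $(( $1 % $2 ))
}

# Check the failing address from the trace against common access sizes.
for size in 2 4 8; do
    echo "0x898a53 mod $size = $(misalignment 0x898a53 "$size")"
done
```

None of the remainders is zero, so the address is not suitably aligned for any multi-byte access, which is consistent with the bus error on SPARC.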

# ldd /usr/mpi/gcc/openmpi-1.4.4/bin/mpirun
        libopen-rte.so.0 => /usr/mpi/gcc/openmpi-1.4.4/lib/libopen-rte.so.0 (0xf7c18000)
        libopen-pal.so.0 => /usr/mpi/gcc/openmpi-1.4.4/lib/libopen-pal.so.0 (0xf7bbc000)
        libdl.so.2 => /lib/libdl.so.2 (0xf7b90000)
        libnsl.so.1 => /lib/libnsl.so.1 (0xf7b68000)
        libutil.so.1 => /lib/libutil.so.1 (0xf7b54000)
        libm.so.6 => /lib/libm.so.6 (0xf7a70000)
        libpthread.so.0 => /lib/libpthread.so.0 (0xf7a44000)
        libc.so.6 => /lib/libc.so.6 (0xf78c4000)
        /lib/ld-linux.so.2 (0x70000000)
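
The /lib/ld-linux.so.2 loader and the load addresses below 4 GiB suggest a 32-bit binary. One way to confirm this without extra tools, sketched here under the assumption that the files are ELF objects: the byte at offset 4 of the ELF header (EI_CLASS) is 1 for 32-bit and 2 for 64-bit. The function name elf_class is hypothetical.

```shell
# Hypothetical helper: report the ELF class (32 or 64) of a file by
# reading the EI_CLASS byte at offset 4 of the ELF header.
# Note: it does not verify the ELF magic bytes first.
elf_class() {
    class=$(od -An -tu1 -j4 -N1 "$1" | tr -d ' \n')
    case "$class" in
        1) echo 32 ;;
        2) echo 64 ;;
        *) echo unknown ;;
    esac
}

# Example (path taken from the ldd call above, not run here):
# elf_class /usr/mpi/gcc/openmpi-1.4.4/bin/mpirun
```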
---

Best regards,
Lukas


