[ewg] Seg fault running OpenMPI-1.3.1rc4
Steve Wise
swise at opengridcomputing.com
Sun Mar 29 11:00:41 PDT 2009
Hey Jeff,
Have you seen this? I'm hitting this regularly running on ofed-1.4.1-rc2.
Test:
[ompi@vic12 ~]$ cat doit-ompi
#!/bin/sh
while : ; do
    mpirun -np 16 --host vic12-10g,vic20-10g,vic9-10g,vic21-10g \
        --mca btl openib,self,sm --mca btl_openib_max_btls 1 \
        /usr/mpi/gcc/openmpi-1.3.1rc4/tests/IMB-3.1/IMB-MPI1 -npmin 16 bcast \
        scatter sendrecv exchange </dev/null
done
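Side note: the loop above keeps going after a crash, so the failing run can scroll past. Here is a minimal sketch of a variant that stops at the first failing run. It assumes mpirun exits non-zero when a rank dies on a signal, which is worth checking on your build. The name run_until_failure is a hypothetical helper, not part of Open MPI or IMB.

```shell
#!/bin/sh
# Hypothetical helper (not part of Open MPI or IMB): rerun the given
# command until it exits non-zero, then report which iteration failed.
# Assumption: the command's exit status reflects the crash.
run_until_failure() {
    i=0
    while : ; do
        i=$((i + 1))
        # stdin is redirected from /dev/null, as in the original loop
        "$@" </dev/null
        status=$?
        if [ "$status" -ne 0 ]; then
            echo "iteration $i failed with status $status"
            return "$status"
        fi
    done
}
```

To use it, pass the full mpirun command line from the script as arguments, i.e. run_until_failure mpirun -np 16 followed by the rest of the options.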
Seg Fault output:
[vic21:04047] *** Process received signal ***
[vic21:04047] Signal: Segmentation fault (11)
[vic21:04047] Signal code: Address not mapped (1)
[vic21:04047] Failing at address: 0x18
[vic21:04047] [ 0] /lib64/libpthread.so.0 [0x3dde20e4c0]
[vic21:04047] [ 1] /usr/mpi/gcc/openmpi-1.3.1rc4/lib64/openmpi/mca_btl_openib.so [0x2b911bc33800]
[vic21:04047] [ 2] /usr/mpi/gcc/openmpi-1.3.1rc4/lib64/openmpi/mca_btl_openib.so [0x2b911bc38c2d]
[vic21:04047] [ 3] /usr/mpi/gcc/openmpi-1.3.1rc4/lib64/openmpi/mca_btl_openib.so [0x2b911bc33fcb]
[vic21:04047] [ 4] /usr/mpi/gcc/openmpi-1.3.1rc4/lib64/openmpi/mca_btl_openib.so [0x2b911bc22af8]
[vic21:04047] [ 5] /usr/mpi/gcc/openmpi-1.3.1rc4/lib64/libopen-pal.so.0(mca_base_components_close+0x83) [0x2b911933da33]
[vic21:04047] [ 6] /usr/mpi/gcc/openmpi-1.3.1rc4/lib64/libmpi.so.0(mca_btl_base_close+0xe0) [0x2b9118ea3fb0]
[vic21:04047] [ 7] /usr/mpi/gcc/openmpi-1.3.1rc4/lib64/openmpi/mca_bml_r2.so [0x2b911ba1938f]
[vic21:04047] [ 8] /usr/mpi/gcc/openmpi-1.3.1rc4/lib64/openmpi/mca_pml_ob1.so [0x2b911b601cde]
[vic21:04047] [ 9] /usr/mpi/gcc/openmpi-1.3.1rc4/lib64/libmpi.so.0 [0x2b9118e7241b]
[vic21:04047] [10] /usr/mpi/gcc/openmpi-1.3.1rc4/tests/IMB-3.1/IMB-MPI1(main+0x178) [0x403498]
[vic21:04047] [11] /lib64/libc.so.6(__libc_start_main+0xf4) [0x3ddd61d974]
[vic21:04047] [12] /usr/mpi/gcc/openmpi-1.3.1rc4/tests/IMB-3.1/IMB-MPI1 [0x403269]
[vic21:04047] *** End of error message ***