[ewg] Seg fault running OpenMPI-1.3.1rc4

Steve Wise swise at opengridcomputing.com
Sun Mar 29 11:03:00 PDT 2009


When this happens, that node also logs a message like this in
/var/log/messages:

IMB-MPI1[8859]: segfault at 0000000000000018 rip 00002b7bfc880800 rsp 00007fffb1021330 error 4
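As an illustration (not part of the original report), the "error 4" field in that kernel line is the x86 page-fault error code. Bit 0 distinguishes a non-present page (0) from a protection violation (1), bit 1 a read (0) from a write (1), and bit 2 kernel mode (0) from user mode (1). The sketch below pulls the fault address and error code out of the logged line and decodes them; the helper name decode_pf_error is hypothetical. A user-mode read of an unmapped page at address 0x18 is consistent with dereferencing a NULL structure pointer at offset 0x18, though the log alone does not prove that.

```shell
#!/bin/sh
# Sketch: decode the "error N" field of an x86 kernel segfault line.
# decode_pf_error is a hypothetical helper; the bit meanings follow the
# x86 page-fault error code (bit 0 = P, bit 1 = W/R, bit 2 = U/S).
decode_pf_error() {
    err=$1
    if [ $((err & 1)) -ne 0 ]; then presence=protection-violation; else presence=page-not-present; fi
    if [ $((err & 2)) -ne 0 ]; then access=write; else access=read; fi
    if [ $((err & 4)) -ne 0 ]; then mode=user; else mode=kernel; fi
    echo "$presence $access $mode"
}

# The line logged on the failing node in /var/log/messages.
line='IMB-MPI1[8859]: segfault at 0000000000000018 rip 00002b7bfc880800 rsp 00007fffb1021330 error 4'

# Extract the faulting address (hex, no 0x prefix) and the error code.
addr=$(printf '%s\n' "$line" | sed -n 's/.*segfault at \([0-9a-f]*\) .*/\1/p')
err=$(printf '%s\n' "$line" | sed -n 's/.* error \([0-9]*\)$/\1/p')

# Shell arithmetic accepts 0x-prefixed hex, so this prints the offset from NULL.
echo "fault address: 0x$addr ($((0x$addr)) bytes past NULL)"
decode_pf_error "$err"
```

The field layout assumed here matches the x86_64 message shown above; other kernels or architectures format this line differently, so treat the sed patterns as specific to this log.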

Steve Wise wrote:
> Hey Jeff,
>
> Have you seen this?  I'm hitting this regularly running on 
> ofed-1.4.1-rc2.
>
> Test:
> [ompi@vic12 ~]$ cat doit-ompi
> #!/bin/sh
> while : ; do
>        mpirun -np 16 --host vic12-10g,vic20-10g,vic9-10g,vic21-10g \
>            --mca btl openib,self,sm --mca btl_openib_max_btls 1 \
>            /usr/mpi/gcc/openmpi-1.3.1rc4/tests/IMB-3.1/IMB-MPI1 -npmin 16 \
>            bcast scatter sendrecv exchange </dev/null
> done
>
>
> Seg Fault output:
>
> [vic21:04047] *** Process received signal ***
> [vic21:04047] Signal: Segmentation fault (11)
> [vic21:04047] Signal code: Address not mapped (1)
> [vic21:04047] Failing at address: 0x18
> [vic21:04047] [ 0] /lib64/libpthread.so.0 [0x3dde20e4c0]
> [vic21:04047] [ 1] /usr/mpi/gcc/openmpi-1.3.1rc4/lib64/openmpi/mca_btl_openib.so [0x2b911bc33800]
> [vic21:04047] [ 2] /usr/mpi/gcc/openmpi-1.3.1rc4/lib64/openmpi/mca_btl_openib.so [0x2b911bc38c2d]
> [vic21:04047] [ 3] /usr/mpi/gcc/openmpi-1.3.1rc4/lib64/openmpi/mca_btl_openib.so [0x2b911bc33fcb]
> [vic21:04047] [ 4] /usr/mpi/gcc/openmpi-1.3.1rc4/lib64/openmpi/mca_btl_openib.so [0x2b911bc22af8]
> [vic21:04047] [ 5] /usr/mpi/gcc/openmpi-1.3.1rc4/lib64/libopen-pal.so.0(mca_base_components_close+0x83) [0x2b911933da33]
> [vic21:04047] [ 6] /usr/mpi/gcc/openmpi-1.3.1rc4/lib64/libmpi.so.0(mca_btl_base_close+0xe0) [0x2b9118ea3fb0]
> [vic21:04047] [ 7] /usr/mpi/gcc/openmpi-1.3.1rc4/lib64/openmpi/mca_bml_r2.so [0x2b911ba1938f]
> [vic21:04047] [ 8] /usr/mpi/gcc/openmpi-1.3.1rc4/lib64/openmpi/mca_pml_ob1.so [0x2b911b601cde]
> [vic21:04047] [ 9] /usr/mpi/gcc/openmpi-1.3.1rc4/lib64/libmpi.so.0 [0x2b9118e7241b]
> [vic21:04047] [10] /usr/mpi/gcc/openmpi-1.3.1rc4/tests/IMB-3.1/IMB-MPI1(main+0x178) [0x403498]
> [vic21:04047] [11] /lib64/libc.so.6(__libc_start_main+0xf4) [0x3ddd61d974]
> [vic21:04047] [12] /usr/mpi/gcc/openmpi-1.3.1rc4/tests/IMB-3.1/IMB-MPI1 [0x403269]
> [vic21:04047] *** End of error message ***
>
> _______________________________________________
> ewg mailing list
> ewg at lists.openfabrics.org
> http://lists.openfabrics.org/cgi-bin/mailman/listinfo/ewg



