Re: [OMPI devel] [ewg] Seg fault running OpenMPI-1.3.1rc4

Jeff Squyres jsquyres at cisco.com
Mon Mar 30 08:42:28 PDT 2009


Can you send a gdb bt from a corresponding corefile?  That would help  
immensely...
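
In case it is useful, here is a minimal sketch of one way to get that,
assuming core dumps are currently disabled on the nodes and gdb is
installed there.  The core file name (core.4047) is hypothetical; the
real name depends on the kernel's core_pattern setting.  The helper only
prints the gdb command so it can be checked before it is run.

```shell
#!/bin/sh
# Allow core files in the shell that launches mpirun.
# Assumption: the limit reaches the remote ranks; it may instead have to
# be set in the shell startup files on each node.
ulimit -c unlimited 2>/dev/null || echo "could not raise core limit" >&2

# Build a non-interactive gdb command that prints a backtrace of every
# thread in a core file.  $1 = executable, $2 = core file.
gdb_bt_cmd() {
    printf 'gdb -batch -ex "thread apply all bt" %s %s\n' "$1" "$2"
}

# Hypothetical core file name; adjust to whatever the node produced.
gdb_bt_cmd /usr/mpi/gcc/openmpi-1.3.1rc4/tests/IMB-3.1/IMB-MPI1 core.4047
```

File and line numbers will only appear in the backtrace if the Open MPI
libraries were built with debugging symbols.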

The stack trace we get from glibc unfortunately does not show file/line
numbers.
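
If no corefile is available, the raw addresses can sometimes still be
mapped back by hand.  A rough sketch, assuming the library was built
with debug info and that its load base is known: the trace does not
contain the base, so the value below is purely hypothetical and would
have to come from somewhere like /proc/<pid>/maps.  The lib_offset
helper is also just an illustrative name.

```shell
#!/bin/sh
# Convert an absolute frame address from the trace into an offset inside
# the shared object.  $1 = frame address, $2 = load base of the library.
lib_offset() {
    printf '0x%x\n' $(( $1 - $2 ))
}

# Frame [1] from the trace; the base address here is HYPOTHETICAL.
off=$(lib_offset 0x2b911bc33800 0x2b911bc1b000)

# Print (not run) the addr2line command that would resolve that offset.
echo "addr2line -f -e /usr/mpi/gcc/openmpi-1.3.1rc4/lib64/openmpi/mca_btl_openib.so $off"
```

A gdb backtrace from a corefile is still the more reliable route, since
it does this mapping itself.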


On Mar 30, 2009, at 10:56 AM, Steve Wise wrote:

>
> Hey Pasha,
>
>
> I just applied r20872 and retested, and I still hit this seg fault.  So
> I think this is a new bug.
>
> Lemme pull the trunk and try that.
>
>
>
> Pavel Shamis (Pasha) wrote:
> > I think your problem is related to this bug:
> > https://svn.open-mpi.org/trac/ompi/ticket/1823
> >
> > And it is resolved on the ompi-trunk.
> >
> > Pasha.
> >
> > Steve Wise wrote:
> >> When this happens, that node logs this type of message also in
> >> /var/log/messages:
> >>
> >> IMB-MPI1[8859]: segfault at 0000000000000018 rip 00002b7bfc880800 rsp
> >> 00007fffb1021330 error 4
> >>
> >> Steve Wise wrote:
> >>> Hey Jeff,
> >>>
> >>> Have you seen this?  I'm hitting this regularly running on
> >>> ofed-1.4.1-rc2.
> >>>
> >>> Test:
> >>> [ompi@vic12 ~]$ cat doit-ompi
> >>> #!/bin/sh
> >>> while : ; do
> >>>        mpirun -np 16 --host vic12-10g,vic20-10g,vic9-10g,vic21-10g \
> >>>            --mca btl openib,self,sm --mca btl_openib_max_btls 1 \
> >>>            /usr/mpi/gcc/openmpi-1.3.1rc4/tests/IMB-3.1/IMB-MPI1 -npmin 16 \
> >>>            bcast scatter sendrecv exchange </dev/null
> >>> done
> >>>
> >>>
> >>> Seg Fault output:
> >>>
> >>> [vic21:04047] *** Process received signal ***
> >>> [vic21:04047] Signal: Segmentation fault (11)
> >>> [vic21:04047] Signal code: Address not mapped (1)
> >>> [vic21:04047] Failing at address: 0x18
> >>> [vic21:04047] [ 0] /lib64/libpthread.so.0 [0x3dde20e4c0]
> >>> [vic21:04047] [ 1]
> >>> /usr/mpi/gcc/openmpi-1.3.1rc4/lib64/openmpi/mca_btl_openib.so
> >>> [0x2b911bc33800]
> >>> [vic21:04047] [ 2]
> >>> /usr/mpi/gcc/openmpi-1.3.1rc4/lib64/openmpi/mca_btl_openib.so
> >>> [0x2b911bc38c2d]
> >>> [vic21:04047] [ 3]
> >>> /usr/mpi/gcc/openmpi-1.3.1rc4/lib64/openmpi/mca_btl_openib.so
> >>> [0x2b911bc33fcb]
> >>> [vic21:04047] [ 4]
> >>> /usr/mpi/gcc/openmpi-1.3.1rc4/lib64/openmpi/mca_btl_openib.so
> >>> [0x2b911bc22af8]
> >>> [vic21:04047] [ 5]
> >>> /usr/mpi/gcc/openmpi-1.3.1rc4/lib64/libopen-pal.so.0(mca_base_components_close+0x83)
> >>> [0x2b911933da33]
> >>> [vic21:04047] [ 6]
> >>> /usr/mpi/gcc/openmpi-1.3.1rc4/lib64/libmpi.so.0(mca_btl_base_close+0xe0)
> >>> [0x2b9118ea3fb0]
> >>> [vic21:04047] [ 7]
> >>> /usr/mpi/gcc/openmpi-1.3.1rc4/lib64/openmpi/mca_bml_r2.so
> >>> [0x2b911ba1938f]
> >>> [vic21:04047] [ 8]
> >>> /usr/mpi/gcc/openmpi-1.3.1rc4/lib64/openmpi/mca_pml_ob1.so
> >>> [0x2b911b601cde]
> >>> [vic21:04047] [ 9] /usr/mpi/gcc/openmpi-1.3.1rc4/lib64/libmpi.so.0
> >>> [0x2b9118e7241b]
> >>> [vic21:04047] [10]
> >>> /usr/mpi/gcc/openmpi-1.3.1rc4/tests/IMB-3.1/IMB-MPI1(main+0x178)
> >>> [0x403498]
> >>> [vic21:04047] [11] /lib64/libc.so.6(__libc_start_main+0xf4)
> >>> [0x3ddd61d974]
> >>> [vic21:04047] [12]
> >>> /usr/mpi/gcc/openmpi-1.3.1rc4/tests/IMB-3.1/IMB-MPI1 [0x403269]
> >>> [vic21:04047] *** End of error message ***
> >>>
> >>> _______________________________________________
> >>> ewg mailing list
> >>> ewg at lists.openfabrics.org
> >>> http://lists.openfabrics.org/cgi-bin/mailman/listinfo/ewg
> >>
> >> _______________________________________________
> >> devel mailing list
> >> devel at open-mpi.org
> >> http://www.open-mpi.org/mailman/listinfo.cgi/devel
> >>
> >
>


-- 
Jeff Squyres
Cisco Systems



