Re: [OMPI devel] [ewg] Seg fault running OpenMPI-1.3.1rc4

Jeff Squyres jsquyres at cisco.com
Mon Mar 30 13:32:05 PDT 2009


Fixed in https://svn.open-mpi.org/trac/ompi/changeset/20896; thanks,
Steve.

On Mar 30, 2009, at 12:45 PM, Steve Wise wrote:

> Hey Pasha,
>
> I'm tracking this with OFA bug 1579:
>
> https://bugs.openfabrics.org/show_bug.cgi?id=1579
>
> I think Jeff Squyres is digging into it.
>
> But the core and IMB-MPI1 files are here:
>
> http://www.openfabrics.org/~swise/bug1579/core.8175.gz
> http://www.openfabrics.org/~swise/bug1579/IMB-MPI1.gz
>
> I built the OpenMPI-1.3.1-1 src rpm thusly:
>
> rpmbuild -bb openmpi-1.3.1.spec --define='ofed 1' \
>     --define='install_in_opt 1' \
>     --define='configure_options CFLAGS=-g'
>
>
> Steve.
>
>
>
> Pavel Shamis (Pasha) wrote:
> > Steve,
> > If you compile the OMPI code with CFLAGS="-g", generate a segfault
> > core file, and send the core + IMB-MPI1 to me, I will be able to
> > understand the problem better.
> >
> > Regards,
> > Pasha
> >
> > Steve Wise wrote:
> >>
> >> Hey Pasha,
> >>
> >>
> >> I just applied r20872 and retested, and I still hit this seg fault.
> >> So I think this is a new bug.
> >>
> >> Lemme pull the trunk and try that.
> >>
> >>
> >>
> >> Pavel Shamis (Pasha) wrote:
> >>> I think your problem is related to this bug:
> >>> https://svn.open-mpi.org/trac/ompi/ticket/1823
> >>>
> >>> And it is resolved on the ompi-trunk.
> >>>
> >>> Pasha.
> >>>
> >>> Steve Wise wrote:
> >>>> When this happens, that node logs this type of message also in
> >>>> /var/log/messages:
> >>>>
> >>>> IMB-MPI1[8859]: segfault at 0000000000000018 rip 00002b7bfc880800 rsp 00007fffb1021330 error 4
> >>>>
> >>>> Steve Wise wrote:
> >>>>> Hey Jeff,
> >>>>>
> >>>>> Have you seen this?  I'm hitting this regularly running on
> >>>>> ofed-1.4.1-rc2.
> >>>>>
> >>>>> Test:
> >>>>> [ompi@vic12 ~]$ cat doit-ompi
> >>>>> #!/bin/sh
> >>>>> while : ; do
> >>>>>        mpirun -np 16 --host vic12-10g,vic20-10g,vic9-10g,vic21-10g \
> >>>>>            --mca btl openib,self,sm --mca btl_openib_max_btls 1 \
> >>>>>            /usr/mpi/gcc/openmpi-1.3.1rc4/tests/IMB-3.1/IMB-MPI1 -npmin 16 \
> >>>>>            bcast scatter sendrecv exchange </dev/null
> >>>>> done
> >>>>>
> >>>>>
> >>>>> Seg Fault output:
> >>>>>
> >>>>> [vic21:04047] *** Process received signal ***
> >>>>> [vic21:04047] Signal: Segmentation fault (11)
> >>>>> [vic21:04047] Signal code: Address not mapped (1)
> >>>>> [vic21:04047] Failing at address: 0x18
> >>>>> [vic21:04047] [ 0] /lib64/libpthread.so.0 [0x3dde20e4c0]
> >>>>> [vic21:04047] [ 1] /usr/mpi/gcc/openmpi-1.3.1rc4/lib64/openmpi/mca_btl_openib.so [0x2b911bc33800]
> >>>>> [vic21:04047] [ 2] /usr/mpi/gcc/openmpi-1.3.1rc4/lib64/openmpi/mca_btl_openib.so [0x2b911bc38c2d]
> >>>>> [vic21:04047] [ 3] /usr/mpi/gcc/openmpi-1.3.1rc4/lib64/openmpi/mca_btl_openib.so [0x2b911bc33fcb]
> >>>>> [vic21:04047] [ 4] /usr/mpi/gcc/openmpi-1.3.1rc4/lib64/openmpi/mca_btl_openib.so [0x2b911bc22af8]
> >>>>> [vic21:04047] [ 5] /usr/mpi/gcc/openmpi-1.3.1rc4/lib64/libopen-pal.so.0(mca_base_components_close+0x83) [0x2b911933da33]
> >>>>> [vic21:04047] [ 6] /usr/mpi/gcc/openmpi-1.3.1rc4/lib64/libmpi.so.0(mca_btl_base_close+0xe0) [0x2b9118ea3fb0]
> >>>>> [vic21:04047] [ 7] /usr/mpi/gcc/openmpi-1.3.1rc4/lib64/openmpi/mca_bml_r2.so [0x2b911ba1938f]
> >>>>> [vic21:04047] [ 8] /usr/mpi/gcc/openmpi-1.3.1rc4/lib64/openmpi/mca_pml_ob1.so [0x2b911b601cde]
> >>>>> [vic21:04047] [ 9] /usr/mpi/gcc/openmpi-1.3.1rc4/lib64/libmpi.so.0 [0x2b9118e7241b]
> >>>>> [vic21:04047] [10] /usr/mpi/gcc/openmpi-1.3.1rc4/tests/IMB-3.1/IMB-MPI1(main+0x178) [0x403498]
> >>>>> [vic21:04047] [11] /lib64/libc.so.6(__libc_start_main+0xf4) [0x3ddd61d974]
> >>>>> [vic21:04047] [12] /usr/mpi/gcc/openmpi-1.3.1rc4/tests/IMB-3.1/IMB-MPI1 [0x403269]
> >>>>> [vic21:04047] *** End of error message ***
> >>>>>
> >>>>> _______________________________________________
> >>>>> ewg mailing list
> >>>>> ewg at lists.openfabrics.org
> >>>>> http://lists.openfabrics.org/cgi-bin/mailman/listinfo/ewg
> >>>>
> >>>> _______________________________________________
> >>>> devel mailing list
> >>>> devel at open-mpi.org
> >>>> http://www.open-mpi.org/mailman/listinfo.cgi/devel
> >>>>


-- 
Jeff Squyres
Cisco Systems



