Re: [OMPI devel] [ewg] Seg fault running OpenMPI-1.3.1rc4

Steve Wise swise at opengridcomputing.com
Mon Mar 30 09:45:49 PDT 2009


Hey Pasha,

I'm tracking this with OFA bug 1579:

https://bugs.openfabrics.org/show_bug.cgi?id=1579

I think Jeff Squyres is digging into it. 

But the core and IMB-MPI1 files are here:

http://www.openfabrics.org/~swise/bug1579/core.8175.gz
http://www.openfabrics.org/~swise/bug1579/IMB-MPI1.gz

I built the OpenMPI-1.3.1-1 src rpm thusly:

rpmbuild -bb openmpi-1.3.1.spec --define='ofed 1' --define='install_in_opt 1' \
    --define='configure_options CFLAGS=-g'
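
For anyone picking up the core, here is a minimal sketch of pulling a symbolized backtrace out of it with gdb. It assumes gdb and gzip are installed, that core.8175.gz and IMB-MPI1.gz from the URLs above were downloaded into the current directory, and that the same CFLAGS=-g OpenMPI build is installed at the same path as on the crashing node, since gdb needs the matching mca_btl_openib.so to name those frames. The helper name show_backtrace is hypothetical.

```sh
#!/bin/sh
# Sketch only: print a symbolized backtrace from a posted core file.
# Assumptions: gdb and gzip are installed, and the OpenMPI build that
# produced the core is installed at the same path on this machine.

# Hypothetical helper: show_backtrace BINARY CORE
show_backtrace() {
    binary=$1
    core=$2
    for f in "$binary" "$core"; do
        # Decompress FILE.gz (e.g. core.8175.gz) if only the .gz is present.
        if [ ! -f "$f" ] && [ -f "$f.gz" ]; then
            gunzip "$f.gz" || return 1
        fi
        if [ ! -f "$f" ]; then
            echo "missing: $f" >&2
            return 1
        fi
    done
    # -batch exits after running the -ex commands; "bt full" also prints
    # the local variables of each frame.
    gdb -batch -ex 'bt full' "$binary" "$core"
}
```

Run it as `show_backtrace ./IMB-MPI1 ./core.8175`. With the -g build, the mca_btl_openib.so frames that appear only as raw addresses in the trace quoted below should resolve to function names and source lines.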


Steve.



Pavel Shamis (Pasha) wrote:
> Steve,
> If you compile the OMPI code with CFLAGS="-g", generate a segfault
> core file, and send the core plus the IMB-MPI1 binary to me, I will be
> able to understand the problem better.
>
> Regards,
> Pasha
>
> Steve Wise wrote:
>>
>> Hey Pasha,
>>
>>
>> I just applied r20872 and retested, and I still hit this seg fault.  
>> So I think this is a new bug.
>>
>> Lemme pull the trunk and try that.
>>
>>
>>
>> Pavel Shamis (Pasha) wrote:
>>> I think your problem is related to this bug:
>>> https://svn.open-mpi.org/trac/ompi/ticket/1823
>>>
>>> And it is resolved on the ompi-trunk.
>>>
>>> Pasha.
>>>
>>> Steve Wise wrote:
>>>> When this happens, that node also logs this type of message in
>>>> /var/log/messages:
>>>>
>>>> IMB-MPI1[8859]: segfault at 0000000000000018 rip 00002b7bfc880800 rsp 00007fffb1021330 error 4
>>>>
>>>> Steve Wise wrote:
>>>>> Hey Jeff,
>>>>>
>>>>> Have you seen this?  I'm hitting this regularly running on 
>>>>> ofed-1.4.1-rc2.
>>>>>
>>>>> Test:
>>>>> [ompi at vic12 ~]$ cat doit-ompi
>>>>> #!/bin/sh
>>>>> while : ; do
>>>>>        mpirun -np 16 --host vic12-10g,vic20-10g,vic9-10g,vic21-10g \
>>>>>            --mca btl openib,self,sm --mca btl_openib_max_btls 1 \
>>>>>            /usr/mpi/gcc/openmpi-1.3.1rc4/tests/IMB-3.1/IMB-MPI1 -npmin 16 \
>>>>>            bcast scatter sendrecv exchange </dev/null
>>>>> done
>>>>>
>>>>>
>>>>> Seg Fault output:
>>>>>
>>>>> [vic21:04047] *** Process received signal ***
>>>>> [vic21:04047] Signal: Segmentation fault (11)
>>>>> [vic21:04047] Signal code: Address not mapped (1)
>>>>> [vic21:04047] Failing at address: 0x18
>>>>> [vic21:04047] [ 0] /lib64/libpthread.so.0 [0x3dde20e4c0]
>>>>> [vic21:04047] [ 1] /usr/mpi/gcc/openmpi-1.3.1rc4/lib64/openmpi/mca_btl_openib.so [0x2b911bc33800]
>>>>> [vic21:04047] [ 2] /usr/mpi/gcc/openmpi-1.3.1rc4/lib64/openmpi/mca_btl_openib.so [0x2b911bc38c2d]
>>>>> [vic21:04047] [ 3] /usr/mpi/gcc/openmpi-1.3.1rc4/lib64/openmpi/mca_btl_openib.so [0x2b911bc33fcb]
>>>>> [vic21:04047] [ 4] /usr/mpi/gcc/openmpi-1.3.1rc4/lib64/openmpi/mca_btl_openib.so [0x2b911bc22af8]
>>>>> [vic21:04047] [ 5] /usr/mpi/gcc/openmpi-1.3.1rc4/lib64/libopen-pal.so.0(mca_base_components_close+0x83) [0x2b911933da33]
>>>>> [vic21:04047] [ 6] /usr/mpi/gcc/openmpi-1.3.1rc4/lib64/libmpi.so.0(mca_btl_base_close+0xe0) [0x2b9118ea3fb0]
>>>>> [vic21:04047] [ 7] /usr/mpi/gcc/openmpi-1.3.1rc4/lib64/openmpi/mca_bml_r2.so [0x2b911ba1938f]
>>>>> [vic21:04047] [ 8] /usr/mpi/gcc/openmpi-1.3.1rc4/lib64/openmpi/mca_pml_ob1.so [0x2b911b601cde]
>>>>> [vic21:04047] [ 9] /usr/mpi/gcc/openmpi-1.3.1rc4/lib64/libmpi.so.0 [0x2b9118e7241b]
>>>>> [vic21:04047] [10] /usr/mpi/gcc/openmpi-1.3.1rc4/tests/IMB-3.1/IMB-MPI1(main+0x178) [0x403498]
>>>>> [vic21:04047] [11] /lib64/libc.so.6(__libc_start_main+0xf4) [0x3ddd61d974]
>>>>> [vic21:04047] [12] /usr/mpi/gcc/openmpi-1.3.1rc4/tests/IMB-3.1/IMB-MPI1 [0x403269]
>>>>> [vic21:04047] *** End of error message ***
>>>>>
>>>>> _______________________________________________
>>>>> ewg mailing list
>>>>> ewg at lists.openfabrics.org
>>>>> http://lists.openfabrics.org/cgi-bin/mailman/listinfo/ewg
>>>>
>>>> _______________________________________________
>>>> devel mailing list
>>>> devel at open-mpi.org
>>>> http://www.open-mpi.org/mailman/listinfo.cgi/devel
>>>>
>>>
>>



