[openfabrics-ewg] problems running MVAPICH on OFED 1.1 rc6 with SLES10 x86_64

Pavel Shamis (Pasha) pasha at dev.mellanox.co.il
Wed Oct 11 09:02:46 PDT 2006


Here is a link to a SuSE bug report related to 127.0.0.2:
https://bugzilla.novell.com/show_bug.cgi?id=165269

Check your SuSE auto-install setup. It is possible that it contains a
broken configuration.
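
A minimal sketch of how such an entry could be detected on a node,
assuming Python 3 is available there. The function name loopback_entries
is hypothetical and is not part of OFED or MVAPICH. It scans hosts-file
text for lines that map the given hostname to a loopback address
(127.0.0.0/8 or ::1), like the 127.0.0.2 entry quoted below.

```python
import ipaddress


def loopback_entries(hosts_text, hostname):
    """Return (ip, names) pairs where hostname maps to a loopback address.

    hosts_text is the content of a hosts file such as /etc/hosts.
    Both the full hostname and its short form (before the first dot)
    are matched against the names on each line.
    """
    short = hostname.split(".")[0]
    hits = []
    for raw in hosts_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # ignore comments
        if not line:
            continue
        fields = line.split()
        ip, names = fields[0], fields[1:]
        try:
            addr = ipaddress.ip_address(ip)
        except ValueError:
            continue  # skip lines that do not start with a valid address
        if addr.is_loopback and any(n in (hostname, short) for n in names):
            hits.append((ip, names))
    return hits
```

Passing it the contents of /etc/hosts together with the node's own
hostname should list any suspicious lines; an empty result means no
loopback mapping for that name was found.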


Scott Weitzenkamp (sweitzen) wrote:
> We've installed four SLES10 machines so far, and they all have the
> "127.0.0.2 <myhostname>" entry.
> 
> Scott Weitzenkamp
> SQA and Release Manager
> Server Virtualization Business Unit
> Cisco Systems
>  
> 
>> -----Original Message-----
>> From: Pavel Shamis (Pasha) [mailto:pasha at dev.mellanox.co.il] 
>> Sent: Wednesday, October 11, 2006 8:49 AM
>> To: Scott Weitzenkamp (sweitzen)
>> Cc: Pavel Shamis (Pasha); Aviram Gutman; OpenFabricsEWG; openib
>> Subject: Re: [openfabrics-ewg] problems running MVAPICH on 
>> OFED 1.1 rc6 with SLES10 x86_64
>>
>> I meant SLES10.
>> (Yes, they are different distros.)
>>
>> Scott Weitzenkamp (sweitzen) wrote:
>>> You checked SUSE 10 or SLES 10, aren't those different distros?
>>>
>>> Scott Weitzenkamp
>>> SQA and Release Manager
>>> Server Virtualization Business Unit
>>> Cisco Systems
>>>  
>>>
>>>   
>>>> -----Original Message-----
>>>> From: Pavel Shamis (Pasha) [mailto:pasha at dev.mellanox.co.il] 
>>>> Sent: Wednesday, October 11, 2006 3:09 AM
>>>> To: Scott Weitzenkamp (sweitzen)
>>>> Cc: Pavel Shamis (Pasha); Aviram Gutman; OpenFabricsEWG; openib
>>>> Subject: Re: [openfabrics-ewg] problems running MVAPICH on 
>>>> OFED 1.1 rc6 with SLES10 x86_64
>>>>
>>>> On some of our SUSE 10 machines I found the 127.0.0.2 IP,
>>>> but it was pointing to some random Linux site (linux.org)
>>>> and had no effect on MPI runs.
>>>> In your case the IP points to a _real_ machine, which is very strange.
>>>>
>>>> Scott Weitzenkamp (sweitzen) wrote:
>>>>     
>>>>> Aha, I found something in /etc/hosts, thanks for the hint.
>>>>>
>>>>> 	127.0.0.2       svbu-qa1850-3.cisco.com svbu-qa1850-3
>>>>>
>>>>> If I comment this line out, MVAPICH works fine.  Does Mellanox
>>>>> have this entry in /etc/hosts?
>>>>>
>>>>> Scott Weitzenkamp
>>>>> SQA and Release Manager
>>>>> Server Virtualization Business Unit
>>>>> Cisco Systems
>>>>>  
>>>>>
>>>>>       
>>>>>> -----Original Message-----
>>>>>> From: Pavel Shamis (Pasha) [mailto:pasha at mellanox.co.il] 
>>>>>> Sent: Thursday, October 05, 2006 5:59 AM
>>>>>> To: Scott Weitzenkamp (sweitzen)
>>>>>> Cc: Aviram Gutman; OpenFabricsEWG; openib
>>>>>> Subject: Re: [openfabrics-ewg] problems running MVAPICH on 
>>>>>> OFED 1.1 rc6 with SLES10 x86_64
>>>>>>
>>>>>>         
>>>>>>> I see it for all MVAPICH tests, it's 100% consistent.
>>>>>>>           
>>>>>> Are the MVAPICH tests the osu_benchmarks (bw/lt/etc.), or all
>>>>>> tests over MVAPICH on the SUSE10 platform?
>>>>>> Please check the /etc/hosts file on your machines; it should be
>>>>>> exactly the same on all nodes.
>>>>>>
>>>>>> Regards,
>>>>>> Pasha
>>>>>>
>>>>>>         
>>>>>>> Scott Weitzenkamp
>>>>>>> SQA and Release Manager
>>>>>>> Server Virtualization Business Unit
>>>>>>> Cisco Systems
>>>>>>>  
>>>>>>>
>>>>>>>           
>>>>>>>> -----Original Message-----
>>>>>>>> From: Pavel Shamis (Pasha) [mailto:pasha at mellanox.co.il] 
>>>>>>>> Sent: Tuesday, October 03, 2006 3:37 AM
>>>>>>>> To: Scott Weitzenkamp (sweitzen)
>>>>>>>> Cc: Aviram Gutman; OpenFabricsEWG; openib
>>>>>>>> Subject: Re: [openfabrics-ewg] problems running MVAPICH on 
>>>>>>>> OFED 1.1 rc6 with SLES10 x86_64
>>>>>>>>
>>>>>>>> Hi Scott,
>>>>>>>> Unfortunately, I was not able to reproduce the failure on our
>>>>>>>> platforms.
>>>>>>>> Do you see the problem with all tests or only with specific ones?
>>>>>>>> Is the problem consistent?
>>>>>>>>
>>>>>>>> Regards,
>>>>>>>> Pasha
>>>>>>>>
>>>>>>>> Scott Weitzenkamp (sweitzen) wrote:
>>>>>>>>             
>>>>>>>>> $ uname -a
>>>>>>>>> Linux svbu-qa1850-3 2.6.16.21-0.8-smp #1 SMP Mon Jul 3 18:25:39 UTC 2006 x86_64 x86_64 x86_64 GNU/Linux
>>>>>>>>> $ /usr/local/ofed/mpi/gcc/mvapich-0.9.7-mlx2.2.0/bin/mpirun_rsh -np 2 192.168.2.46 192.168.2.49 hostname
>>>>>>>>> svbu-qa1850-4
>>>>>>>>> svbu-qa1850-3
>>>>>>>>> $ /usr/local/ofed/mpi/gcc/mvapich-0.9.7-mlx2.2.0/bin/mpirun_rsh -np 2 192.168.2.46 192.168.2.49 /usr/local/ofed/mpi/gcc/mvapich-0.9.7-mlx2.2.0/tests/osu_benchmarks-2.2/osu_latency
>>>>>>>>>
>>>>>>>>> The last command just hangs.  Can I try your binary RPMs?
>>>>>>>>>
>>>>>>>>> Scott Weitzenkamp
>>>>>>>>> SQA and Release Manager
>>>>>>>>> Server Virtualization Business Unit
>>>>>>>>> Cisco Systems
>>>>>>>>>  
>>>>>>>>>
>>>>>>>>>               
>>>>>>>>>> -----Original Message-----
>>>>>>>>>> From: Aviram Gutman [mailto:aviram at dev.mellanox.co.il] 
>>>>>>>>>> Sent: Sunday, October 01, 2006 2:29 AM
>>>>>>>>>> To: Scott Weitzenkamp (sweitzen)
>>>>>>>>>> Cc: OpenFabricsEWG; openib; pasha at mellanox.co.il
>>>>>>>>>> Subject: Re: [openfabrics-ewg] problems running MVAPICH on 
>>>>>>>>>> OFED 1.1 rc6 with SLES10 x86_64
>>>>>>>>>>
>>>>>>>>>> Can you please elaborate on the MVAPICH issues and send the
>>>>>>>>>> command line?
>>>>>>>>>> We ran it here on 32 Opteron nodes, each quad core, and also ran
>>>>>>>>>> rigorous tests on many other nodes.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Scott Weitzenkamp (sweitzen) wrote:
>>>>>>>>>>                 
>>>>>>>>>>> We are just getting started with OFED testing on SLES10, first
>>>>>>>>>>> platform is x86_64.
>>>>>>>>>>>
>>>>>>>>>>> IPoIB, SDP, SRP, Open MPI, HP MPI, and Intel MPI are working so far.
>>>>>>>>>>> MVAPICH with OSU benchmarks just hang.  This same hardware works
>>>>>>>>>>> fine with OFED and RHEL4 U3.
>>>>>>>>>>>  
>>>>>>>>>>> Has anyone else seen this?
>>>>>>>>>>>  
>>>>>>>>>>> Scott Weitzenkamp
>>>>>>>>>>> SQA and Release Manager
>>>>>>>>>>> Server Virtualization Business Unit
>>>>>>>>>>> Cisco Systems
>>>>>>>>>>>  
>>>>>>>>>>>
>>>>>>>>>>>                   


-- 
Pavel Shamis (Pasha)
Software Engineer
Mellanox Technologies LTD.
pasha at mellanox.co.il



