[openfabrics-ewg] problems running MVAPICH on OFED 1.1 rc6 with SLES10 x86_64

Scott Weitzenkamp (sweitzen) sweitzen at cisco.com
Wed Oct 11 12:56:26 PDT 2006


We aren't using SLES auto-install. But I did google for "SLES 127.0.0.2"
and found this at
http://www.novell.com/documentation/novellaudit20/readme/novellaudit20_readme.html:

      2.8 SLES 10 hosts File
      SLES 10 includes two localhost entries in the /etc/hosts file:
      127.0.0.1 and 127.0.0.2.
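
A minimal sketch for checking a node for this kind of entry, assuming a POSIX shell and awk. It prints every uncommented hosts-file line that maps a 127.x.x.x address to one of the names passed in. On the failing node in this thread, that would be the "127.0.0.2 svbu-qa1850-3.cisco.com svbu-qa1850-3" line. The function name find_loopback_self_entries is hypothetical, not part of SLES or OFED. The thread shows only that removing such an entry fixes MVAPICH; it does not establish exactly why the entry causes the hang.

```shell
#!/bin/sh
# Hypothetical helper (not part of SLES or OFED): print uncommented lines
# in a hosts file that map a 127.x.x.x address to any of the given names.
# Usage: find_loopback_self_entries /etc/hosts "$(hostname -s)" "$(hostname -f)"
find_loopback_self_entries() {
    hosts_file=$1
    shift
    # awk receives the names as one space-separated string and splits it.
    awk -v names="$*" '
        BEGIN { n = split(names, want, " ") }
        # Only lines whose first field is a 127.x.x.x address; commented
        # lines have "#..." as their first field and never match.
        $1 ~ /^127\./ {
            for (i = 2; i <= NF; i++) {
                # A field starting with "#" begins a trailing comment.
                if (substr($i, 1, 1) == "#") next
                for (j = 1; j <= n; j++)
                    if ($i == want[j]) { print; next }
            }
        }' "$hosts_file"
}
```

Empty output means no loopback address is mapped to any of the given names.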

The steps for installing Oracle10g on SLES10 at
http://wiki.novell.com/index.php/Oracle10g_R2_Database_on_SLES10_for_i386_Step-by-Step_1
also reference commenting out the 127.0.0.2 line.

Please add this step to the OFED 1.1 MVAPICH release notes.
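
A minimal sketch of that step, assuming a POSIX shell, sed, and root access on each node. It saves a copy of the hosts file as <file>.bak and prefixes the 127.0.0.2 line with "#", which is the change that made MVAPICH work in this thread. The function name comment_out_loopback_alias and the .bak backup are illustrative choices, not something OFED or SLES ships. Since the hosts file should be the same on all nodes, it would need to be run on every node in the job.

```shell
#!/bin/sh
# Hypothetical helper (not part of OFED or SLES): comment out the hosts-file
# entry for a loopback alias, 127.0.0.2 by default.
# Must be run as root when editing the real /etc/hosts.
comment_out_loopback_alias() {
    hosts_file=${1:-/etc/hosts}
    addr=${2:-127.0.0.2}
    # Escape the dots so sed matches the address literally.
    re=$(printf '%s' "$addr" | sed 's/\./\\./g')
    # Keep a backup; writing back through a redirect keeps the original
    # file's owner and permissions.
    cp "$hosts_file" "$hosts_file.bak" || return 1
    # Prefix "#" to uncommented lines that start with the address followed
    # by whitespace (so an address such as 127.0.0.20 is left alone).
    sed "s/^\([[:space:]]*$re[[:space:]]\)/#\1/" "$hosts_file.bak" > "$hosts_file"
}

# Example: comment_out_loopback_alias /etc/hosts 127.0.0.2
```

Running it twice is harmless: lines that already start with "#" no longer match.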

Scott Weitzenkamp
SQA and Release Manager
Server Virtualization Business Unit
Cisco Systems
 

> -----Original Message-----
> From: Pavel Shamis (Pasha) [mailto:pasha at dev.mellanox.co.il] 
> Sent: Wednesday, October 11, 2006 9:03 AM
> To: Scott Weitzenkamp (sweitzen)
> Cc: Pavel Shamis (Pasha); Aviram Gutman; OpenFabricsEWG; openib
> Subject: Re: [openfabrics-ewg] problems running MVAPICH on 
> OFED 1.1 rc6 with SLES10 x86_64
> 
> Here is a link to a SuSE bug related to 127.0.0.2:
> https://bugzilla.novell.com/show_bug.cgi?id=165269
> 
> Check your SuSE auto-install setup. It is possible that it contains a
> broken configuration.
> 
> 
> Scott Weitzenkamp (sweitzen) wrote:
> > We've installed four SLES10 machines so far, and they all have the
> > "127.0.0.2 <myhostname>" entry.
> > 
> >> -----Original Message-----
> >> From: Pavel Shamis (Pasha) [mailto:pasha at dev.mellanox.co.il] 
> >> Sent: Wednesday, October 11, 2006 8:49 AM
> >> To: Scott Weitzenkamp (sweitzen)
> >> Cc: Pavel Shamis (Pasha); Aviram Gutman; OpenFabricsEWG; openib
> >> Subject: Re: [openfabrics-ewg] problems running MVAPICH on 
> >> OFED 1.1 rc6 with SLES10 x86_64
> >>
> >> I mean SLES10.
> >> (Yes, they are different distros.)
> >>
> >> Scott Weitzenkamp (sweitzen) wrote:
> >>> Did you check SUSE 10 or SLES 10? Aren't those different distros?
> >>>
> >>>> -----Original Message-----
> >>>> From: Pavel Shamis (Pasha) [mailto:pasha at dev.mellanox.co.il] 
> >>>> Sent: Wednesday, October 11, 2006 3:09 AM
> >>>> To: Scott Weitzenkamp (sweitzen)
> >>>> Cc: Pavel Shamis (Pasha); Aviram Gutman; OpenFabricsEWG; openib
> >>>> Subject: Re: [openfabrics-ewg] problems running MVAPICH on 
> >>>> OFED 1.1 rc6 with SLES10 x86_64
> >>>>
> >>>> On some of our SUSE 10 machines I found the 127.0.0.2 IP,
> >>>> but it was pointing to some random Linux site (linux.org)
> >>>> and had no effect on MPI runs.
> >>>> In your case the IP points to a _real_ machine, which is very strange.
> >>>>
> >>>> Scott Weitzenkamp (sweitzen) wrote:
> >>>>     
> >>>>> Aha, I found something in /etc/hosts, thanks for the hint.
> >>>>>
> >>>>> 	127.0.0.2       svbu-qa1850-3.cisco.com svbu-qa1850-3
> >>>>>
> >>>>> If I comment this line out, MVAPICH works fine. Does Mellanox have this
> >>>>> entry in /etc/hosts?
> >>>>>
> >>>>>> -----Original Message-----
> >>>>>> From: Pavel Shamis (Pasha) [mailto:pasha at mellanox.co.il] 
> >>>>>> Sent: Thursday, October 05, 2006 5:59 AM
> >>>>>> To: Scott Weitzenkamp (sweitzen)
> >>>>>> Cc: Aviram Gutman; OpenFabricsEWG; openib
> >>>>>> Subject: Re: [openfabrics-ewg] problems running MVAPICH on 
> >>>>>> OFED 1.1 rc6 with SLES10 x86_64
> >>>>>>
> >>>>>>> I see it for all MVAPICH tests, it's 100% consistent.
> >>>>>>
> >>>>>> Do you mean the MVAPICH tests in osu_benchmarks (bw/lt/etc.), or all
> >>>>>> tests over MVAPICH on the SUSE10 platform?
> >>>>>> Please check the /etc/hosts file on your machines; it should be
> >>>>>> exactly the same on all nodes.
> >>>>>>
> >>>>>> Regards,
> >>>>>> Pasha
> >>>>>>
> >>>>>>>> -----Original Message-----
> >>>>>>>> From: Pavel Shamis (Pasha) [mailto:pasha at mellanox.co.il] 
> >>>>>>>> Sent: Tuesday, October 03, 2006 3:37 AM
> >>>>>>>> To: Scott Weitzenkamp (sweitzen)
> >>>>>>>> Cc: Aviram Gutman; OpenFabricsEWG; openib
> >>>>>>>> Subject: Re: [openfabrics-ewg] problems running MVAPICH on 
> >>>>>>>> OFED 1.1 rc6 with SLES10 x86_64
> >>>>>>>>
> >>>>>>>> Hi Scott,
> >>>>>>>> Unfortunately, I was not able to reproduce the failure on our platforms.
> >>>>>>>> Do you see the problem with all tests or only with specific ones?
> >>>>>>>> Is it a consistent problem?
> >>>>>>>>
> >>>>>>>> Regards,
> >>>>>>>> Pasha
> >>>>>>>>
> >>>>>>>> Scott Weitzenkamp (sweitzen) wrote:
> >>>>>>>>> $ uname -a
> >>>>>>>>> Linux svbu-qa1850-3 2.6.16.21-0.8-smp #1 SMP Mon Jul 3 18:25:39 UTC 2006 x86_64 x86_64 x86_64 GNU/Linux
> >>>>>>>>> $ /usr/local/ofed/mpi/gcc/mvapich-0.9.7-mlx2.2.0/bin/mpirun_rsh -np 2 \
> >>>>>>>>>       192.168.2.46 192.168.2.49 hostname
> >>>>>>>>> svbu-qa1850-4
> >>>>>>>>> svbu-qa1850-3
> >>>>>>>>> $ /usr/local/ofed/mpi/gcc/mvapich-0.9.7-mlx2.2.0/bin/mpirun_rsh -np 2 \
> >>>>>>>>>       192.168.2.46 192.168.2.49 \
> >>>>>>>>>       /usr/local/ofed/mpi/gcc/mvapich-0.9.7-mlx2.2.0/tests/osu_benchmarks-2.2/osu_latency
> >>>>>>>>>
> >>>>>>>>> The last command just hangs.  Can I try your binary RPMs?
> >>>>>>>>>
> >>>>>>>>>> -----Original Message-----
> >>>>>>>>>> From: Aviram Gutman [mailto:aviram at dev.mellanox.co.il] 
> >>>>>>>>>> Sent: Sunday, October 01, 2006 2:29 AM
> >>>>>>>>>> To: Scott Weitzenkamp (sweitzen)
> >>>>>>>>>> Cc: OpenFabricsEWG; openib; pasha at mellanox.co.il
> >>>>>>>>>> Subject: Re: [openfabrics-ewg] problems running MVAPICH on 
> >>>>>>>>>> OFED 1.1 rc6 with SLES10 x86_64
> >>>>>>>>>>
> >>>>>>>>>> Can you please elaborate on the MVAPICH issues? Can you send the
> >>>>>>>>>> command line? We ran it here on 32 Opteron nodes, each quad core,
> >>>>>>>>>> and also ran rigorous tests on many other nodes.
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>> Scott Weitzenkamp (sweitzen) wrote:
> >>>>>>>>>>> We are just getting started with OFED testing on SLES10; the first
> >>>>>>>>>>> platform is x86_64.
> >>>>>>>>>>>  
> >>>>>>>>>>> IPoIB, SDP, SRP, Open MPI, HP MPI, and Intel MPI are working so far.
> >>>>>>>>>>> MVAPICH with the OSU benchmarks just hangs. This same hardware works
> >>>>>>>>>>> fine with OFED and RHEL4 U3.
> >>>>>>>>>>>  
> >>>>>>>>>>> Has anyone else seen this?
> >>>>>>>>>>>  
> >>>>>>>>>> ------------------------------------------------------------------------
> >>>>>>>>>>> _______________________________________________
> >>>>>>>>>>> openfabrics-ewg mailing list
> >>>>>>>>>>> openfabrics-ewg at openib.org
> >>>>>>>>>>> http://openib.org/mailman/listinfo/openfabrics-ewg
> >>>>>>>>>>>   
> >>>>>>>>>>>                   
> >>>>>>>> -- 
> >>>>>>>> Pavel Shamis (Pasha)
> >>>>>>>> Software Engineer
> >>>>>>>> Mellanox Technologies LTD.
> >>>>>>>> pasha at mellanox.co.il
> >>>>>>>>



