[openfabrics-ewg] problems running MVAPICH on OFED 1.1 rc6 with SLES10 x86_64

Scott Weitzenkamp (sweitzen) sweitzen at cisco.com
Wed Oct 11 08:53:59 PDT 2006


We've installed four SLES10 machines so far, and they all have the
"127.0.0.2 <myhostname>" entry in /etc/hosts.
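
For anyone who wants to check other nodes for the same thing, here is a
minimal Python sketch. It is not part of OFED or MVAPICH, and the function
name loopback_aliases is made up for illustration. It assumes the usual
/etc/hosts layout of an address followed by one or more names, and it lists
any loopback address other than 127.0.0.1 that is mapped to the given host
name.

```python
import ipaddress


def loopback_aliases(hosts_text, hostname):
    """Return loopback addresses (other than 127.0.0.1) mapped to hostname."""
    short_name = hostname.split(".")[0]
    found = []
    for line in hosts_text.splitlines():
        # Drop trailing comments and surrounding whitespace.
        entry = line.split("#", 1)[0].strip()
        if not entry:
            continue
        fields = entry.split()
        address, names = fields[0], fields[1:]
        try:
            ip = ipaddress.ip_address(address)
        except ValueError:
            continue  # first field is not an IP address; skip the line
        if not ip.is_loopback or address == "127.0.0.1":
            continue
        # Match either the fully qualified name or the short host name.
        if any(n == hostname or n.split(".")[0] == short_name for n in names):
            found.append(address)
    return found


# Sample input built from the entry quoted later in this thread.
sample = (
    "127.0.0.1       localhost\n"
    "127.0.0.2       svbu-qa1850-3.cisco.com svbu-qa1850-3\n"
)
print(loopback_aliases(sample, "svbu-qa1850-3"))
```

To run it against a real node, pass the contents of /etc/hosts and the
node's host name instead of the sample string.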

Scott Weitzenkamp
SQA and Release Manager
Server Virtualization Business Unit
Cisco Systems
 

> -----Original Message-----
> From: Pavel Shamis (Pasha) [mailto:pasha at dev.mellanox.co.il] 
> Sent: Wednesday, October 11, 2006 8:49 AM
> To: Scott Weitzenkamp (sweitzen)
> Cc: Pavel Shamis (Pasha); Aviram Gutman; OpenFabricsEWG; openib
> Subject: Re: [openfabrics-ewg] problems running MVAPICH on 
> OFED 1.1 rc6 with SLES10 x86_64
> 
> I meant SLES10.
> (Yes, they are different distros.)
> 
> Scott Weitzenkamp (sweitzen) wrote:
> > Did you check SUSE 10 or SLES 10? Aren't those different distros?
> >
> > Scott Weitzenkamp
> > SQA and Release Manager
> > Server Virtualization Business Unit
> > Cisco Systems
> >  
> >
> >   
> >> -----Original Message-----
> >> From: Pavel Shamis (Pasha) [mailto:pasha at dev.mellanox.co.il] 
> >> Sent: Wednesday, October 11, 2006 3:09 AM
> >> To: Scott Weitzenkamp (sweitzen)
> >> Cc: Pavel Shamis (Pasha); Aviram Gutman; OpenFabricsEWG; openib
> >> Subject: Re: [openfabrics-ewg] problems running MVAPICH on 
> >> OFED 1.1 rc6 with SLES10 x86_64
> >>
> >> On some of our SUSE 10 machines I found the 127.0.0.2 IP,
> >> but it was pointing to some random Linux site (linux.org)
> >> and had no effect on MPI runs.
> >> In your case the IP points to a _real_ machine, which is very strange.
> >>
> >> Scott Weitzenkamp (sweitzen) wrote:
> >>     
> >>> Aha, I found something in /etc/hosts, thanks for the hint.
> >>>
> >>> 	127.0.0.2       svbu-qa1850-3.cisco.com svbu-qa1850-3
> >>>
> >>> If I comment this line out, MVAPICH works fine.  Does Mellanox have this
> >>> entry in /etc/hosts?
> >>>
> >>> Scott Weitzenkamp
> >>> SQA and Release Manager
> >>> Server Virtualization Business Unit
> >>> Cisco Systems
> >>>  
> >>>
> >>>       
> >>>> -----Original Message-----
> >>>> From: Pavel Shamis (Pasha) [mailto:pasha at mellanox.co.il] 
> >>>> Sent: Thursday, October 05, 2006 5:59 AM
> >>>> To: Scott Weitzenkamp (sweitzen)
> >>>> Cc: Aviram Gutman; OpenFabricsEWG; openib
> >>>> Subject: Re: [openfabrics-ewg] problems running MVAPICH on 
> >>>> OFED 1.1 rc6 with SLES10 x86_64
> >>>>
> >>>>         
> >>>>> I see it for all MVAPICH tests, it's 100% consistent.
> >>>>>           
> >>>> By MVAPICH tests, do you mean the osu_benchmarks (bw/lt/etc.) or all tests
> >>>> over MVAPICH on the SUSE10 platform?
> >>>> Please check the /etc/hosts file on your machines; it should be exactly the
> >>>> same on all nodes.
> >>>>
> >>>> Regards,
> >>>> Pasha
> >>>>
> >>>>         
> >>>>> Scott Weitzenkamp
> >>>>> SQA and Release Manager
> >>>>> Server Virtualization Business Unit
> >>>>> Cisco Systems
> >>>>>  
> >>>>>
> >>>>>           
> >>>>>> -----Original Message-----
> >>>>>> From: Pavel Shamis (Pasha) [mailto:pasha at mellanox.co.il] 
> >>>>>> Sent: Tuesday, October 03, 2006 3:37 AM
> >>>>>> To: Scott Weitzenkamp (sweitzen)
> >>>>>> Cc: Aviram Gutman; OpenFabricsEWG; openib
> >>>>>> Subject: Re: [openfabrics-ewg] problems running MVAPICH on 
> >>>>>> OFED 1.1 rc6 with SLES10 x86_64
> >>>>>>
> >>>>>> Hi Scott,
> >>>>>> Unfortunately, I was not able to reproduce the failure on our platforms.
> >>>>>> Do you see the problem with all tests or only with specific ones?
> >>>>>> Is the problem consistent?
> >>>>>>
> >>>>>> Regards,
> >>>>>> Pasha
> >>>>>>
> >>>>>> Scott Weitzenkamp (sweitzen) wrote:
> >>>>>>             
> >>>>>>> $ uname -a
> >>>>>>> Linux svbu-qa1850-3 2.6.16.21-0.8-smp #1 SMP Mon Jul 3 18:25:39 UTC 2006 x86_64 x86_64 x86_64 GNU/Linux
> >>>>>>> $ /usr/local/ofed/mpi/gcc/mvapich-0.9.7-mlx2.2.0/bin/mpirun_rsh -np 2 192.168.2.46 192.168.2.49 hostname
> >>>>>>> svbu-qa1850-4
> >>>>>>> svbu-qa1850-3
> >>>>>>> $ /usr/local/ofed/mpi/gcc/mvapich-0.9.7-mlx2.2.0/bin/mpirun_rsh -np 2 192.168.2.46 192.168.2.49 /usr/local/ofed/mpi/gcc/mvapich-0.9.7-mlx2.2.0/tests/osu_benchmarks-2.2/osu_latency
> >>>>>>>
> >>>>>>> The last command just hangs.  Can I try your binary RPMs?
> >>>>>>>
> >>>>>>> Scott Weitzenkamp
> >>>>>>> SQA and Release Manager
> >>>>>>> Server Virtualization Business Unit
> >>>>>>> Cisco Systems
> >>>>>>>  
> >>>>>>>
> >>>>>>>               
> >>>>>>>> -----Original Message-----
> >>>>>>>> From: Aviram Gutman [mailto:aviram at dev.mellanox.co.il] 
> >>>>>>>> Sent: Sunday, October 01, 2006 2:29 AM
> >>>>>>>> To: Scott Weitzenkamp (sweitzen)
> >>>>>>>> Cc: OpenFabricsEWG; openib; pasha at mellanox.co.il
> >>>>>>>> Subject: Re: [openfabrics-ewg] problems running MVAPICH on 
> >>>>>>>> OFED 1.1 rc6 with SLES10 x86_64
> >>>>>>>>
> >>>>>>>> Can you please elaborate on the MVAPICH issues and send the command line?
> >>>>>>>> We ran it here on 32 Opteron nodes, each quad-core, and also ran rigorous
> >>>>>>>> tests on many other nodes.
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> Scott Weitzenkamp (sweitzen) wrote:
> >>>>>>>>                 
> >>>>>>>>> We are just getting started with OFED testing on SLES10; the first
> >>>>>>>>> platform is x86_64.
> >>>>>>>>>  
> >>>>>>>>> IPoIB, SDP, SRP, Open MPI, HP MPI, and Intel MPI are working so far.
> >>>>>>>>> MVAPICH with OSU benchmarks just hangs.  This same hardware works
> >>>>>>>>> fine with OFED and RHEL4 U3.
> >>>>>>>>>  
> >>>>>>>>> Has anyone else seen this?
> >>>>>>>>>  
> >>>>>>>>> Scott Weitzenkamp
> >>>>>>>>> SQA and Release Manager
> >>>>>>>>> Server Virtualization Business Unit
> >>>>>>>>> Cisco Systems
> >>>>>>>>>  
> >>>>>>>>>
> >>>>>>>>>                   
> >>>>>>>> ------------------------------------------------------------------------
> >>>>>>>>                 
> >>>>>>>>> _______________________________________________
> >>>>>>>>> openfabrics-ewg mailing list
> >>>>>>>>> openfabrics-ewg at openib.org
> >>>>>>>>> http://openib.org/mailman/listinfo/openfabrics-ewg
> >>>>>>>>>   
> >>>>>>>>>                   
> >>>>>> -- 
> >>>>>> Pavel Shamis (Pasha)
> >>>>>> Software Engineer
> >>>>>> Mellanox Technologies LTD.
> >>>>>> pasha at mellanox.co.il
> >>>>>>
> >>>>>>             
> >>>> -- 
> >>>> Pavel Shamis (Pasha)
> >>>> Software Engineer
> >>>> Mellanox Technologies LTD.
> >>>> pasha at mellanox.co.il
> >>>>
> >>>>         
> >> -- 
> >> Pavel Shamis (Pasha)
> >> Software Engineer
> >> Mellanox Technologies LTD.
> >> pasha at mellanox.co.il
> >>
> >>     
> >
> >   
> 
> 
> -- 
> Pavel Shamis (Pasha)
> Software Engineer
> Mellanox Technologies LTD.
> pasha at mellanox.co.il
> 



