[openfabrics-ewg] problems running MVAPICH on OFED 1.1 rc6 with SLES10 x86_64

Scott Weitzenkamp (sweitzen) sweitzen at cisco.com
Mon Oct 9 22:47:27 PDT 2006


Aha, I found something in /etc/hosts, thanks for the hint.

	127.0.0.2       svbu-qa1850-3.cisco.com svbu-qa1850-3

If I comment this line out, MVAPICH works fine.  Does Mellanox have this
entry in /etc/hosts?
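For anyone hitting the same hang, here is a minimal Python sketch for spotting this kind of entry. It is an illustration only, not part of OFED or MVAPICH. The helper names (parse_hosts, loopback_entries, hosts_identical) are hypothetical. The idea that a hostname mapped to a 127.x address is what triggers the hang is an assumption drawn from this one case; the thread does not confirm the mechanism. loopback_entries reports any loopback address that a host's own name maps to. hosts_identical follows Pasha's advice that /etc/hosts be the same on all nodes, with comments and whitespace ignored.

```python
import ipaddress


def parse_hosts(text):
    """Parse /etc/hosts-style text into (address, [names]) pairs, ignoring comments."""
    entries = []
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and surrounding whitespace
        fields = line.split()
        if len(fields) >= 2:  # need an address plus at least one name
            entries.append((fields[0], fields[1:]))
    return entries


def loopback_entries(text, hostname):
    """Return loopback addresses (127.0.0.0/8 or ::1) that `hostname` or its short name maps to.

    Assumption (not confirmed in the thread): a host name mapped to a loopback
    address is what made the MVAPICH/OSU benchmark runs hang.
    """
    short = hostname.split(".", 1)[0]
    hits = []
    for addr, names in parse_hosts(text):
        try:
            ip = ipaddress.ip_address(addr)
        except ValueError:
            continue  # skip malformed address fields
        if ip.is_loopback and (hostname in names or short in names):
            hits.append(addr)
    return hits


def hosts_identical(hosts_by_node):
    """True if every node's hosts table has the same entries (comments/whitespace ignored)."""
    tables = {
        tuple(sorted((addr, tuple(names)) for addr, names in parse_hosts(text)))
        for text in hosts_by_node.values()
    }
    return len(tables) <= 1


if __name__ == "__main__":
    # The entry found in this thread, preceded by a typical localhost line.
    sample = "127.0.0.1 localhost\n127.0.0.2 svbu-qa1850-3.cisco.com svbu-qa1850-3\n"
    print(loopback_entries(sample, "svbu-qa1850-3.cisco.com"))  # ['127.0.0.2']
```

On a live node you would pass the contents of /etc/hosts and the value of socket.getfqdn(). For the cross-node check, first gather each node's /etc/hosts into a dict keyed by node name.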

Scott Weitzenkamp
SQA and Release Manager
Server Virtualization Business Unit
Cisco Systems
 

> -----Original Message-----
> From: Pavel Shamis (Pasha) [mailto:pasha at mellanox.co.il] 
> Sent: Thursday, October 05, 2006 5:59 AM
> To: Scott Weitzenkamp (sweitzen)
> Cc: Aviram Gutman; OpenFabricsEWG; openib
> Subject: Re: [openfabrics-ewg] problems running MVAPICH on 
> OFED 1.1 rc6 with SLES10 x86_64
> 
> > I see it for all MVAPICH tests, it's 100% consistent.
> 
> Are the MVAPICH tests the osu_benchmarks (bw/lt/etc.), or all tests
> over MVAPICH on the SUSE10 platform?
> Please check the /etc/hosts file on your machines; it should be
> exactly the same on all nodes.
> 
> Regards,
> Pasha
> 
> > 
> > Scott Weitzenkamp
> > SQA and Release Manager
> > Server Virtualization Business Unit
> > Cisco Systems
> >  
> > 
> >> -----Original Message-----
> >> From: Pavel Shamis (Pasha) [mailto:pasha at mellanox.co.il] 
> >> Sent: Tuesday, October 03, 2006 3:37 AM
> >> To: Scott Weitzenkamp (sweitzen)
> >> Cc: Aviram Gutman; OpenFabricsEWG; openib
> >> Subject: Re: [openfabrics-ewg] problems running MVAPICH on 
> >> OFED 1.1 rc6 with SLES10 x86_64
> >>
> >> Hi Scott,
> >> Unfortunately, I was not able to reproduce the failure on our
> >> platforms.
> >> Do you see the problem with all tests or only with specific ones?
> >> Is it a consistent problem?
> >>
> >> Regards,
> >> Pasha
> >>
> >> Scott Weitzenkamp (sweitzen) wrote:
> >>> $ uname -a
> >>> Linux svbu-qa1850-3 2.6.16.21-0.8-smp #1 SMP Mon Jul 3 18:25:39 UTC 2006 x86_64 x86_64 x86_64 GNU/Linux
> >>> $ /usr/local/ofed/mpi/gcc/mvapich-0.9.7-mlx2.2.0/bin/mpirun_rsh -np 2 192.168.2.46 192.168.2.49 hostname
> >>> svbu-qa1850-4
> >>> svbu-qa1850-3
> >>> $ /usr/local/ofed/mpi/gcc/mvapich-0.9.7-mlx2.2.0/bin/mpirun_rsh -np 2 192.168.2.46 192.168.2.49 /usr/local/ofed/mpi/gcc/mvapich-0.9.7-mlx2.2.0/tests/osu_benchmarks-2.2/osu_latency
> >>>
> >>> The last command just hangs.  Can I try your binary RPMs?
> >>>
> >>> Scott Weitzenkamp
> >>> SQA and Release Manager
> >>> Server Virtualization Business Unit
> >>> Cisco Systems
> >>>  
> >>>
> >>>> -----Original Message-----
> >>>> From: Aviram Gutman [mailto:aviram at dev.mellanox.co.il] 
> >>>> Sent: Sunday, October 01, 2006 2:29 AM
> >>>> To: Scott Weitzenkamp (sweitzen)
> >>>> Cc: OpenFabricsEWG; openib; pasha at mellanox.co.il
> >>>> Subject: Re: [openfabrics-ewg] problems running MVAPICH on 
> >>>> OFED 1.1 rc6 with SLES10 x86_64
> >>>>
> >>>> Can you please elaborate on the MVAPICH issues? Can you send the
> >>>> command line?
> >>>> We ran it here on 32 Opteron nodes, each quad core, and also ran
> >>>> rigorous tests on many other nodes.
> >>>>
> >>>>
> >>>>
> >>>> Scott Weitzenkamp (sweitzen) wrote:
> >>>>> We are just getting started with OFED testing on SLES10; the first
> >>>>> platform is x86_64.
> >>>>>  
> >>>>> IPoIB, SDP, SRP, Open MPI, HP MPI, and Intel MPI are working so far.
> >>>>> MVAPICH runs of the OSU benchmarks just hang.  This same hardware
> >>>>> works fine with OFED and RHEL4 U3.
> >>>>>  
> >>>>> Has anyone else seen this?
> >>>>>  
> >>>>> Scott Weitzenkamp
> >>>>> SQA and Release Manager
> >>>>> Server Virtualization Business Unit
> >>>>> Cisco Systems
> >>>>>  
> >>>>>
> >>>>> ------------------------------------------------------------------------
> >>>>> _______________________________________________
> >>>>> openfabrics-ewg mailing list
> >>>>> openfabrics-ewg at openib.org
> >>>>> http://openib.org/mailman/listinfo/openfabrics-ewg
> >>>>>   
> >>
> >> -- 
> >> Pavel Shamis (Pasha)
> >> Software Engineer
> >> Mellanox Technologies LTD.
> >> pasha at mellanox.co.il
> >>
> > 
> 
> 
> -- 
> Pavel Shamis (Pasha)
> Software Engineer
> Mellanox Technologies LTD.
> pasha at mellanox.co.il
> 
