[openib-general] [openfabrics-ewg] problems running MVAPICH on OFED 1.1 rc6 with SLES10 x86_64

Pavel Shamis (Pasha) pasha at dev.mellanox.co.il
Wed Oct 11 03:09:19 PDT 2006


On some of our SUSE 10 machines I found the 127.0.0.2 IP as well,
but it was pointing to some random Linux site (linux.org)
and had no effect on MPI runs.
In your case the IP points to a _real_ machine, which is very strange.
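
For anyone who wants to check their own nodes for this kind of entry, here is a
minimal sketch in Python. It assumes a standard /etc/hosts layout (address
first, then names, "#" for comments). The function name loopback_aliases and
the sample text are made up for illustration; only the 127.0.0.2 line is taken
from Scott's mail. It only detects such an entry and says nothing about why
MVAPICH hangs on it.

```python
import ipaddress


def loopback_aliases(hosts_text, hostname):
    """Return the loopback addresses in hosts_text that list `hostname`.

    Matches either the full name given or its short form (text before
    the first dot). Hypothetical helper, not part of OFED or MVAPICH.
    """
    short = hostname.split(".")[0]
    hits = []
    for line in hosts_text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blanks
        if not line:
            continue
        fields = line.split()
        addr, names = fields[0], fields[1:]
        try:
            ip = ipaddress.ip_address(addr)
        except ValueError:
            continue  # skip lines whose first field is not an IP address
        if ip.is_loopback and any(n in (hostname, short) for n in names):
            hits.append(addr)
    return hits


# Sample input; the second line is the entry quoted in this thread.
sample = (
    "127.0.0.1       localhost\n"
    "127.0.0.2       svbu-qa1850-3.cisco.com svbu-qa1850-3\n"
)
print(loopback_aliases(sample, "svbu-qa1850-3"))
```

To run it against a real node, pass the contents of /etc/hosts and the output
of socket.gethostname() instead of the sample values. A non-empty result means
the node's own name resolves to a loopback address.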

Scott Weitzenkamp (sweitzen) wrote:
> Aha, I found something in /etc/hosts, thanks for the hint.
> 
> 	127.0.0.2       svbu-qa1850-3.cisco.com svbu-qa1850-3
> 
> If I comment this line out, MVAPICH works fine.  Does Mellanox have this
> entry in /etc/hosts?
> 
> Scott Weitzenkamp
> SQA and Release Manager
> Server Virtualization Business Unit
> Cisco Systems
>  
> 
>> -----Original Message-----
>> From: Pavel Shamis (Pasha) [mailto:pasha at mellanox.co.il] 
>> Sent: Thursday, October 05, 2006 5:59 AM
>> To: Scott Weitzenkamp (sweitzen)
>> Cc: Aviram Gutman; OpenFabricsEWG; openib
>> Subject: Re: [openfabrics-ewg] problems running MVAPICH on 
>> OFED 1.1 rc6 with SLES10 x86_64
>>
>>> I see it for all MVAPICH tests, it's 100% consistent.
>> Are the MVAPICH tests the osu_benchmarks (bw/lt/etc.) or all tests over
>> MVAPICH on the SUSE10 platform?
>> Please check the /etc/hosts file on your machines; it should be exactly
>> the same on all nodes.
>>
>> Regards,
>> Pasha
>>
>>> Scott Weitzenkamp
>>> SQA and Release Manager
>>> Server Virtualization Business Unit
>>> Cisco Systems
>>>  
>>>
>>>> -----Original Message-----
>>>> From: Pavel Shamis (Pasha) [mailto:pasha at mellanox.co.il] 
>>>> Sent: Tuesday, October 03, 2006 3:37 AM
>>>> To: Scott Weitzenkamp (sweitzen)
>>>> Cc: Aviram Gutman; OpenFabricsEWG; openib
>>>> Subject: Re: [openfabrics-ewg] problems running MVAPICH on 
>>>> OFED 1.1 rc6 with SLES10 x86_64
>>>>
>>>> Hi Scott,
>>>> Unfortunately I was not able to reproduce the failure on our platforms.
>>>> Do you see the problem with all tests or only with specific ones?
>>>> Is the problem consistent?
>>>>
>>>> Regards,
>>>> Pasha
>>>>
>>>> Scott Weitzenkamp (sweitzen) wrote:
>>>>> $ uname -a
>>>>> Linux svbu-qa1850-3 2.6.16.21-0.8-smp #1 SMP Mon Jul 3 18:25:39 UTC 2006 x86_64 x86_64 x86_64 GNU/Linux
>>>>> $ /usr/local/ofed/mpi/gcc/mvapich-0.9.7-mlx2.2.0/bin/mpirun_rsh -np 2 192.168.2.46 192.168.2.49 hostname
>>>>> svbu-qa1850-4
>>>>> svbu-qa1850-3
>>>>> $ /usr/local/ofed/mpi/gcc/mvapich-0.9.7-mlx2.2.0/bin/mpirun_rsh -np 2 192.168.2.46 192.168.2.49 /usr/local/ofed/mpi/gcc/mvapich-0.9.7-mlx2.2.0/tests/osu_benchmarks-2.2/osu_latency
>>>>>
>>>>> The last command just hangs.  Can I try your binary RPMs?
>>>>>
>>>>> Scott Weitzenkamp
>>>>> SQA and Release Manager
>>>>> Server Virtualization Business Unit
>>>>> Cisco Systems
>>>>>  
>>>>>
>>>>>> -----Original Message-----
>>>>>> From: Aviram Gutman [mailto:aviram at dev.mellanox.co.il] 
>>>>>> Sent: Sunday, October 01, 2006 2:29 AM
>>>>>> To: Scott Weitzenkamp (sweitzen)
>>>>>> Cc: OpenFabricsEWG; openib; pasha at mellanox.co.il
>>>>>> Subject: Re: [openfabrics-ewg] problems running MVAPICH on 
>>>>>> OFED 1.1 rc6 with SLES10 x86_64
>>>>>>
>>>>>> Can you please elaborate on the MVAPICH issues? Can you send the
>>>>>> command line?
>>>>>> We ran it here on 32 Opteron nodes, each quad core, and also ran
>>>>>> rigorous tests on many other nodes.
>>>>>>
>>>>>>
>>>>>>
>>>>>> Scott Weitzenkamp (sweitzen) wrote:
>>>>>>> We are just getting started with OFED testing on SLES10, first 
>>>>>>> platform is x86_64.
>>>>>>>  
>>>>>>> IPoIB, SDP, SRP, Open MPI, HP MPI, and Intel MPI are working so far.
>>>>>>> MVAPICH with OSU benchmarks just hangs.  This same hardware works
>>>>>>> fine with OFED and RHEL4 U3.
>>>>>>>  
>>>>>>> Has anyone else seen this?
>>>>>>>  
>>>>>>> Scott Weitzenkamp
>>>>>>> SQA and Release Manager
>>>>>>> Server Virtualization Business Unit
>>>>>>> Cisco Systems
>>>>>>>  
>>>>>>>
>>>> -- 
>>>> Pavel Shamis (Pasha)
>>>> Software Engineer
>>>> Mellanox Technologies LTD.
>>>> pasha at mellanox.co.il
>>>>
>>
>> -- 
>> Pavel Shamis (Pasha)
>> Software Engineer
>> Mellanox Technologies LTD.
>> pasha at mellanox.co.il
>>
> 


-- 
Pavel Shamis (Pasha)
Software Engineer
Mellanox Technologies LTD.
pasha at mellanox.co.il



