[openfabrics-ewg] problems running MVAPICH on OFED 1.1 rc6 with SLES10 x86_64
Pavel Shamis (Pasha)
pasha at dev.mellanox.co.il
Wed Oct 11 08:48:45 PDT 2006
I mean SLES10.
(Yes, they are different distros.)
Scott Weitzenkamp (sweitzen) wrote:
> Did you check SUSE 10 or SLES 10? Aren't those different distros?
>
> Scott Weitzenkamp
> SQA and Release Manager
> Server Virtualization Business Unit
> Cisco Systems
>
>
>
>> -----Original Message-----
>> From: Pavel Shamis (Pasha) [mailto:pasha at dev.mellanox.co.il]
>> Sent: Wednesday, October 11, 2006 3:09 AM
>> To: Scott Weitzenkamp (sweitzen)
>> Cc: Pavel Shamis (Pasha); Aviram Gutman; OpenFabricsEWG; openib
>> Subject: Re: [openfabrics-ewg] problems running MVAPICH on
>> OFED 1.1 rc6 with SLES10 x86_64
>>
>> On some of our SUSE 10 machines I found the 127.0.0.2 IP,
>> but it was pointing to some random Linux site (linux.org)
>> and had no effect on MPI runs.
>> In your case the IP points to a _real_ machine, which is very strange.
>>
>> Scott Weitzenkamp (sweitzen) wrote:
>>
>>> Aha, I found something in /etc/hosts, thanks for the hint.
>>>
>>> 127.0.0.2 svbu-qa1850-3.cisco.com svbu-qa1850-3
>>>
>>> If I comment this line out, MVAPICH works fine. Does Mellanox have this
>>> entry in /etc/hosts?
>>>
>>> Scott Weitzenkamp
>>> SQA and Release Manager
>>> Server Virtualization Business Unit
>>> Cisco Systems
>>>
>>>
>>>
>>>> -----Original Message-----
>>>> From: Pavel Shamis (Pasha) [mailto:pasha at mellanox.co.il]
>>>> Sent: Thursday, October 05, 2006 5:59 AM
>>>> To: Scott Weitzenkamp (sweitzen)
>>>> Cc: Aviram Gutman; OpenFabricsEWG; openib
>>>> Subject: Re: [openfabrics-ewg] problems running MVAPICH on
>>>> OFED 1.1 rc6 with SLES10 x86_64
>>>>
>>>>
>>>>> I see it for all MVAPICH tests, it's 100% consistent.
>>>>>
>>>> By MVAPICH tests, do you mean the osu_benchmarks (bw/lt/etc.) or all tests
>>>> over MVAPICH on the SUSE10 platform?
>>>> Please check the /etc/hosts file on your machines; it should be exactly the
>>>> same on all nodes.
>>>>
>>>> Regards,
>>>> Pasha
>>>>
>>>>
>>>>> Scott Weitzenkamp
>>>>> SQA and Release Manager
>>>>> Server Virtualization Business Unit
>>>>> Cisco Systems
>>>>>
>>>>>
>>>>>
>>>>>> -----Original Message-----
>>>>>> From: Pavel Shamis (Pasha) [mailto:pasha at mellanox.co.il]
>>>>>> Sent: Tuesday, October 03, 2006 3:37 AM
>>>>>> To: Scott Weitzenkamp (sweitzen)
>>>>>> Cc: Aviram Gutman; OpenFabricsEWG; openib
>>>>>> Subject: Re: [openfabrics-ewg] problems running MVAPICH on
>>>>>> OFED 1.1 rc6 with SLES10 x86_64
>>>>>>
>>>>>> Hi Scott,
>>>>>> Unfortunately, I was not able to reproduce the failure on our platforms.
>>>>>> Do you see the problem with all tests or only with specific ones?
>>>>>> Is the problem consistent?
>>>>>>
>>>>>> Regards,
>>>>>> Pasha
>>>>>>
>>>>>> Scott Weitzenkamp (sweitzen) wrote:
>>>>>>
>>>>>>> $ uname -a
>>>>>>> Linux svbu-qa1850-3 2.6.16.21-0.8-smp #1 SMP Mon Jul 3 18:25:39 UTC 2006 x86_64 x86_64 x86_64 GNU/Linux
>>>>>>> $ /usr/local/ofed/mpi/gcc/mvapich-0.9.7-mlx2.2.0/bin/mpirun_rsh -np 2 192.168.2.46 192.168.2.49 hostname
>>>>>>> svbu-qa1850-4
>>>>>>> svbu-qa1850-3
>>>>>>> $ /usr/local/ofed/mpi/gcc/mvapich-0.9.7-mlx2.2.0/bin/mpirun_rsh -np 2 192.168.2.46 192.168.2.49 /usr/local/ofed/mpi/gcc/mvapich-0.9.7-mlx2.2.0/tests/osu_benchmarks-2.2/osu_latency
>>>>>>>
>>>>>>> The last command just hangs. Can I try your binary RPMs?
>>>>>>>
>>>>>>> Scott Weitzenkamp
>>>>>>> SQA and Release Manager
>>>>>>> Server Virtualization Business Unit
>>>>>>> Cisco Systems
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>> -----Original Message-----
>>>>>>>> From: Aviram Gutman [mailto:aviram at dev.mellanox.co.il]
>>>>>>>> Sent: Sunday, October 01, 2006 2:29 AM
>>>>>>>> To: Scott Weitzenkamp (sweitzen)
>>>>>>>> Cc: OpenFabricsEWG; openib; pasha at mellanox.co.il
>>>>>>>> Subject: Re: [openfabrics-ewg] problems running MVAPICH on
>>>>>>>> OFED 1.1 rc6 with SLES10 x86_64
>>>>>>>>
>>>>>>>> Can you please elaborate on the MVAPICH issues and send the
>>>>>>>> command line?
>>>>>>>> We ran it here on 32 Opteron nodes, each quad core, and also ran
>>>>>>>> rigorous tests on many other nodes.
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> Scott Weitzenkamp (sweitzen) wrote:
>>>>>>>>
>>>>>>>>> We are just getting started with OFED testing on SLES10; the first
>>>>>>>>> platform is x86_64.
>>>>>>>>>
>>>>>>>>> IPoIB, SDP, SRP, Open MPI, HP MPI, and Intel MPI are working so far.
>>>>>>>>>
>>>>>>>>> MVAPICH with the OSU benchmarks just hangs. This same hardware works
>>>>>>>>> fine with OFED and RHEL4 U3.
>>>>>>>>>
>>>>>>>>> Has anyone else seen this?
>>>>>>>>>
>>>>>>>>> Scott Weitzenkamp
>>>>>>>>> SQA and Release Manager
>>>>>>>>> Server Virtualization Business Unit
>>>>>>>>> Cisco Systems
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>> --------------------------------------------------------------
>>>>>>>> ----------
>>>>>>>>
>>>>>>>>> _______________________________________________
>>>>>>>>> openfabrics-ewg mailing list
>>>>>>>>> openfabrics-ewg at openib.org
>>>>>>>>> http://openib.org/mailman/listinfo/openfabrics-ewg
>>>>>>>>>
>>>>>>>>>
>>>>>> --
>>>>>> Pavel Shamis (Pasha)
>>>>>> Software Engineer
>>>>>> Mellanox Technologies LTD.
>>>>>> pasha at mellanox.co.il
>>>>>>
>>>>>>
>>>> --
>>>> Pavel Shamis (Pasha)
>>>> Software Engineer
>>>> Mellanox Technologies LTD.
>>>> pasha at mellanox.co.il
>>>>
>>>>
>> --
>> Pavel Shamis (Pasha)
>> Software Engineer
>> Mellanox Technologies LTD.
>> pasha at mellanox.co.il
>>
>>
>
>
--
Pavel Shamis (Pasha)
Software Engineer
Mellanox Technologies LTD.
pasha at mellanox.co.il
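
For anyone who wants to check their own nodes for the /etc/hosts entry
discussed in this thread (the node's hostname mapped to a loopback address
such as 127.0.0.2), here is a minimal Python sketch. The function name
loopback_entries and the sample text are illustrative assumptions and are
not part of OFED or MVAPICH; the sketch only reports matching lines and
does not edit anything.

```python
import ipaddress


def loopback_entries(hosts_text, hostname):
    """Return (address, names) pairs where hostname maps to a loopback address.

    hosts_text is the content of a hosts file; hostname may be short or
    fully qualified. This is a hypothetical helper for illustration only.
    """
    short = hostname.split(".")[0]
    hits = []
    for line in hosts_text.splitlines():
        # Drop comments and surrounding whitespace.
        line = line.split("#", 1)[0].strip()
        if not line:
            continue
        fields = line.split()
        addr, names = fields[0], fields[1:]
        try:
            ip = ipaddress.ip_address(addr)
        except ValueError:
            # Skip anything that does not start with a plain IP address.
            continue
        if ip.is_loopback and (hostname in names or short in names):
            hits.append((addr, names))
    return hits


# Sample text modeled on the entry quoted in this thread.
sample = (
    "127.0.0.1 localhost\n"
    "127.0.0.2 svbu-qa1850-3.cisco.com svbu-qa1850-3\n"
)
for addr, names in loopback_entries(sample, "svbu-qa1850-3"):
    print(addr, " ".join(names))
```

To check a real node, pass the contents of /etc/hosts and the value of
socket.gethostname() instead of the sample text, and compare the result
across all nodes in the job.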