[openib-general] Problems running MPI jobs with MVAPICH and MVAPICH2
Don.Albert at Bull.com
Wed Mar 22 12:38:06 PST 2006
Matthew,
Thanks for the response. I can well believe that the problem is some
error in the setup of my environment. I just can't figure out what it is
yet. As requested, more information on the two systems, 'koa' and
'jatoba', is below.
>
> Since the MPD daemons seem to have been started properly, as your ability
> to run non-MPI commands shows, I'm wondering if your $PATH may not be set
> properly when compiling the cpi application.
>
> Can you just verify by 'echo'ing the $PATH before compiling? Can you also
> provide us with other information about your environment?
>
> Also, when running MVAPICH2, can you try using the 'mpdtrace' command to
> verify that MPD has properly started on the requested nodes?
>
For the MVAPICH (MPI-1) case:
On system 'koa':
[root at koa ib]# cat .mv1 <-- using this file to set the PATH
echo "setting up MVAPICH environment"
export PATH=/usr/local/mvapich/bin:$PATH
[root at koa ib]# . .mv1 <-- sourcing the .mv1 file
setting up MVAPICH environment
[root at koa ib]# echo $PATH <-- PATH after setup
/usr/local/mvapich/bin:/usr/kerberos/sbin:/sbin:/usr/kerberos/bin:/usr/local/bin:/bin:/usr/bin:/usr/X11R6/bin
[root at koa ib]# which mpicc <-- showing where the compiler script is located
/usr/local/mvapich/bin/mpicc
[root at koa ib]# mpd & <-- starting mpd
[1] 32649
[root at koa ib]# mpdtrace <-- with just mpd running on 'koa'
mpdtrace: koa_53859: lhs=koa_53859 rhs=koa_53859 rhs2=koa_53859 gen=1
[root at koa ib]# mpdtrace <-- after starting mpd on the 'jatoba' system
mpdtrace: koa_53859: lhs=jatoba_44789 rhs=jatoba_44789 rhs2=koa_53859 gen=1
mpdtrace: jatoba_44789: lhs=koa_53859 rhs=koa_53859 rhs2=jatoba_44789 gen=1
[root at koa ib]# /sbin/lsmod <-- showing that the various ib modules are loaded
Module      Size    Used by
ib_sdp      93448   0
ib_cm       33472   1 ib_sdp
ib_ipoib    41480   0
ib_sa       14652   2 ib_sdp,ib_ipoib
ib_uverbs   39760   0
ib_umad     15024   0
ib_mthca    120608  0
ib_mad      37816   4 ib_cm,ib_sa,ib_umad,ib_mthca
ib_core     48640   8 ib_sdp,ib_cm,ib_ipoib,ib_sa,ib_uverbs,ib_umad,ib_mthca,ib_mad
On system 'jatoba':
[root at jatoba ib]# cat .mv1
echo "setting up MVAPICH environment"
export PATH=/usr/local/mvapich/bin:$PATH
setting up MVAPICH environment
[root at jatoba ib]# echo $PATH
/usr/local/mvapich/bin:/usr/kerberos/sbin:/opt/gcc-3.3.3/bin:/usr/kerberos/bin:/usr/local/bin:/bin:/usr/bin:/usr/X11R6/bin:/usr/local/bin:/usr/java/bin:/usr/ant/bin:/home/ib/bin
[root at jatoba ib]# which mpicc
/usr/local/mvapich/bin/mpicc
[root at jatoba ib]# mpd -h koa -p 53859 & <-- starting mpd with parameters from 'koa'
[1] 9442
[root at jatoba ib]# mpdtrace <-- after starting mpd
mpdtrace: jatoba_44789: lhs=koa_53859 rhs=koa_53859 rhs2=jatoba_44789 gen=1
mpdtrace: koa_53859: lhs=jatoba_44789 rhs=jatoba_44789 rhs2=koa_53859 gen=1
[root at jatoba ib]# /sbin/lsmod <-- showing that the various ib modules are loaded
Module      Size    Used by
ib_sdp      93448   0
ib_cm       33344   1 ib_sdp
ib_ipoib    41480   0
ib_sa       14524   2 ib_sdp,ib_ipoib
ib_uverbs   39760   0
ib_umad     15024   0
ib_mthca    120608  0
ib_mad      37816   4 ib_cm,ib_sa,ib_umad,ib_mthca
ib_core     48640   8 ib_sdp,ib_cm,ib_ipoib,ib_sa,ib_uverbs,ib_umad,ib_mthca,ib_mad
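
The PATH and 'which mpicc' checks shown above could be scripted so that
both nodes are checked the same way before compiling. What follows is only
a minimal sketch, assuming a POSIX shell; 'check_mpicc' is a hypothetical
helper name and is not part of MVAPICH.

```shell
#!/bin/sh
# Hypothetical helper (not part of MVAPICH): report whether the first
# mpicc found on PATH lives under the expected install prefix.
check_mpicc() {
    expected_prefix=$1
    # 'command -v' prints the path of the first matching executable on PATH.
    found=$(command -v mpicc 2>/dev/null) || {
        echo "mpicc: not found on PATH"
        return 1
    }
    case $found in
        "$expected_prefix"/*)
            echo "mpicc: OK ($found)"
            ;;
        *)
            echo "mpicc: unexpected location ($found)"
            return 2
            ;;
    esac
}
```

After sourcing .mv1 this would be called as 'check_mpicc /usr/local/mvapich',
and after sourcing .mv2 as 'check_mpicc /tmp/mvapich2'.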
For the MVAPICH2 case:
On 'koa':
[root at koa ib]# cat .mv2
echo "setting up MVAPICH2 environment"
export PATH=/tmp/mvapich2/bin:$PATH
[root at koa ib]# . .mv2
setting up MVAPICH2 environment
[root at koa ib]# echo $PATH
/tmp/mvapich2/bin:/usr/kerberos/sbin:/sbin:/usr/kerberos/bin:/usr/local/bin:/bin:/usr/bin:/usr/X11R6/bin
[root at koa ib]# which mpicc
/tmp/mvapich2/bin/mpicc
[root at koa ib]# mpdboot -n 2 -f /root/mpd.hosts
[root at koa ib]# mpdtrace -l
koa.az05.bull.com_53727
jatoba.az05.bull.com_54528
On 'jatoba':
[jatoba] (ib) ib> su
Password:
[root at jatoba ib]# cat .mv2
echo "setting up MPVAPICH2 environment"
export PATH=/tmp/mvapich2/bin:$PATH
[root at jatoba ib]# . .mv2
setting up MPVAPICH2 environment
[root at jatoba ib]# echo $PATH
/tmp/mvapich2/bin:/usr/kerberos/sbin:/opt/gcc-3.3.3/bin:/usr/kerberos/bin:/usr/local/bin:/bin:/usr/bin:/usr/X11R6/bin:/usr/local/bin:/usr/java/bin:/usr/ant/bin:/home/ib/bin
[root at jatoba ib]# which mpicc
/tmp/mvapich2/bin/mpicc
[root at jatoba ib]# mpdtrace -l
jatoba.az05.bull.com_54528
koa.az05.bull.com_53727
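
The 'mpdtrace -l' listings above could also be checked mechanically against
the hosts that are supposed to be in the ring. This is a rough sketch,
assuming each output line has the "host_port" form seen above;
'ring_has_hosts' is a hypothetical helper name, not an MPD command.

```shell
#!/bin/sh
# Hypothetical helper: read `mpdtrace -l` style lines ("host_port") on stdin
# and succeed only if every host named as an argument appears in the ring.
ring_has_hosts() {
    # Strip the trailing "_port" (and anything after it) from each line.
    ring=$(sed 's/_[0-9][0-9]*.*$//')
    for host in "$@"; do
        # -x: whole-line match, -F: treat the host name as a fixed string.
        printf '%s\n' "$ring" | grep -qxF -- "$host" || {
            echo "missing from ring: $host"
            return 1
        }
    done
    echo "all hosts present"
}
```

For the two systems here it would be used as
'mpdtrace -l | ring_has_hosts koa.az05.bull.com jatoba.az05.bull.com'.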