[openib-general] Problems running MPI jobs with MVAPICH and MVAPICH2
Weikuan Yu
yuw at cse.ohio-state.edu
Wed Mar 22 13:08:02 PST 2006
Hi, Don,
Thanks for the detailed information. It all looks correct.
However, let us approach this slightly differently. Does the problem
also happen with 2 processes on the same node? If so, we can focus on
that case and get it cleared up first. Let us say we use `koa'. Could you
provide the following information? We can do this for
mvapich-gen2 first, assuming the cause of the problem will be similar
for the other case.
a) echo $LD_LIBRARY_PATH
b) The modified build script. The template you used for mvapich-gen2
should be make.mvapich.gen2.
c) The build/configure log files: config.log, make.log, etc.
d) The output of this command:
# mpicc -show -o cpi example/basic/cpi.c
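
Items (a) and (d) could be captured in one go with something like the
sketch below. This is only an illustration: the function name
collect_mpi_info and the report file name are made up here, and the
script assumes it is run from the MVAPICH source directory so that the
relative path to cpi.c resolves. Items (b) and (c) are files and still
need to be attached separately.

```shell
#!/bin/sh
# Sketch only: gather the environment details from (a) and (d) into a file.
# collect_mpi_info and the default report name are hypothetical helpers.
collect_mpi_info() {
    report="${1:-mpi-env-report.txt}"
    {
        echo "== node =="
        uname -n
        echo "== LD_LIBRARY_PATH =="
        echo "${LD_LIBRARY_PATH:-<unset>}"
        echo "== PATH =="
        echo "$PATH"
        echo "== mpicc location =="
        command -v mpicc || echo "mpicc not found in PATH"
        echo "== mpicc -show =="
        if command -v mpicc >/dev/null 2>&1; then
            # Prints the underlying compile/link line without running it.
            mpicc -show -o cpi example/basic/cpi.c 2>&1 || true
        else
            echo "skipped: mpicc not found"
        fi
    } > "$report"
    echo "wrote $report"
}
```

After sourcing the environment file (for example `. .mv1`), calling
`collect_mpi_info koa-report.txt` on each node gives one file per node
that can be pasted into a reply.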
Thanks,
Weikuan
> Thanks for the response. I can well believe that the problem is some
> error in the setup of my environment. I just can't figure out what as
> yet. As requested, some more information is below on the two systems
> 'koa' and 'jatoba'.
>
> >
> > Since the MPD daemons seem to have been started properly, as your
> > ability to run non-MPI commands shows, I'm wondering if your $PATH
> > may not be set properly when compiling the cpi application.
> >
> > Can you just verify by 'echo'ing the $PATH before compiling? Can you
> > also provide us with other information about your environment?
> >
> > Also, when running MVAPICH2, can you try using the 'mpdtrace' command
> > to verify that MPD has properly started on the requested nodes?
> >
>
> For the MVAPICH (MPI-1) case:
>
> On system 'koa':
>
> [root at koa ib]# cat .mv1 <-- using this file to set the PATH
> echo "setting up MVAPICH environment"
> export PATH=/usr/local/mvapich/bin:$PATH
> [root at koa ib]# . .mv1 <-- sourcing the .mv1 file
> setting up MVAPICH environment
> [root at koa ib]# echo $PATH <-- PATH after setup
> /usr/local/mvapich/bin:/usr/kerberos/sbin:/sbin:/usr/kerberos/bin:/
> usr/local/bin:/bin:/usr/bin:/usr/X11R6/bin
> [root at koa ib]# which mpicc <-- showing where the compiler script
> is located
> /usr/local/mvapich/bin/mpicc
> [root at koa ib]# mpd & <-- starting mpd
> [1] 32649
> [root at koa ib]# mpdtrace <-- with just mpd running on 'koa'
> mpdtrace: koa_53859: lhs=koa_53859 rhs=koa_53859 rhs2=koa_53859
> gen=1
> [root at koa ib]# mpdtrace <-- after starting mpd on the 'jatoba'
> system
> mpdtrace: koa_53859: lhs=jatoba_44789 rhs=jatoba_44789
> rhs2=koa_53859 gen=1
> mpdtrace: jatoba_44789: lhs=koa_53859 rhs=koa_53859
> rhs2=jatoba_44789 gen=1
> [root at koa ib]# /sbin/lsmod <-- showing that the various ib
> modules are loaded
> Module                  Size  Used by
> ib_sdp                 93448  0
> ib_cm                  33472  1 ib_sdp
> ib_ipoib               41480  0
> ib_sa                  14652  2 ib_sdp,ib_ipoib
> ib_uverbs              39760  0
> ib_umad                15024  0
> ib_mthca              120608  0
> ib_mad                 37816  4 ib_cm,ib_sa,ib_umad,ib_mthca
> ib_core                48640  8 ib_sdp,ib_cm,ib_ipoib,ib_sa,ib_uverbs,ib_umad,ib_mthca,ib_mad
>
>
> On system 'jatoba':
>
> [root at jatoba ib]# cat .mv1
> echo "setting up MVAPICH environment"
> export PATH=/usr/local/mvapich/bin:$PATH
> setting up MVAPICH environment
> [root at jatoba ib]# echo $PATH
> /usr/local/mvapich/bin:/usr/kerberos/sbin:/opt/gcc-3.3.3/bin:/usr/
> kerberos/bin:/usr/local/bin:/bin:/usr/bin:/usr/X11R6/bin:/usr/local/
> bin:/usr/java/bin:/usr/ant/bin:/home/ib/bin
> [root at jatoba ib]# which mpicc
> /usr/local/mvapich/bin/mpicc
> [root at jatoba ib]# mpd -h koa -p 53859 & <-- starting mpd with
> parameters from 'koa'
> [1] 9442
> [root at jatoba ib]# mpdtrace <-- after starting mpd
> mpdtrace: jatoba_44789: lhs=koa_53859 rhs=koa_53859
> rhs2=jatoba_44789 gen=1
> mpdtrace: koa_53859: lhs=jatoba_44789 rhs=jatoba_44789
> rhs2=koa_53859 gen=1
> [root at jatoba ib]# /sbin/lsmod <-- showing that the various
> ib modules are loaded
> Module                  Size  Used by
> ib_sdp                 93448  0
> ib_cm                  33344  1 ib_sdp
> ib_ipoib               41480  0
> ib_sa                  14524  2 ib_sdp,ib_ipoib
> ib_uverbs              39760  0
> ib_umad                15024  0
> ib_mthca              120608  0
> ib_mad                 37816  4 ib_cm,ib_sa,ib_umad,ib_mthca
> ib_core                48640  8 ib_sdp,ib_cm,ib_ipoib,ib_sa,ib_uverbs,ib_umad,ib_mthca,ib_mad
>
>
>
> For the MVAPICH2 case:
>
> On 'koa':
>
> [root at koa ib]# cat .mv2
> echo "setting up MVAPICH2 environment"
> export PATH=/tmp/mvapich2/bin:$PATH
> [root at koa ib]# . .mv2
> setting up MVAPICH2 environment
> [root at koa ib]# echo $PATH
> /tmp/mvapich2/bin:/usr/kerberos/sbin:/sbin:/usr/kerberos/bin:/usr/
> local/bin:/bin:/usr/bin:/usr/X11R6/bin
> [root at koa ib]# which mpicc
> /tmp/mvapich2/bin/mpicc
> [root at koa ib]# mpdboot -n 2 -f /root/mpd.hosts
> [root at koa ib]# mpdtrace -l
> koa.az05.bull.com_53727
> jatoba.az05.bull.com_54528
>
> On 'jatoba':
>
> [jatoba] (ib) ib> su
> Password:
> [root at jatoba ib]# cat .mv2
> echo "setting up MPVAPICH2 environment"
> export PATH=/tmp/mvapich2/bin:$PATH
> [root at jatoba ib]# . .mv2
> setting up MPVAPICH2 environment
> [root at jatoba ib]# echo $PATH
> /tmp/mvapich2/bin:/usr/kerberos/sbin:/opt/gcc-3.3.3/bin:/usr/kerberos/
> bin:/usr/local/bin:/bin:/usr/bin:/usr/X11R6/bin:/usr/local/bin:/usr/
> java/bin:/usr/ant/bin:/home/ib/bin
> [root at jatoba ib]# which mpicc
> /tmp/mvapich2/bin/mpicc
> [root at jatoba ib]# mpdtrace -l
> jatoba.az05.bull.com_54528
> koa.az05.bull.com_53727
>
--
Weikuan Yu, Computer Science, OSU
http://www.cse.ohio-state.edu/~yuw