[openib-general] Problems running MPI jobs with MVAPICH and MVAPICH2

Weikuan Yu yuw at cse.ohio-state.edu
Wed Mar 22 13:08:02 PST 2006


Hi, Don,

Thanks for the detailed information. It seems to be correct.

However, let us approach this slightly differently. Does the problem
also happen with 2 processes on the same node? If so, we can focus on
that case and get it cleared up first. Let us say we use 'koa'. Could
you provide the following information? We can do that for
mvapich-gen2 first, assuming the cause of the problem is similar
for MVAPICH2.

a) echo $LD_LIBRARY_PATH
b) The modified build script. The template you used for mvapich-gen2  
should be make.mvapich.gen2.
c) build/configure log file: config.log, make.log, etc.
d) The output of this command:
# mpicc -show -o cpi example/basic/cpi.c
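
If it saves time, items a) through d) could be gathered in one pass. What follows is only a sketch: the function name collect_diag, the output file names, and the assumption that make.mvapich.gen2, config.log and make.log sit at the top of the source tree are hypothetical, so adjust them to the actual layout.

```shell
#!/bin/sh
# Sketch only: collect the requested items a) through d) into one directory.
# collect_diag and its file layout are hypothetical, not part of MVAPICH.

collect_diag() {
    src_dir=$1   # top of the mvapich-gen2 source/build tree (assumed layout)
    out_dir=$2   # directory that receives the collected files
    mkdir -p "$out_dir" || return 1

    # a) library search path (and PATH) seen by the current shell
    echo "LD_LIBRARY_PATH=${LD_LIBRARY_PATH:-}" > "$out_dir/env.txt"
    echo "PATH=$PATH" >> "$out_dir/env.txt"

    # b) and c) build script and logs; copy what exists, note what does not
    for f in make.mvapich.gen2 config.log make.log; do
        if [ -f "$src_dir/$f" ]; then
            cp "$src_dir/$f" "$out_dir/"
        else
            echo "missing: $src_dir/$f" >> "$out_dir/missing.txt"
        fi
    done

    # d) the compile/link command mpicc would use (-show prints it
    #    instead of running it)
    if command -v mpicc >/dev/null 2>&1; then
        (cd "$src_dir" && mpicc -show -o cpi example/basic/cpi.c) \
            > "$out_dir/mpicc-show.txt" 2>&1
    else
        echo "mpicc not found in PATH" > "$out_dir/mpicc-show.txt"
    fi
    return 0
}
```

After sourcing the environment file (for example . .mv1), the function takes the source tree as its first argument and an output directory as its second. Running it from the same shell that builds and launches cpi matters, since the point is to capture the environment the job actually sees.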

Thanks,
Weikuan


> Thanks for the response.  I can well believe that the problem is some  
> error in the setup of my environment.  I just can't figure out what as  
> yet.  As requested, some more information is below on the two systems  
> 'koa' and 'jatoba'.
>
> >
>  > Since the MPD daemons seem to have been started properly, as your ability
>  > to run non-MPI commands shows, I'm wondering if your $PATH may not be set
>  > properly when compiling the cpi application.
>  >
>  > Can you just verify by 'echo'ing the $PATH before compiling? Can you also
>  > provide us with other information about your environment?
>  >
>  > Also, when running MVAPICH2, can you try using the 'mpdtrace' command to
>  > verify that MPD has properly started on the requested nodes?
>  >
>
>  For the MVAPICH (MPI-1) case:
>
> On system 'koa':
>
> [root@koa ib]# cat  .mv1        <-- using this file to set the PATH
> echo "setting up MVAPICH environment"
> export PATH=/usr/local/mvapich/bin:$PATH
> [root@koa ib]# . .mv1           <-- sourcing the .mv1 file
> setting up MVAPICH environment
> [root@koa ib]# echo $PATH       <-- PATH after setup
> /usr/local/mvapich/bin:/usr/kerberos/sbin:/sbin:/usr/kerberos/bin:/usr/local/bin:/bin:/usr/bin:/usr/X11R6/bin
> [root@koa ib]# which mpicc      <-- showing where the compiler script is located
> /usr/local/mvapich/bin/mpicc
> [root@koa ib]# mpd &            <-- starting mpd
> [1] 32649
> [root@koa ib]# mpdtrace         <-- with just mpd running on 'koa'
> mpdtrace: koa_53859:  lhs=koa_53859  rhs=koa_53859  rhs2=koa_53859 gen=1
> [root@koa ib]# mpdtrace         <-- after starting mpd on the 'jatoba' system
> mpdtrace: koa_53859:  lhs=jatoba_44789  rhs=jatoba_44789  rhs2=koa_53859 gen=1
> mpdtrace: jatoba_44789:  lhs=koa_53859  rhs=koa_53859  rhs2=jatoba_44789 gen=1
> [root@koa ib]# /sbin/lsmod      <-- showing that the various ib modules are loaded
> Module                  Size  Used by
> ib_sdp                 93448  0
> ib_cm                  33472  1 ib_sdp
> ib_ipoib               41480  0
> ib_sa                  14652  2 ib_sdp,ib_ipoib
> ib_uverbs              39760  0
> ib_umad                15024  0
> ib_mthca              120608  0
> ib_mad                 37816  4 ib_cm,ib_sa,ib_umad,ib_mthca
> ib_core                48640  8 ib_sdp,ib_cm,ib_ipoib,ib_sa,ib_uverbs,ib_umad,ib_mthca,ib_mad
>
>
> On system 'jatoba':
>
> [root@jatoba ib]# cat .mv1
> echo "setting up MVAPICH environment"
> export PATH=/usr/local/mvapich/bin:$PATH
> setting up MVAPICH environment
> [root@jatoba ib]# echo $PATH
> /usr/local/mvapich/bin:/usr/kerberos/sbin:/opt/gcc-3.3.3/bin:/usr/kerberos/bin:/usr/local/bin:/bin:/usr/bin:/usr/X11R6/bin:/usr/local/bin:/usr/java/bin:/usr/ant/bin:/home/ib/bin
> [root@jatoba ib]# which mpicc
> /usr/local/mvapich/bin/mpicc
> [root@jatoba ib]# mpd -h koa -p 53859 &   <-- starting mpd with parameters from 'koa'
> [1] 9442
> [root@jatoba ib]# mpdtrace                <-- after starting mpd
> mpdtrace: jatoba_44789:  lhs=koa_53859  rhs=koa_53859  rhs2=jatoba_44789 gen=1
> mpdtrace: koa_53859:  lhs=jatoba_44789  rhs=jatoba_44789  rhs2=koa_53859 gen=1
> [root@jatoba ib]# /sbin/lsmod             <-- showing that the various ib modules are loaded
> Module                  Size  Used by
> ib_sdp                 93448  0
> ib_cm                  33344  1 ib_sdp
> ib_ipoib               41480  0
> ib_sa                  14524  2 ib_sdp,ib_ipoib
> ib_uverbs              39760  0
> ib_umad                15024  0
> ib_mthca              120608  0
> ib_mad                 37816  4 ib_cm,ib_sa,ib_umad,ib_mthca
> ib_core                48640  8 ib_sdp,ib_cm,ib_ipoib,ib_sa,ib_uverbs,ib_umad,ib_mthca,ib_mad
>
>
>
> For the MVAPICH2 case:
>
> On 'koa':
>
> [root@koa ib]# cat .mv2
> echo "setting up MVAPICH2 environment"
> export PATH=/tmp/mvapich2/bin:$PATH
> [root@koa ib]# . .mv2
> setting up MVAPICH2 environment
> [root@koa ib]# echo $PATH
> /tmp/mvapich2/bin:/usr/kerberos/sbin:/sbin:/usr/kerberos/bin:/usr/local/bin:/bin:/usr/bin:/usr/X11R6/bin
> [root@koa ib]# which mpicc
> /tmp/mvapich2/bin/mpicc
> [root@koa ib]# mpdboot -n 2 -f /root/mpd.hosts
> [root@koa ib]# mpdtrace -l
> koa.az05.bull.com_53727
> jatoba.az05.bull.com_54528
>
> On 'jatoba':
>
> [jatoba] (ib) ib> su
> Password:
> [root@jatoba ib]# cat .mv2
> echo "setting up MPVAPICH2 environment"
> export PATH=/tmp/mvapich2/bin:$PATH
> [root@jatoba ib]# . .mv2
> setting up MPVAPICH2 environment
> [root@jatoba ib]# echo $PATH
> /tmp/mvapich2/bin:/usr/kerberos/sbin:/opt/gcc-3.3.3/bin:/usr/kerberos/bin:/usr/local/bin:/bin:/usr/bin:/usr/X11R6/bin:/usr/local/bin:/usr/java/bin:/usr/ant/bin:/home/ib/bin
> [root@jatoba ib]# which mpicc
> /tmp/mvapich2/bin/mpicc
> [root@jatoba ib]# mpdtrace -l
> jatoba.az05.bull.com_54528
> koa.az05.bull.com_53727
>
>
--
Weikuan Yu, Computer Science, OSU
http://www.cse.ohio-state.edu/~yuw



