[openib-general] Problems running MPI jobs with MVAPICH and MVAPICH2
Weikuan Yu
yuw at cse.ohio-state.edu
Wed Mar 22 15:00:27 PST 2006
Hi, Don,
Thanks again for the detailed report. I guess we are a bit puzzled here.
Just a couple of questions to check with you about these systems:
a) Did you have to build the mvapich libraries separately on these
machines? Related to this, are the directories /usr/local and
/tmp/mvapich NFS-exported for sharing across the machines? If
you had to build the libraries separately, please provide the same set
of info from `jatoba' as well.
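
In case it helps with (a), here is a rough sketch of one way to check this
on each machine. It assumes GNU stat and md5sum are available, and that the
library lives under /usr/local/mvapich/lib as the -L flag in the mpicc
output suggests; fs_type is just a helper name made up for this example.

```shell
#!/bin/sh
# Report the filesystem type backing a directory (e.g. "nfs" vs a local type).
# fs_type is a hypothetical helper name, not part of mvapich.
fs_type() {
    # GNU stat: -f queries the filesystem, %T prints its type name.
    stat -f -c %T "$1" 2>/dev/null || echo "missing"
}

# Directories taken from question (a); run this on both machines.
for dir in /usr/local /tmp/mvapich; do
    printf '%s\t%s\n' "$dir" "$(fs_type "$dir")"
done

# Checksum of the MPI library, to compare between the two hosts.
# The path is an assumption based on -L/usr/local/mvapich/lib; adjust as needed.
md5sum /usr/local/mvapich/lib/libmpich.* 2>/dev/null || echo "no libmpich found"
```

If both hosts report "nfs" for a directory and the checksums match, they are
most likely using the same build.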
b) Could you provide some additional specifications of these two
machines: kernel versions, Linux distribution, HCA and firmware
versions, and gen2 kernel and userspace versions? We could have asked you
earlier, but it only just occurred to us that these might be relevant.
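
Something along these lines would collect most of what is asked in (b).
This is only a sketch: it assumes ibv_devinfo from the libibverbs utilities
is installed (it reports the HCA and its fw_ver), and run_or_note is a
helper name invented for this example.

```shell
#!/bin/sh
# Collect basic system details on one host; run on both koa and jatoba.
# run_or_note is a hypothetical helper: it runs a command if it exists,
# otherwise it notes that the command was not found instead of failing.
run_or_note() {
    if command -v "$1" >/dev/null 2>&1; then
        "$@" 2>&1
    else
        echo "$1: not found"
    fi
}

echo "== host ==";           uname -n
echo "== kernel ==";         uname -r
echo "== distribution ==";   cat /etc/*release 2>/dev/null || echo "unknown"
echo "== HCA / firmware =="; run_or_note ibv_devinfo
# List loaded InfiniBand kernel modules (names starting with ib_).
echo "== IB modules ==";     run_or_note lsmod | grep '^ib_' || echo "none loaded"
```

Redirecting the output to a file per host makes the two easy to diff.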
Looking ahead, we may need access to your systems to give it a shot,
if that is at all possible.
Weikuan
On Mar 22, 2006, at 5:06 PM, Don.Albert at Bull.com wrote:
>
> Weikuan,
>
> >
> > However, let us look into this slightly differently. Does the problem
> > happen to 2 processes on the same node too? If so, we can focus on such
>
> No, when I run on only one machine (either one) I can execute multiple
> copies of the jobs:
>
> [root at koa cpi]# mpirun_mpd -np 2 ./cpi
> pi is approximately 3.1416009869231241, Error is 0.0000083333333309
> wall clock time = 0.000088
> Process 0 on koa.az05.bull.com
> Process 1 on koa.az05.bull.com
> [root at koa cpi]# mpirun_mpd -np 4 ./cpi
> Process 0 on koa.az05.bull.com
> Process 1 on koa.az05.bull.com
> Process 2 on koa.az05.bull.com
> Process 3 on koa.az05.bull.com
> pi is approximately 3.1416009869231249, Error is 0.0000083333333318
> wall clock time = 0.000165
>
> > a case and get it cleared up first. Let us say we use `koa'. Could you
> > provide the following information to us? We can do that for
> > mvapich-gen2 first, assuming the cause of the problem will be similar
> > for the other case.
> >
> > a) echo $LD_LIBRARY_PATH
>
> $LD_LIBRARY_PATH is null
>
> > b) The modified build script. The template you used for mvapich-gen2
> > should be make.mvapich.gen2.
> > c) build/configure log file: config.log, make.log, etc.
>
> These files are attached below.
>
> > d) The output of this command:
> > # mpicc -show -o cpi example/basic/cpi.c
>
> [root at koa mvapich-gen2]# mpicc -show -o cpi examples/basic/cpi.c
> gcc -DUSE_STDARG -DHAVE_STDLIB_H=1 -DHAVE_STRING_H=1 -DHAVE_UNISTD_H=1
> -DHAVE_STDARG_H=1 -DUSE_STDARG=1 -DMALLOC_RET_VOID=1 -c
> examples/basic/cpi.c -I/usr/local/mvapich/include
> gcc -DUSE_STDARG -DHAVE_STDLIB_H=1 -DHAVE_STRING_H=1 -DHAVE_UNISTD_H=1
> -DHAVE_STDARG_H=1 -DUSE_STDARG=1 -DMALLOC_RET_VOID=1
> -L/usr/local/mvapich/lib cpi.o -o cpi -lmpich -L/usr/local/lib
> -Wl,-rpath=/usr/local/lib -libverbs -lpthread
>
>
> Also here is the output of "ldd" on the executable file:
>
> [root at koa mvapich-gen2]# ldd /home/ib/tests/mpi/cpi/cpi
> libibverbs.so.1 => /usr/local/lib/libibverbs.so.1 (0x00002aaaaaaac000)
> libpthread.so.0 => /lib64/tls/libpthread.so.0 (0x000000352cb00000)
> libc.so.6 => /lib64/tls/libc.so.6 (0x000000352bc00000)
> libsysfs.so.1 => /usr/lib64/libsysfs.so.1 (0x00000038b1900000)
> libdl.so.2 => /lib64/libdl.so.2 (0x000000352bf00000)
> /lib64/ld-linux-x86-64.so.2 (0x000000352ba00000)
>
>
> -Don Albert-
>
>
> <make.mvapich.gen2><config.status><config-mine.log><install-mine.log><config.log><make-mine.log>
--
Weikuan Yu, Computer Science, OSU
http://www.cse.ohio-state.edu/~yuw