[openib-general] Problems running MPI jobs with MVAPICH and MVAPICH2

Weikuan Yu yuw at cse.ohio-state.edu
Wed Mar 22 15:00:27 PST 2006


Hi, Don,

Thanks again for the detailed report. I guess we are a bit puzzled here.
Just a couple of questions to check with you about these systems:

a) Did you have to build the mvapich libraries separately on these
machines? Related to this, are the directories /usr/local and
/tmp/mvapich NFS-exported so that they are shared across the different
machines? If you had to build the libraries separately, please provide
the same set of info from `jatoba' as well.
b) Could you provide some additional specifications of these two
machines: kernel versions, Linux distribution, HCA model and firmware
versions, and the gen2 kernel and userspace versions? We could have
asked you earlier, but it just occurred to us that these might be
relevant... A few commands that could collect most of this are
sketched below.
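
This is only a rough sketch, run on each node, and it assumes fairly
standard tools are present (ibv_devinfo comes with the libibverbs
examples; the distribution file may be /etc/redhat-release or similar
instead of /etc/issue):

   mount | grep -E '/usr/local|/tmp/mvapich'      # NFS mounts or local?
   uname -r                                       # kernel version
   cat /etc/issue                                 # Linux distribution
   ibv_devinfo | grep -E 'hca_id|fw_ver|board_id' # HCA and firmware

If ibv_devinfo is not available, the firmware version should also be
readable from /sys/class/infiniband/*/fw_ver.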

Looking ahead, we may need access to your systems to give it a shot
ourselves, if that is at all possible...

Weikuan


On Mar 22, 2006, at 5:06 PM, Don.Albert at Bull.com wrote:

>
> Weikuan,
>
> >
>  > However, let us look into this slightly differently. Does the problem
>  > happen to 2 processes on the same node too? If so, we can focus on such
>
> No, when I run on only one machine (either one) I can execute multiple  
> copies of the jobs:
>
> [root at koa cpi]# mpirun_mpd -np 2 ./cpi
> pi is approximately 3.1416009869231241, Error is 0.0000083333333309
> wall clock time = 0.000088
> Process 0 on koa.az05.bull.com
> Process 1 on koa.az05.bull.com
> [root at koa cpi]# mpirun_mpd -np 4 ./cpi
> Process 0 on koa.az05.bull.com
> Process 1 on koa.az05.bull.com
> Process 2 on koa.az05.bull.com
> Process 3 on koa.az05.bull.com
> pi is approximately 3.1416009869231249, Error is 0.0000083333333318
> wall clock time = 0.000165
>  
>  > a case and get it cleared up first. Let us say we use `koa'. Could you
>  > provide the following information to us? We can do that for
>  > mvapich-gen2 first, assuming the cause of the problem will be similar
>  > for the other case.
>  >
>  > a) echo $LD_LIBRARY_PATH
>
> $LD_LIBRARY_PATH is null
>
>  > b) The modified build script. The template you used for mvapich-gen2
>  > should be make.mvapich.gen2.
>  > c) build/configure log file: config.log, make.log, etc.
>
> These files are attached below.
>
>  > d) The output of this command:
>  > # mpicc -show -o cpi example/basic/cpi.c
>
> [root at koa mvapich-gen2]# mpicc -show -o cpi examples/basic/cpi.c
> gcc -DUSE_STDARG -DHAVE_STDLIB_H=1 -DHAVE_STRING_H=1 -DHAVE_UNISTD_H=1  
> -DHAVE_STDARG_H=1 -DUSE_STDARG=1 -DMALLOC_RET_VOID=1 -c  
> examples/basic/cpi.c -I/usr/local/mvapich/include
> gcc -DUSE_STDARG -DHAVE_STDLIB_H=1 -DHAVE_STRING_H=1 -DHAVE_UNISTD_H=1  
> -DHAVE_STDARG_H=1 -DUSE_STDARG=1 -DMALLOC_RET_VOID=1  
> -L/usr/local/mvapich/lib cpi.o -o cpi -lmpich -L/usr/local/lib  
> -Wl,-rpath=/usr/local/lib -libverbs -lpthread
>
>
> Also here is the output of "ldd" on the executable file:
>
> [root at koa mvapich-gen2]# ldd /home/ib/tests/mpi/cpi/cpi
>         libibverbs.so.1 => /usr/local/lib/libibverbs.so.1 (0x00002aaaaaaac000)
>         libpthread.so.0 => /lib64/tls/libpthread.so.0 (0x000000352cb00000)
>         libc.so.6 => /lib64/tls/libc.so.6 (0x000000352bc00000)
>         libsysfs.so.1 => /usr/lib64/libsysfs.so.1 (0x00000038b1900000)
>         libdl.so.2 => /lib64/libdl.so.2 (0x000000352bf00000)
>         /lib64/ld-linux-x86-64.so.2 (0x000000352ba00000)
>
>
>         -Don Albert-
>
>
> <make.mvapich.gen2><config.status><config-mine.log><install-mine.log><config.log><make-mine.log>
--
Weikuan Yu, Computer Science, OSU
http://www.cse.ohio-state.edu/~yuw



