[openib-general] Problems running MPI jobs with MVAPICH
Don.Albert at Bull.com
Thu Apr 6 17:02:44 PDT 2006
Weikuan,
I previously reported that I was having problems running any MPI jobs
between a pair of EM64T machines running RHEL4 Update 3 with the OpenIB
modules (kernel version 2.6.9-34.ELsmp) and the "mvapich-gen2" code from
the OpenIB svn tree. I was having two problems:
1. When I tried to run from user mode, I would get segmentation faults.
2. When I ran from root, the jobs would fail with the following message:
   "cpi: pmgr_client_mpd.c:254: mpd_exchange_info: Assertion `len_remote ==
   len_local' failed." (A rough sketch of this kind of length check follows
   just below.)
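(For context on the second failure: the assertion reads like a sanity check
that the two sides of the MPD startup exchange advertise the same length for
the info they swap. The snippet below is only a hypothetical illustration of
that pattern, written for this note; it is not the actual MVAPICH
pmgr_client_mpd.c code.)

/* Hypothetical illustration only -- not the MVAPICH source.  Each
 * process advertises the length of its startup info; if the local and
 * remote lengths disagree, the exchange aborts with an assertion much
 * like the one quoted above. */
#include <assert.h>
#include <stdio.h>

static void exchange_info(int len_local, int len_remote)
{
    /* In a real launcher, len_remote would arrive from the peer. */
    assert(len_remote == len_local);
    printf("exchange ok: %d bytes each way\n", len_local);
}

int main(void)
{
    exchange_info(64, 64);   /* matching lengths: exchange succeeds */
    exchange_info(64, 96);   /* mismatch: aborts like the failure above */
    return 0;
}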
The first problem turned out to be a memory-limit issue: I had to increase
the maximum locked-in-memory address space (memlock) in the user limits.
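(As a quick way to confirm the memlock change, here is a small sketch,
written for this note rather than taken from the MVAPICH sources, that
prints the current RLIMIT_MEMLOCK soft and hard limits via getrlimit().
Running it as the MPI user shows whether the raised limits actually apply
to that session.)

/* Print the locked-memory (memlock) limits for the current process. */
#include <stdio.h>
#include <sys/resource.h>

static void print_limit(const char *name, rlim_t val)
{
    if (val == RLIM_INFINITY)
        printf("%s: unlimited\n", name);
    else
        printf("%s: %llu bytes\n", name, (unsigned long long)val);
}

int main(void)
{
    struct rlimit rl;

    if (getrlimit(RLIMIT_MEMLOCK, &rl) != 0) {
        perror("getrlimit(RLIMIT_MEMLOCK)");
        return 1;
    }

    print_limit("memlock soft limit", rl.rlim_cur);
    print_limit("memlock hard limit", rl.rlim_max);
    return 0;
}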
The second problem seemed to be more related to process management than to
MPI itself. I remembered that when I modified the "make.mvapich.gen2"
build script, there was a parameter for MPD:
# Whether to use an optimized queue pair exchange scheme. This is not
# checked for a setting in the script. It must be set here explicitly.
# Supported: "-DUSE_MPD_RING", "-DUSE_MPD_BASIC" and "" (to disable)
HAVE_MPD_RING=""
Because I wanted to use MPD to launch jobs, I set
HAVE_MPD_RING="-DUSE_MPD_RING" in the build script.
I went back and set the parameter to HAVE_MPD_RING="" to disable it, and
rebuilt, which meant that MPD was not installed. Using "mpirun_rsh", I am
now able to run the MPI jobs, including "cpi", "mping", and other
benchmark tests.
There seems to be a problem with "USE_MPD_RING". Have you seen this
before? Should I try with "USE_MPD_BASIC" instead?
-Don Albert-