[openib-general] Problems running MPI jobs with MVAPICH

Don.Albert at Bull.com Don.Albert at Bull.com
Thu Apr 6 17:02:44 PDT 2006


Weikuan,

I previously reported that I was having problems running any MPI jobs
between a pair of EM64T machines running RHEL4 Update 3 with the OpenIB
modules (kernel version 2.6.9-34.ELsmp) and the "mvapich-gen2" code from
the OpenIB svn tree.  I was having two problems:

1. When I tried to run from user mode, I would get segmentation faults.

2. When I ran as root, the jobs would fail with the following message:
   "cpi: pmgr_client_mpd.c:254: mpd_exchange_info: Assertion `len_remote ==
   len_local' failed."

The first problem turned out to be a memory problem; I had to increase
the size of the max locked-in-memory address space (memlock) in the user
limits.
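
In case it helps anyone else hitting the same segfaults, the change was
along these lines.  This is only a sketch; the exact file and values are
an assumption and will depend on your distribution:

  # /etc/security/limits.conf (assumed location) -- raise the memlock
  # limit for the users running MPI jobs; "unlimited" is the simplest
  *    soft    memlock    unlimited
  *    hard    memlock    unlimited

  # verify from a fresh login shell
  ulimit -l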

The second problem seemed to be more related to process management than to
MPI itself.  I remembered that when I modified the "make.mvapich.gen2"
build script, there was a parameter for MPD:

  # Whether to use an optimized queue pair exchange scheme.  This is not
  # checked for a setting in the script.  It must be set here explicitly.
  # Supported: "-DUSE_MPD_RING", "-DUSE_MPD_BASIC" and "" (to disable)
  HAVE_MPD_RING=""

Because I wanted to use MPD to launch jobs, I set
HAVE_MPD_RING="-DUSE_MPD_RING" in the build script.

I went back and set the parameter to HAVE_MPD_RING="" to disable it, and
rebuilt, which meant that MPD was not installed.  Using "mpirun_rsh" I am
now able to run the MPI jobs, including "cpi", "mping" and other
benchmark tests.
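
For example, launching the "cpi" test across the two machines looks
roughly like this (the host names below are placeholders for our actual
nodes):

  # assumed host names; -np gives the total number of processes
  mpirun_rsh -np 2 node1 node2 ./cpi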

There seems to be a problem with "USE_MPD_RING".  Have you seen this
before?  Should I try with "USE_MPD_BASIC" instead?

        -Don Albert-
