[openib-general] Running MVAPICH2 with SLURM Process Manager

Don.Dhondt at Bull.com
Thu May 25 08:55:00 PDT 2006


Thanks for the reply, Dr. Panda.
I did notice the formal release on the openib forum, and we will move to
that release today.

We made a couple of attempts at rebuilding MVAPICH2 and our symptoms
changed. Maybe for the better, but still not good results. In our last
attempt we disabled the compile option "USE_MPD_RING" (HAVE_MPD_RING="").
That seemed to get further, but then failed with a "cannot create cq"
error message, so we are now clearly failing in the InfiniBand code.
The perplexing thing is that the applications work when run with mpiexec
(outside of SLURM) with the MPD daemons running.

The latest suggestion from LLNL is to make sure our MPI tasks have
unlimited max locked memory, checked with:

 srun sh -c 'ulimit -l'
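For what it's worth, a minimal sketch of that check and the fixes usually suggested for it (the limits.conf and slurm.conf settings below are the common advice for InfiniBand clusters, not something confirmed in this thread):

```shell
# Show the max-locked-memory limit a child process inherits from this
# shell. InfiniBand needs this to be large (ideally "unlimited") so the
# HCA driver can register memory and create completion queues; a small
# limit is a classic cause of "cannot create cq".
sh -c 'ulimit -l'

# Under SLURM, run the same check on every allocated node:
#   srun sh -c 'ulimit -l'
#
# If it does not print "unlimited", the usual fixes are:
#   1. Raise the limit for all users in /etc/security/limits.conf:
#        * soft memlock unlimited
#        * hard memlock unlimited
#   2. Keep SLURM from propagating srun's own (possibly lower) limit
#      to the tasks, in slurm.conf:
#        PropagateResourceLimitsExcept=MEMLOCK
#   3. Restart slurmd on the compute nodes so it picks up the new limit.
```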

Below are the latest "traces" of the error.

> (2)  We tried building without USE_MPD_RING and the test now fails in 
> MPI_Init:
> -----------------
> 1: slurmd[molson]: task_pre_launch: 3.0, task 1
> 1: In: PMI_Init
> 1: In: PMI_Get_rank
> 1: In: PMI_Get_size
> 1: In: PMI_Get_appnum
> 1: In: PMI_Get_id_length_max
> 1: In: PMI_Get_id
> 1: In: PMI_KVS_Get_name_length_max
> 1: In: PMI_KVS_Get_my_name
> 1: cannot create cq
> 1: Fail to init hca
> 1: Fatal error in MPI_Init: Other MPI error, error stack:
> 1: MPIR_Init_thread(225): Initialization failed
> 1: MPID_Init(81)........: channel initialization failed
> -------------
> We checked the Troubleshooting section of the mvapich2 document and 
> followed the suggestions for these errors, but it did not help.






Dhabaleswar Panda <panda at cse.ohio-state.edu> 
05/24/2006 08:24 PM
Please respond to
panda at cse.ohio-state.edu


To
Don.Dhondt at Bull.com
cc
openib-general at openib.org, mvapich-discuss at cse.ohio-state.edu
Subject
Re: [openib-general] Running MVAPICH2 with SLURM Process Manager






Hi Don, 

> We are running  mvapich2-0.9.3-RC0 with  OFED1.0 RC4 and have had good 
> results.

Thanks for doing this testing. Glad to know that it works with OFED1.0
RC4. Please note that we made a formal release of MVAPICH2-0.9.3
during the weekend.

> We would like to use the SLURM resource manager with this combination 
> rather than MPD, but it does not appear to be one of the available 
> choices. Does anyone have any experience in this area?
> 
>     ./configure  --prefix=${PREFIX} ${MULTI_THREAD} \
>     --with-device=osu_ch3:mrail --with-rdma=gen2 --with-pm=mpd \
>     --disable-romio --without-mpe 2>&1 |tee config-mine.log
> 
>    --with-pm=mpd
>    We would have liked to see an option for slurm here.

Thanks for this suggestion. We have not tested MVAPICH2 with SLURM. To
the best of our knowledge, SLURM works with MPICH2/MPD.  Thus, there
should not be a problem for MVAPICH2 to work with SLURM. (I believe
some of the MVAPICH/MVAPICH2 users do so.) We are taking a look at it
and will get back to you.
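[Whether MVAPICH2's osu_ch3:mrail device works against SLURM's PMI is exactly the open question in this thread. For reference only, a hypothetical sketch adapting the configure line quoted above to the SLURM PMI options documented for MPICH2-based stacks; verify the option names against your source tree before relying on them:]

```shell
# Hypothetical: build against SLURM's PMI library instead of MPD.
# --with-pmi=slurm / --with-pm=no are the options described in the
# SLURM documentation for MPICH2-derived MPIs; whether they are
# supported by this MVAPICH2 release is unconfirmed.
./configure --prefix=${PREFIX} ${MULTI_THREAD} \
    --with-device=osu_ch3:mrail --with-rdma=gen2 \
    --with-pmi=slurm --with-pm=no \
    --disable-romio --without-mpe 2>&1 | tee config-slurm.log
```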

Best Regards, 

DK

> Regards,
> Donald Dhondt
> GCOS 8 Communications Solutions Project Manager
> Bull HN Information Systems Inc.
> 13430 N. Black Canyon Hwy., Phoenix, AZ  85029
> Work (602) 862-5245     Fax (602) 862-4290




