[openib-general] mvapich2 pmi scalability problems

Matthew Koop koop at cse.ohio-state.edu
Fri Jul 21 11:51:31 PDT 2006


Don,

Are you using the USE_MPD_RING flag when compiling? If not, can you give
that a try? It should significantly decrease the number of PMI calls
that are made.

Thanks,

Matthew Koop


On Fri, 21 Jul 2006 Don.Dhondt at Bull.com wrote:

> We have been working with LLNL trying to debug a problem using slurm as
> our resource manager,
> mvapich2 as our MPI choice and OFED 1.0 as our infiniband stack. The
> mvapich2 version is mvapich2-0.9.3.
> The problem arises when we try to scale a simple MPI job. We cannot go
> much above 128 tasks before socket connections start timing out on the
> PMI exchanges.
> Can anyone at OSU comment?
>
> Processes  PMI_KVS_Put  PMI_KVS_Get  PMI_KVS_Commit  Procs ratio  Calls ratio
> n32             1024         1248            1024         1            1
> n64             4096         4544            4096         2            4
> n96             9216         9888            9216         3            8
> n128           16384        17280           16384         4           16
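[For reference, the totals in the table fit an exact quadratic model. This is a minimal sketch inferred purely from the reported numbers, not from the MVAPICH2 source: with P processes, the counts match P puts, P + 7 gets, and P commits per process.]

```python
# Apparent scaling law behind the table (an inference from the reported
# totals only, not from MVAPICH2 internals): each of the P processes
# seems to issue P PMI_KVS_Put calls, P + 7 PMI_KVS_Get calls, and
# P PMI_KVS_Commit calls, so totals grow as O(P^2).
def pmi_calls(p):
    put = p * p            # matches 1024, 4096, 9216, 16384
    get = p * (p + 7)      # matches 1248, 4544, 9888, 17280
    commit = p * p         # identical to the Put column
    return put, get, commit

for p in (32, 64, 96, 128):
    put, get, commit = pmi_calls(p)
    print(f"n{p}: put={put} get={get} commit={commit}")
```

[By contrast, a constant per-process call count, as LLNL observes for MPICH2, would give totals linear in P.]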
>
> Comment from LLNL:
> ------------------
> That is interesting! The ratio for MPICH2 is constant, so clearly
> MVAPICH2 is doing something unusual (and unexpected, to me anyway).
>
> What will MVAPICH2 do with really large parallel jobs? We regularly
> run jobs with thousands to tens of thousands of tasks. If you have
> located an MVAPICH2 expert, this would definitely be worth asking
> about. Its use of PMI appears to be non-scalable.
> ----------------------
>
> Any help is appreciated.
>
> Regards,
> Don Dhondt
> GCOS 8 Communications Solutions Project Manager
> Bull HN Information Systems Inc.
> 13430 N. Black Canyon Hwy., Phoenix, AZ  85029
> Work (602) 862-5245     Fax (602) 862-4290
