We have been working with LLNL trying to debug a problem using slurm as our
resource manager, mvapich2 as our MPI choice, and OFED 1.0 as our InfiniBand
stack. The mvapich2 version is mvapich2-0.9.3.

The problem arises when we try to scale a simple MPI job: we cannot go much
above 128 tasks before socket connections start timing out on the PMI
exchanges.

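By "a simple MPI job" I mean something on the order of the sketch below.
This is illustrative only (a stand-in, not our actual test program), launched
under slurm with something like "srun -n 128 ./simple_job":

    /* Illustrative only: a trivial MPI job of the sort being scaled. */
    #include <stdio.h>
    #include <mpi.h>

    int main(int argc, char **argv)
    {
        int rank = 0, size = 0, sum = 0;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        /* One trivial collective so every task participates. */
        MPI_Allreduce(&rank, &sum, 1, MPI_INT, MPI_SUM, MPI_COMM_WORLD);

        if (rank == 0)
            printf("%d tasks, sum of ranks = %d\n", size, sum);

        MPI_Finalize();
        return 0;
    }
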
<br><font size=2 face="sans-serif">Can anyone at OSU comment?</font>
Processes    PMI_KVS_Put    PMI_KVS_Get    PMI_KVS_Commit    Num Procs ratio    Calls ratio
n32                 1024           1248              1024                  1              1
n64                 4096           4544              4096                  2              4
n96                 9216           9888              9216                  3              8
n128               16384          17280             16384                  4             16

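For reference, the pattern we expected to see (and which would keep the
per-process call ratio constant, as it apparently is with MPICH2) is one
PMI_KVS_Put and one PMI_KVS_Commit per rank, followed by one PMI_KVS_Get per
peer. The sketch below shows that pattern using the standard PMI-1 calls; it
is only an illustration of the expected exchange, not code taken from
MVAPICH2. The n-squared Put totals in the table suggest MVAPICH2 is instead
doing roughly one Put per rank per peer.

    /* Sketch of a "constant calls per process" PMI wire-up: each rank
     * Puts one key (its own address), Commits once, then Gets one key
     * per peer.  Not taken from the MVAPICH2 sources. */
    #include <stdio.h>
    #include <stdlib.h>
    #include "pmi.h"

    int main(void)
    {
        int spawned = 0, rank = 0, size = 0, i;
        int name_max = 0, val_max = 0;
        char *kvsname, *val, key[64];

        PMI_Init(&spawned);
        PMI_Get_rank(&rank);
        PMI_Get_size(&size);

        PMI_KVS_Get_name_length_max(&name_max);
        PMI_KVS_Get_value_length_max(&val_max);
        kvsname = malloc(name_max);
        val     = malloc(val_max);
        PMI_KVS_Get_my_name(kvsname, name_max);

        /* One Put and one Commit per rank: publish this rank's "address"
         * (placeholder value; the real value would be an HCA LID/QPN). */
        snprintf(key, sizeof(key), "addr-%d", rank);
        snprintf(val, val_max, "address-of-rank-%d", rank);
        PMI_KVS_Put(kvsname, key, val);
        PMI_KVS_Commit(kvsname);
        PMI_Barrier();

        /* One Get per peer: n-1 Gets per rank, so the calls per process
         * grow only linearly with the job size. */
        for (i = 0; i < size; i++) {
            if (i == rank)
                continue;
            snprintf(key, sizeof(key), "addr-%d", i);
            PMI_KVS_Get(kvsname, key, val, val_max);
        }

        PMI_Finalize();
        free(val);
        free(kvsname);
        return 0;
    }

(Building the sketch assumes pmi.h and -lpmi from the slurm installation;
that detail is a guess about the build environment, adjust as needed.)
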
<br><font size=2 face="sans-serif">Comment from LLNL:<br>
------------------</font><font size=3> <br>
That is interesting! The ratio for MPICH2 is constant, so clearly <br>
MVAPICH2 is doing something unusual (and unexpected, to me anyway). <br>
<br>
What will MVAPCH2 do with really large parallel jobs? We regularly <br>
run jobs with thousands to tens of thousands of tasks. If you have <br>
located an MVAPICH2 expert, this would definitely be worth asking <br>
about. It's use of PMI appears to be non-scalable. </font><font size=2 face="sans-serif"><br>
----------------------</font><font size=3> </font>
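To put that concern in rough numbers (pure extrapolation of the n-squared
pattern in the table above, assuming it continues):

    /* Back-of-the-envelope: if total PMI_KVS_Put calls keep growing as
     * n*n (32*32=1024 ... 128*128=16384 in the table), what do really
     * large jobs look like?  Extrapolation, not a measurement. */
    #include <stdio.h>

    int main(void)
    {
        long tasks[] = {128, 1024, 10000};
        int i;

        for (i = 0; i < 3; i++)
            printf("%6ld tasks -> ~%ld total PMI_KVS_Put calls\n",
                   tasks[i], tasks[i] * tasks[i]);
        return 0;
    }

At ten thousand tasks that would be on the order of 10^8 KVS puts.
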
Any help is appreciated.

Regards,
Don Dhondt
GCOS 8 Communications Solutions Project Manager
Bull HN Information Systems Inc.
13430 N. Black Canyon Hwy., Phoenix, AZ 85029
Work (602) 862-5245   Fax (602) 862-4290