[openib-general] mvapich2 pmi scalability problems

Matthew Koop koop at cse.ohio-state.edu
Mon Jul 24 13:40:20 PDT 2006


We've looked at the issue a bit more and discussed it off-list; it has been
resolved by the attached patch. For best performance, the upcoming SLURM
release (1.1.5), which includes better-optimized PMI handling, should also
be used.
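
To summarize what the patch does: with USE_MPD_RING enabled, each process
already learns every peer's hostid during the ring-based address exchange
(the rdma_iba_priv.c hunk below caches it on the virtual connection), so the
SMP startup can fill its hostid table locally instead of going through PMI.
A minimal standalone sketch of that idea follows; the struct and field names
are simplified stand-ins, not the real MVAPICH2 types:

#include <stdio.h>

/* Simplified stand-ins for the MVAPICH2 structures the patch touches;
 * these are illustrative types, not the real MPIDI_CH3I_SMP_VC. */
typedef struct {
    int hostid;              /* cached during the ring-based address exchange */
} smp_info_t;

typedef struct {
    smp_info_t smp;
} vc_t;

/* With USE_MPD_RING, every peer's hostid is already known locally, so the
 * hostid table can be filled without a single PMI put/get round trip. */
static void fill_hostids(int pg_size, int pg_rank, int my_hostid,
                         const vc_t *vcs, int *hostids)
{
    int i;
    for (i = 0; i < pg_size; i++)
        hostids[i] = (i == pg_rank) ? my_hostid : vcs[i].smp.hostid;
}

int main(void)
{
    /* Pretend we are rank 1 of 4, with ranks 0-1 on host 101 and 2-3 on 202. */
    vc_t vcs[4] = { { {101} }, { {101} }, { {202} }, { {202} } };
    int hostids[4];
    int i;

    fill_hostids(4, 1, 101, vcs, hostids);
    for (i = 0; i < 4; i++)
        printf("rank %d -> hostid %d\n", i, hostids[i]);
    return 0;
}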

Thanks,

Matthew Koop
-
Network-Based Computing Lab
Ohio State University


On Fri, 21 Jul 2006 Don.Dhondt at Bull.com wrote:

> Matthew,
>
> We build mvapich2 using the make.mvapich2.gen2 script.
> Within that script file is the following:
> # Whether to use an optimized queue pair exchange scheme.  This is not
> # checked for a setting in the script.  It must be set here explicitly.
> # Supported: "-DUSE_MPD_RING" and "" (to disable)
> if [ $ARCH = "_PPC64_" ]; then
>         HAVE_MPD_RING=""
> else
>         HAVE_MPD_RING="-DUSE_MPD_RING"
> fi
>
> Since we are compiling for ia64, our assumption is that it was compiled with
> HAVE_MPD_RING="-DUSE_MPD_RING". Is this correct?
> Also, we are not using mpd to start the jobs. Since we are
> using SLURM as the resource manager, the jobs are started with
> srun. Does MPD_RING only apply if using MPD?
>
> -Don
>
> Matthew Koop <koop at cse.ohio-state.edu>
> 07/21/2006 11:51 AM
>
> To: Don.Dhondt at Bull.com
> cc: openib-general at openib.org
> Subject: Re: [openib-general] mvapich2 pmi scalability problems
>
> Don,
>
> Are you using the USE_MPD_RING flag when compiling? If not, can you give
> that a try? It should very significantly decrease the number of PMI calls
> that are made.
>
> Thanks,
>
> Matthew Koop
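
For a sense of why the flag matters under srun: without the ring exchange, the
startup path (the #else branch in the attached patch) falls back to the usual
PMI-1 put/commit/barrier/get pattern, so each process issues on the order of
pg_size PMI calls and the job as a whole pushes roughly O(N^2) requests
through the PMI server. A rough standalone illustration of that pattern
against a plain PMI-1 library (for example SLURM's libpmi) is below; the key
names and buffer sizes are arbitrary and not what MVAPICH2 actually uses:

/* Rough illustration of the PMI-1 hostid exchange pattern used when
 * USE_MPD_RING is not available.  Build against a PMI-1 library
 * (e.g. SLURM's libpmi) and launch with srun; key names and buffer
 * sizes here are arbitrary, not the ones MVAPICH2 uses. */
#include <stdio.h>
#include <unistd.h>
#include <pmi.h>

int main(void)
{
    int spawned, rank, size, i;
    char kvsname[256], key[64], val[64];

    PMI_Init(&spawned);
    PMI_Get_rank(&rank);
    PMI_Get_size(&size);
    PMI_KVS_Get_my_name(kvsname, sizeof(kvsname));

    /* One put + commit per process, followed by a barrier ... */
    snprintf(key, sizeof(key), "hostid-%d", rank);
    snprintf(val, sizeof(val), "%ld", gethostid());
    PMI_KVS_Put(kvsname, key, val);
    PMI_KVS_Commit(kvsname);
    PMI_Barrier();

    /* ... and then size-1 gets per process, so the PMI server handles
     * on the order of size*size requests for the whole job. */
    for (i = 0; i < size; i++) {
        if (i == rank)
            continue;
        snprintf(key, sizeof(key), "hostid-%d", i);
        PMI_KVS_Get(kvsname, key, val, sizeof(val));
        printf("rank %d: rank %d is on hostid %s\n", rank, i, val);
    }

    PMI_Finalize();
    return 0;
}

Launched as srun -n N ./a.out, each instance performs roughly N + 2 PMI
operations; that per-process cost is what the ring-based exchange avoids.
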
-------------- next part --------------
Index: src/mpid/osu_ch3/channels/mrail/include/mpidi_ch3_pre.h
===================================================================
--- src/mpid/osu_ch3/channels/mrail/include/mpidi_ch3_pre.h     (revision 377)
+++ src/mpid/osu_ch3/channels/mrail/include/mpidi_ch3_pre.h     (working copy)
@@ -97,6 +97,7 @@
     struct MPID_Request * send_active;
     struct MPID_Request * recv_active;
     int local_nodes;
+    int hostid;
 } MPIDI_CH3I_SMP_VC;
 #endif
 
Index: src/mpid/osu_ch3/channels/mrail/src/rdma/ch3_smp_progress.c
===================================================================
--- src/mpid/osu_ch3/channels/mrail/src/rdma/ch3_smp_progress.c (revision 377)
+++ src/mpid/osu_ch3/channels/mrail/src/rdma/ch3_smp_progress.c (working copy)
@@ -1069,6 +1069,16 @@
 
     /** exchange address hostid using PMI interface **/
     if (pg_size > 1) {
+#ifdef USE_MPD_RING
+        for(i = 0; i < pg_size; i++) {
+            MPIDI_PG_Get_vc(pg, i, &vc);
+            if(i == pg_rank) {
+                hostnames_j[i] = hostid;
+            } else {
+                hostnames_j[i] = vc->smp.hostid;
+            }
+        }
+#else
         char *key;
         char *val;
 
@@ -1167,8 +1177,8 @@
                                      mpi_errno);
             return mpi_errno;
         }
+#endif /* end !MPD_RING */
 
-
     }
     /** end of exchange address **/
 
Index: src/mpid/osu_ch3/channels/mrail/src/gen2/rdma_iba_priv.c
===================================================================
--- src/mpid/osu_ch3/channels/mrail/src/gen2/rdma_iba_priv.c    (revision 377)
+++ src/mpid/osu_ch3/channels/mrail/src/gen2/rdma_iba_priv.c    (working copy)
@@ -1209,6 +1209,10 @@
                     rdma_iba_addr_table.lid[i][0],
                     local_addr_len, QPLEN_XDR);
 
+#ifdef _SMP_
+        vc->smp.hostid = rdma_iba_addr_table.hostid[i][0];
+#endif
+
         /* Get the qp, key and buffer for this process */
         temp_ptr = alladdr_inv + pg_rank * QPLEN_XDR;

