[ewg] Announcing the release of MVAPICH 1.0

Dhabaleswar Panda panda at cse.ohio-state.edu
Thu Feb 28 21:17:48 PST 2008


The MVAPICH team is pleased to announce the availability of MVAPICH
1.0 with the following NEW features:

- New scalable and robust job startup
   - Enhanced and robust mpirun_rsh framework to provide scalable
     launching on multi-thousand core clusters
      - Running time of an `MPI Hello World' program is around
        4 seconds on 1K cores and around 80 seconds on 32K cores
        (a sample program and launch command follow this list)
     - Available for OpenFabrics/Gen2, OpenFabrics/Gen2-UD and
       QLogic InfiniPath devices
     - Performance graph at:
       http://mvapich.cse.ohio-state.edu/performance/startup.shtml
   - Enhanced support for SLURM
     - Available for OpenFabrics/Gen2, OpenFabrics/Gen2-UD and
       QLogic InfiniPath devices
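
   For reference, an `MPI Hello World' program of the kind used in such
   startup measurements is sketched below; assuming the usual mpirun_rsh
   options, it can be launched on a hostfile with something like
   `mpirun_rsh -np 1024 -hostfile hosts ./hello' (see the user guide for
   the exact options supported by this release).

      /* hello.c - minimal MPI program used to time job startup */
      #include <mpi.h>
      #include <stdio.h>

      int main(int argc, char **argv)
      {
          int rank, size;
          MPI_Init(&argc, &argv);
          MPI_Comm_rank(MPI_COMM_WORLD, &rank);
          MPI_Comm_size(MPI_COMM_WORLD, &size);
          printf("Hello from rank %d of %d\n", rank, size);
          MPI_Finalize();
          return 0;
      }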

- New OpenFabrics Gen2 Unreliable-Datagram (UD)-based design
  for large-scale InfiniBand clusters (multi-thousand cores)
   - delivers performance and scalability with a constant
     memory footprint for communication contexts
      - Only 40 MB per process, even with 16K processes connected to
        each other
      - Performance graph at:
        http://mvapich.cse.ohio-state.edu/performance/mvapich/ud_memory.shtml
   - zero-copy protocol for large data transfer
   - shared memory communication between cores within a node
   - multi-core optimized collectives
     (MPI_Bcast, MPI_Barrier, MPI_Reduce and MPI_Allreduce)
   - enhanced MPI_Allgather collective
     (a usage sketch of these collectives follows this list)
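
   These are the standard MPI collective interfaces, so applications pick
   up the shared-memory and multi-core optimizations transparently. A
   minimal sketch that exercises MPI_Allreduce and MPI_Allgather (buffer
   sizes here are chosen only for illustration):

      #include <mpi.h>
      #include <stdio.h>
      #include <stdlib.h>

      int main(int argc, char **argv)
      {
          int rank, size;
          MPI_Init(&argc, &argv);
          MPI_Comm_rank(MPI_COMM_WORLD, &rank);
          MPI_Comm_size(MPI_COMM_WORLD, &size);

          /* every rank contributes one value; all ranks get the sum */
          int local = rank, sum = 0;
          MPI_Allreduce(&local, &sum, 1, MPI_INT, MPI_SUM, MPI_COMM_WORLD);

          /* every rank contributes one value; all ranks get the vector */
          int *all = malloc(size * sizeof(int));
          MPI_Allgather(&local, 1, MPI_INT, all, 1, MPI_INT, MPI_COMM_WORLD);

          if (rank == 0)
              printf("sum = %d, last entry = %d\n", sum, all[size - 1]);

          free(all);
          MPI_Finalize();
          return 0;
      }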

- New features for OpenFabrics Gen2-IB interface
   - enhanced coalescing support with varying degree of coalescing
   - support for ConnectX adapter
   - support for asynchronous progress at both sender and receiver
     to overlap computation and communication (a sketch of such
     overlap follows this list)
   - multi-core optimized collectives (MPI_Bcast)
   - tuned collectives (MPI_Allgather, MPI_Bcast)
     based on network adapter characteristics
      - Performance graph at:
        http://mvapich.cse.ohio-state.edu/performance/collective.shtml
   - network-level fault tolerance with Automatic Path Migration (APM)
     for tolerating intermittent network failures over InfiniBand.
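
   Applications expose this overlap by posting non-blocking transfers and
   doing useful work before completing them. A minimal sketch, assuming a
   pairwise exchange between an even number of ranks (the message size is
   a placeholder):

      #include <mpi.h>

      #define N (1 << 20)
      static double sendbuf[N], recvbuf[N];

      int main(int argc, char **argv)
      {
          int rank, peer;
          MPI_Request reqs[2];

          MPI_Init(&argc, &argv);
          MPI_Comm_rank(MPI_COMM_WORLD, &rank);
          peer = rank ^ 1;   /* pair ranks 0-1, 2-3, ... */

          /* post the transfers, then compute while they make progress */
          MPI_Irecv(recvbuf, N, MPI_DOUBLE, peer, 0, MPI_COMM_WORLD, &reqs[0]);
          MPI_Isend(sendbuf, N, MPI_DOUBLE, peer, 0, MPI_COMM_WORLD, &reqs[1]);

          /* ... independent computation overlapped with communication ... */

          MPI_Waitall(2, reqs, MPI_STATUSES_IGNORE);
          MPI_Finalize();
          return 0;
      }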

- New support for QLogic InfiniPath adapters
   - high-performance point-to-point communication
   - optimized collectives (MPI_Bcast and MPI_Barrier) with k-nomial
     algorithms while exploiting multi-core architecture

- Optimized and high-performance ADIO driver for Lustre
   - This MPI-IO support is a contribution from Future Technologies Group,
     Oak Ridge National Laboratory.
     (http://ft.ornl.gov/doku/doku.php?id=ft:pio:start)
      - Performance graph at:
        http://mvapich.cse.ohio-state.edu/performance/mvapich/romio.shtml
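
   Applications reach this driver through the standard MPI-IO interface.
   A minimal sketch of a collective write to a file on Lustre (the path
   and sizes below are placeholders):

      #include <mpi.h>

      #define COUNT 1024

      int main(int argc, char **argv)
      {
          int rank, i;
          double buf[COUNT];
          MPI_File fh;
          MPI_Offset offset;

          MPI_Init(&argc, &argv);
          MPI_Comm_rank(MPI_COMM_WORLD, &rank);
          for (i = 0; i < COUNT; i++)
              buf[i] = rank;

          /* each rank writes its own contiguous block collectively */
          MPI_File_open(MPI_COMM_WORLD, "/lustre/scratch/output.dat",
                        MPI_MODE_CREATE | MPI_MODE_WRONLY, MPI_INFO_NULL, &fh);
          offset = (MPI_Offset) rank * COUNT * sizeof(double);
          MPI_File_write_at_all(fh, offset, buf, COUNT, MPI_DOUBLE,
                                MPI_STATUS_IGNORE);
          MPI_File_close(&fh);

          MPI_Finalize();
          return 0;
      }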

- Flexible user-defined processor affinity for better resource utilization
  on multi-core systems
   - flexible process bindings to cores
   - allows memory-intensive applications to run with a subset of cores
     on each chip for better performance (a conceptual sketch of core
     binding follows this list)
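
   MVAPICH performs the binding itself through its runtime configuration
   (see the user guide for the exact parameter). Purely as a conceptual
   illustration of the effect, the sketch below pins each process to a
   core with Linux sched_setaffinity; this is not MVAPICH's implementation,
   and a real mapping would use the per-node (local) rank and could skip
   cores on purpose for memory-intensive runs:

      #define _GNU_SOURCE
      #include <sched.h>
      #include <unistd.h>
      #include <stdio.h>
      #include <mpi.h>

      int main(int argc, char **argv)
      {
          int rank, ncores;
          cpu_set_t mask;

          MPI_Init(&argc, &argv);
          MPI_Comm_rank(MPI_COMM_WORLD, &rank);

          /* illustration only: bind this process to core (rank % ncores) */
          ncores = (int) sysconf(_SC_NPROCESSORS_ONLN);
          CPU_ZERO(&mask);
          CPU_SET(rank % ncores, &mask);
          sched_setaffinity(0, sizeof(mask), &mask);

          printf("rank %d bound to core %d\n", rank, rank % ncores);
          MPI_Finalize();
          return 0;
      }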

More details on all features and supported platforms can be obtained
by visiting the following URL:

http://mvapich.cse.ohio-state.edu/overview/mvapich/features.shtml

MVAPICH 1.0 continues to deliver excellent performance. Sample
performance numbers include:

  - with OpenFabrics/Gen2 on EM64T quad-core with PCIe and ConnectX-DDR:
        - 1.51 microsec one-way latency (4 bytes)
        - 1404 MB/sec unidirectional bandwidth
        - 2713 MB/sec bidirectional bandwidth

  - with PSM on Opteron with Hypertransport and QLogic-SDR:
        - 1.25 microsec one-way latency (4 bytes)
        - 953 MB/sec unidirectional bandwidth
        - 1891 MB/sec bidirectional bandwidth

Performance numbers for all other platforms, system configurations and
operations can be viewed in the `Performance' section of the
project's web page.
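
The one-way latency figures above are obtained with the usual ping-pong
pattern. A minimal sketch of such a test between ranks 0 and 1 (the
iteration count is chosen only for illustration):

    #include <mpi.h>
    #include <stdio.h>

    #define ITERS 10000
    #define SIZE  4      /* 4-byte messages, as in the numbers above */

    int main(int argc, char **argv)
    {
        int rank, i;
        char buf[SIZE];
        double t0, t1;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        MPI_Barrier(MPI_COMM_WORLD);
        t0 = MPI_Wtime();
        for (i = 0; i < ITERS; i++) {
            if (rank == 0) {
                MPI_Send(buf, SIZE, MPI_CHAR, 1, 0, MPI_COMM_WORLD);
                MPI_Recv(buf, SIZE, MPI_CHAR, 1, 0, MPI_COMM_WORLD,
                         MPI_STATUS_IGNORE);
            } else if (rank == 1) {
                MPI_Recv(buf, SIZE, MPI_CHAR, 0, 0, MPI_COMM_WORLD,
                         MPI_STATUS_IGNORE);
                MPI_Send(buf, SIZE, MPI_CHAR, 0, 0, MPI_COMM_WORLD);
            }
        }
        t1 = MPI_Wtime();

        /* each iteration sends the message both ways, so divide by 2 */
        if (rank == 0)
            printf("one-way latency: %.2f usec\n",
                   (t1 - t0) * 1e6 / (2.0 * ITERS));

        MPI_Finalize();
        return 0;
    }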

To download MVAPICH 1.0 and its user guide, and to access the
anonymous SVN repository, please visit the following URL:

http://mvapich.cse.ohio-state.edu

All feedback, including bug reports and hints for performance tuning,
is welcome. Please post it to the mvapich-discuss mailing list.

Thanks,

The MVAPICH Team

======================================================================
The MVAPICH/MVAPICH2 project is currently supported with funding from
U.S. National Science Foundation, U.S. DOE Office of Science,
Mellanox, Intel, Cisco Systems, QLogic, Sun Microsystems and Linux
Networx; and with equipment support from Advanced Clustering, AMD,
Appro, Chelsio, Dell, Fujitsu, Fulcrum, IBM, Intel, Mellanox,
Microway, NetEffect, QLogic and Sun Microsystems. Other technology
partners include Etnus.
======================================================================
