[openib-general] Announcing the release of MVAPICH 0.9.6 (MPI-1 over InfiniBand and other RDMA Interconnects)

Dhabaleswar Panda panda at cse.ohio-state.edu
Tue Dec 6 21:21:37 PST 2005


As MVAPICH continues to power several clusters in the TOP500 list
(including the #5-ranked Sandia Thunderbird), the MVAPICH team aims to
push the performance and scalability of InfiniBand clusters to the
next level. The team is pleased to announce the release of MVAPICH
0.9.6 for the following platforms, operating systems, compilers, and
InfiniBand adapters:

  - Platforms: EM64T, Opteron, IA-32 and Mac G5 
  - Operating Systems: Linux, Solaris and Mac OS X
  - Compilers: gcc, intel, pathscale and pgi 
  - InfiniBand Adapters: Mellanox adapters with PCI-X 
    and PCI-Express (SDR and DDR with mem-full and mem-free cards) 

In addition to delivering high performance with the VAPI interface,
MVAPICH 0.9.6 also provides uDAPL support for portability across
networks and platforms with the highest performance. The uDAPL
interface of this release has been tested with InfiniBand (OpenIB
SCM/Gen2 uDAPL and Solaris IBTL/uDAPL) and Myrinet (DAPL-GM beta).

Starting with this release, MVAPICH enables InfiniBand support for
the Solaris environment through uDAPL.

MVAPICH 0.9.6 is distributed as a single integrated package
(with MPICH 1.2.7 and MVICH). It is available under the BSD license.

This release has the following features:

      - Designs for scaling InfiniBand clusters to multi-thousand 
        nodes with highest performance and reduced memory usage
      - Optimized implementation of Rendezvous protocol (RDMA Read
        and RDMA Write) for better computation-communication overlap
        and progress
      - Two modes of communication progress (polling and blocking)
      - Resource-aware registration cache
      - Optimized intra-node communication for bus-based and NUMA-based
        systems with processor affinity
      - High performance and scalable collective communication support
        (Broadcast support using IB hardware multicast mechanism;
         RDMA-based barrier, all-to-all and all-gather)
      - Multi-rail communication support (multiple ports per adapter 
        and multiple adapters)
      - Shared library support
      - ROMIO support
      - uDAPL support for portability across networks and OS 
        (tested for InfiniBand on Linux and Solaris; and Myrinet)
      - Scalable job start-up with MPD
      - TotalView debugger support
      - Optimized and tuned for the above platforms and different 
        network interfaces (PCI-X and PCI-Express with SDR and DDR)
      - Support for multiple compilers (gcc, icc, pathscale and pgi) 
      - Single code base for all of the above platforms and OS
      - Integrated and easy-to-use build script for installing the
        code on various platforms, OS, compilers, devices, and
        InfiniBand adapters
      - Incorporates a set of run-time and compile-time tunable
        parameters for convenient tuning on large-scale clusters

Other features of this release include:

- Excellent performance: Sample performance numbers include:
 
  EM64T, PCI-Ex, IBA-DDR: 
     - 3.09 microsec one-way latency (4 bytes)
     - 1475 MB/sec unidirectional bandwidth 
     - 2661 MB/sec bidirectional bandwidth 

  EM64T, PCI-Ex, IBA-SDR: 
     - 3.52 microsec one-way latency (4 bytes)
     - 968 MB/sec unidirectional bandwidth with single-rail and
       1497 MB/sec with multi-rail
     - 1781 MB/sec bidirectional bandwidth with single-rail and 
       2721 MB/sec with multi-rail

  Opteron, PCI-Ex, IBA-SDR: 
     - 3.42 microsec one-way latency (4 bytes)
     - 968 MB/sec unidirectional bandwidth with single-rail
     - 1865 MB/sec bidirectional bandwidth with single-rail 

  Solaris uDAPL/IBTL on Opteron, PCI-X, IBA-SDR:
     - 5.38 microsec one-way latency (4 bytes)
     - 651 MB/sec unidirectional bandwidth
     - 808 MB/sec bidirectional bandwidth

  OpenIB/Gen2 uDAPL on Opteron, PCI-Ex, IBA-SDR:
     - 3.39 microsec one-way latency (4 bytes)
     - 968 MB/sec unidirectional bandwidth
     - 1890 MB/sec bidirectional bandwidth

  OpenIB/Gen2 uDAPL on EM64T, PCI-Ex, IBA-SDR:
     - 3.43 microsec one-way latency (4 bytes)
     - 968 MB/sec unidirectional bandwidth
     - 1912 MB/sec bidirectional bandwidth

  Performance numbers for all other platforms, system configurations
  and operations can be viewed in the `Performance Results' section
  of the project's web page.

- A set of benchmarks to evaluate point-to-point and collective
  operations

- An enhanced and detailed `User Guide' to assist users: 

       - to install this package on different platforms
            with both interfaces (VAPI and uDAPL) and different options

       - to vary different parameters of the MPI installation to 
            extract maximum performance and achieve scalability,
            especially on large-scale systems.
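
As a rough illustration of how the sample performance numbers above
relate to one another, the following Python sketch computes the
multi-rail gain and the bidirectional-to-unidirectional ratio for the
EM64T, PCI-Ex, IBA-SDR figures. The variable and function names are
made up for this illustration and are not part of MVAPICH; only the
four bandwidth values come from the list above.

```python
# Bandwidth figures (MB/sec) copied from the EM64T, PCI-Ex, IBA-SDR
# entry in the announcement. The dictionary keys are hypothetical
# labels chosen for this sketch.
SDR_EM64T = {
    "uni_single": 968,   # unidirectional, single-rail
    "uni_multi": 1497,   # unidirectional, multi-rail
    "bi_single": 1781,   # bidirectional, single-rail
    "bi_multi": 2721,    # bidirectional, multi-rail
}


def ratio(numerator, denominator):
    """Return numerator / denominator rounded to two decimals."""
    return round(numerator / denominator, 2)


# Gain from multi-rail over single-rail, per direction mode.
multi_rail_uni_gain = ratio(SDR_EM64T["uni_multi"], SDR_EM64T["uni_single"])
multi_rail_bi_gain = ratio(SDR_EM64T["bi_multi"], SDR_EM64T["bi_single"])

# How close single-rail bidirectional traffic comes to twice the
# unidirectional rate.
bi_over_uni_single = ratio(SDR_EM64T["bi_single"], SDR_EM64T["uni_single"])

print("multi-rail gain (unidirectional):", multi_rail_uni_gain)
print("multi-rail gain (bidirectional): ", multi_rail_bi_gain)
print("bidirectional / unidirectional:  ", bi_over_uni_single)
```

With these figures, multi-rail gives roughly a 1.5x gain in both
modes, and single-rail bidirectional bandwidth is a little under twice
the unidirectional rate.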

You are welcome to download the MVAPICH 0.9.6 package and access
relevant information from the following URL:

http://nowlab.cse.ohio-state.edu/projects/mpi-iba/

A subsequent version with support for OpenIB/Gen2 will be available
soon.

All feedback, including bug reports and hints for performance tuning,
is welcome. Please send an e-mail to mvapich-help at cse.ohio-state.edu.

Thanks, 

MVAPICH Team at OSU/NBCL 

----------

PS: If you would like to be removed from this mailing list, please send
an e-mail to mvapich_request at cse.ohio-state.edu.


======================================================================
The MVAPICH/MVAPICH2 project is currently supported with funding from
the U.S. National Science Foundation, U.S. DOE Office of Science,
Mellanox, Intel, Cisco Systems, Sun Microsystems, and Linux Networx;
and with equipment support from AMD, Apple, Appro, IBM, Intel,
Mellanox, Microway, PathScale, SilverStorm and Sun Microsystems. Other
technology partners include Etnus.
======================================================================



