[openib-general] Announcing the release of MVAPICH2 0.9.6 with on-demand connection management, multi-core optimized shared memory communication and memory hook support

Dhabaleswar Panda panda at cse.ohio-state.edu
Sun Oct 22 21:10:49 PDT 2006


The MVAPICH team is pleased to announce the availability of MVAPICH2
0.9.6 with the following NEW features:

 - On-demand connection management using native InfiniBand
   Unreliable Datagram (UD) support. This feature enables InfiniBand
   connections to be set up dynamically and keeps memory usage
   near constant as the number of processes increases.

   This feature together with the Shared Receive Queue (SRQ) feature
   (available since MVAPICH2 0.9.5) enhances the scalability
   of MVAPICH2 on multi-thousand node clusters.

   Application performance and memory scalability results with
   on-demand connection management can be found here:

   http://nowlab.cse.ohio-state.edu/projects/mpi-iba/perf-apps.html 
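
The memory behavior above can be sketched with a toy model (the
per-connection footprint below is an assumed figure for illustration,
not MVAPICH2 code or a measured value): a statically connected job pays
for every peer at startup, while on-demand setup pays only for the
peers a process actually communicates with.

```python
# Toy model of connection-memory scaling (illustrative only; the
# per-connection footprint is an assumption, not a measured value).
QP_BYTES = 88 * 1024  # assumed memory per reliable-connection queue pair

def static_memory(nprocs):
    # All-to-all connections established at startup: grows with job size.
    return (nprocs - 1) * QP_BYTES

def on_demand_memory(peers_used):
    # Connections created lazily (via UD control messages) only on first
    # communication with a peer: tracks the communication pattern instead.
    return len(peers_used) * QP_BYTES

# A nearest-neighbor code on 4096 processes that talks to ~6 peers:
print(static_memory(4096) // 2**20, "MB statically connected")
print(on_demand_memory(range(6)) // 1024, "KB on demand")
```

For a fixed communication pattern, the on-demand figure stays flat as
the job grows, which is the `near constant' behavior described above.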

 - Multi-core optimized and scalable intra-node (intra-CMP and inter-CMP) 
   shared-memory communication for the emerging multi-core platforms. 
   (Available for Gen2, uDAPL (including Solaris) and VAPI interfaces) 

   Performance benefits of multi-core optimized support can be 
   found here:

   http://nowlab.cse.ohio-state.edu/projects/mpi-iba/perf-smp.html
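
The idea behind the intra-node path can be sketched in a few lines (a
minimal stand-in using Python's shared-memory support; MVAPICH2's
actual implementation is in C and considerably more elaborate):
messages between processes on the same node are copied through a
shared segment instead of going anywhere near the network.

```python
# Sketch: intra-node message passing through a shared-memory segment.
from multiprocessing import Process, shared_memory

def sender(name):
    # Peer process on the same node attaches to the segment and writes.
    shm = shared_memory.SharedMemory(name=name)
    shm.buf[:5] = b"hello"
    shm.close()

def shm_ping():
    shm = shared_memory.SharedMemory(create=True, size=64)
    p = Process(target=sender, args=(shm.name,))
    p.start(); p.join()
    msg = bytes(shm.buf[:5])  # receiver copies the payload back out
    shm.close(); shm.unlink()
    return msg

print(shm_ping())  # b'hello'
```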

 - Memory Hook support provided by integration with the ptmalloc2
   library. This allows memory to be released safely to the Operating
   System and is expected to reduce the memory usage of applications
   that make heavy use of malloc and free operations.
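
Why intercepting malloc/free matters for an RDMA library can be
sketched as follows (assumed behavior modeled in Python; the names
here are hypothetical, not MVAPICH2 symbols): high-performance
transports cache expensive memory registrations by virtual address, so
a free() that returns pages to the Operating System must also
invalidate the matching cache entry, which is what the ptmalloc2 hooks
make possible.

```python
# Sketch of a registration cache with free() interception (assumed
# behavior for illustration; names are hypothetical, not MVAPICH2 code).
registration_cache = {}  # virtual address -> registration handle

def register(addr):
    # Registering (pinning) memory is expensive, so handles are cached
    # and reused when the same buffer is communicated again.
    if addr not in registration_cache:
        registration_cache[addr] = "mr-%#x" % addr
    return registration_cache[addr]

def hooked_free(addr):
    # The memory hook intercepts free() so the stale registration is
    # dropped before the pages can be returned to the Operating System.
    registration_cache.pop(addr, None)

handle = register(0x1000)
print(register(0x1000) == handle)  # True: cache hit, no re-registration
hooked_free(0x1000)
print(0x1000 in registration_cache)  # False: entry invalidated by hook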
    
 - Auto-detection (at run-time) of the host architecture and
   InfiniBand adapters, with associated optimizations

More details on all features and supported platforms can be obtained
by visiting the following URL:

http://nowlab.cse.ohio-state.edu/projects/mpi-iba/mvapich2_features.html

MVAPICH2 0.9.6 continues to deliver excellent performance.  Sample
performance numbers include:
 
  - OpenIB/Gen2 on EM64T with PCI-Ex and IBA-DDR:
      Two-sided operations: 
        - 3.25 microsec one-way latency (4 bytes)
        - 1411 MB/sec unidirectional bandwidth 
        - 2229 MB/sec bidirectional bandwidth  

      One-sided operations:
        - 6.30 microsec Put latency 
        - 1406 MB/sec unidirectional Put bandwidth 
        - 2229 MB/sec bidirectional Put bandwidth 

  - OpenIB/Gen2 on EM64T with PCI-Ex and IBA-DDR (Dual-rail):
      Two-sided operations:
        - 3.17 microsec one-way latency (4 bytes)
        - 2230 MB/sec unidirectional bandwidth
        - 2776 MB/sec bidirectional bandwidth

      One-sided operations:
        - 6.14 microsec Put latency
        - 2390 MB/sec unidirectional Put bandwidth
        - 2776 MB/sec bidirectional Put bandwidth

  - OpenIB/Gen2 on Opteron with PCI-Ex and IBA-DDR: 
      Two-sided operations:
        - 2.79 microsec one-way latency (4 bytes) 
        - 1411 MB/sec unidirectional bandwidth
        - 2239 MB/sec bidirectional bandwidth 

      One-sided operations:
        - 4.68 microsec Put latency
        - 1411 MB/sec unidirectional Put bandwidth
        - 2239 MB/sec bidirectional Put bandwidth

  - Solaris uDAPL/IBTL on Opteron with PCI-Ex and IBA-SDR:
      Two-sided operations:
        - 4.81 microsec one-way latency (4 bytes)
        - 981 MB/sec unidirectional bandwidth
        - 1903 MB/sec bidirectional bandwidth

      One-sided operations:
        - 7.49 microsec Put latency
        - 981 MB/sec unidirectional Put bandwidth
        - 1903 MB/sec bidirectional Put bandwidth

  - OpenIB/Gen2 uDAPL on EM64T with PCI-Ex and IBA-SDR:
      Two-sided operations:
        - 3.56 microsec one-way latency (4 bytes)
        - 964 MB/sec unidirectional bandwidth
        - 1846 MB/sec bidirectional bandwidth

      One-sided operations:
        - 6.85 microsec Put latency
        - 964 MB/sec unidirectional Put bandwidth
        - 1846 MB/sec bidirectional Put bandwidth
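
The latency figures above come from standard ping-pong measurements.
The methodology can be sketched with a local stand-in (a Python socket
pair instead of an InfiniBand fabric, so the absolute numbers will be
far higher than those listed; this shows only the measurement loop):

```python
# Ping-pong latency sketch: one-way latency is half the average
# round-trip time over many iterations (stand-in transport, not IB).
import socket, threading, time

def echo(conn, iters, size):
    # Echo peer: bounce every message straight back to the sender.
    for _ in range(iters):
        conn.sendall(conn.recv(size))

def pingpong_latency_us(iters=1000, size=4):
    a, b = socket.socketpair()
    t = threading.Thread(target=echo, args=(b, iters, size))
    t.start()
    payload = b"x" * size
    start = time.perf_counter()
    for _ in range(iters):
        a.sendall(payload)   # ping
        a.recv(size)         # pong
    elapsed = time.perf_counter() - start
    t.join(); a.close(); b.close()
    return elapsed / iters / 2 * 1e6  # microseconds, one-way

print("%.2f us one-way (local socket pair)" % pingpong_latency_us())
```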

Performance numbers for all other platforms, system configurations and
operations can be viewed by visiting the `Performance' section of the
project's web page.

With its ADI-3-level design, MVAPICH2 0.9.6 delivers performance for
two-sided operations comparable to that of MVAPICH 0.9.8.
Performance comparison between MVAPICH2 0.9.6 and MVAPICH 0.9.8 for
sample applications can be seen by visiting the following URL:

  http://nowlab.cse.ohio-state.edu/projects/mpi-iba/perf-apps.html 

Organizations and users who want the best performance for both
two-sided and one-sided operations, and who also want to exploit the
`multi-threading', `integrated multi-rail', `multi-core optimization',
and `memory hook support' capabilities, may migrate from the MVAPICH
code base to the MVAPICH2 code base.

To download the MVAPICH2 0.9.6 package and access the anonymous
SVN, please visit the following URL:

http://nowlab.cse.ohio-state.edu/projects/mpi-iba/

A stripped-down version of this release is also available from the
OpenIB SVN.

Summary of the testing of this release with various interfaces and on
different platforms is available here:

http://nowlab.cse.ohio-state.edu/projects/mpi-iba/download-mvapich2/testing.shtml

All feedback, including bug reports and hints for performance tuning,
is welcome. Please post it to the mvapich-discuss mailing list.

Thanks, 

MVAPICH Team at OSU/NBCL 

======================================================================
MVAPICH/MVAPICH2 project is currently supported with funding from
U.S. National Science Foundation, U.S. DOE Office of Science,
Mellanox, Intel, Cisco Systems, Sun Microsystems and Linux Networx;
and with equipment support from Advanced Clustering, AMD, Apple,
Appro, Dell, IBM, Intel, Mellanox, Microway, PathScale, SilverStorm
and Sun Microsystems. Etnus is an additional technology partner.
======================================================================