[ewg] [PATCH] docs/nes: update nes_release_notes.txt for OFED 1.4.1

Chien Tung chien.tin.tung at intel.com
Wed May 6 08:22:18 PDT 2009


Add description for new parameter
Add new MPI sections

Signed-off-by: Chien Tung <chien.tin.tung at intel.com>
---
 nes_release_notes.txt |  339 ++++++++++++++++++++++++++++++++++++++-----------
 1 files changed, 266 insertions(+), 73 deletions(-)
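
Note for reviewers (not part of the patch): the nes_drv_opt values listed in
the updated notes look like independent bit flags, so several options could
presumably be combined by OR-ing them. The sketch below shows that arithmetic
and prints a modprobe.conf line. The dictionary keys and helper names are
made up for illustration, and the assumption that the driver accepts combined
flags is not confirmed by the notes.

```python
# Flag values are copied from the nes_drv_opt list in the release notes.
# The key names are hypothetical labels chosen for this sketch.
NES_DRV_OPT = {
    "enable_msi": 0x00000010,
    "no_inline_data": 0x00000080,
    "disable_int_mod": 0x00000100,
    "disable_virt_wq": 0x00000200,
}


def nes_drv_opt_value(*names):
    """OR the named flags together (assumes the flags are independent bits)."""
    value = 0
    for name in names:
        value |= NES_DRV_OPT[name]
    return value


def modprobe_line(*names):
    """Format an 'options' line for modprobe.conf with the combined value."""
    return "options iw_nes nes_drv_opt=0x%08x" % nes_drv_opt_value(*names)


if __name__ == "__main__":
    # Enable MSI and disable interrupt moderation together.
    print(modprobe_line("enable_msi", "disable_int_mod"))
```

Passing only "disable_int_mod" reproduces the 0x00000100 default shown in the
notes.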

diff --git a/nes_release_notes.txt b/nes_release_notes.txt
index a024f47..39557cd 100644
--- a/nes_release_notes.txt
+++ b/nes_release_notes.txt
@@ -1,108 +1,301 @@
             Open Fabrics Enterprise Distribution (OFED)
-                Intel-NE RNIC RELEASE NOTES
-                       December 2008
+      NetEffect Ethernet Cluster Server Adapter Release Notes
+                           May 2009
 
 
 
-The iw_nes and libnes modules provide RDMA and NIC support for the
-Intel-NE NE020 series of adapters.  
+The iw_nes module and libnes user library provide RDMA and L2IF
+support for the NetEffect Ethernet Cluster Server Adapters.
+
 
 ============================================
-Loadable Module options
+Required Setting - RDMA Unify TCP port space
 ============================================
-The following options can be used when loading the iw_nes module:
+RDMA connections use the same TCP port space as the host stack.  To avoid
+conflicts, set rdma_cm module option unify_tcp_port_space to 1 by adding
+the following to /etc/modprobe.conf:
+
+    options rdma_cm unify_tcp_port_space=1
 
-mpa_version = 1;  
-    "MPA version to be used int MPA Req/Resp (0 or 1)"
 
-disable_mpa_crc = 0; 
-    "Disable checking of MPA CRC"
+=======================
+Loadable Module Options
+=======================
+The following options can be set when loading the iw_nes module by modifying
+the modprobe.conf file:
 
-send_first = 0;
-    "Send RDMA Message First on Active Connection"
+wide_ppm_offset = 0
+    Setting this to 1 increases the CX4 interface clock ppm offset to 300ppm.
+    The default setting, 0, is 100ppm.
 
-nes_drv_opt = 0;
-    "Driver option parameters"
+mpa_version = 1
+    MPA version to be used in MPA Req/Resp (0 or 1).
 
-    NES_DRV_OPT_ENABLE_MSI           0x00000010
-    NES_DRV_OPT_DUAL_LOGICAL_PORT    0x00000020
-    NES_DRV_OPT_SUPRESS_OPTION_BC    0x00000040
-    NES_DRV_OPT_NO_INLINE_DATA       0x00000080
-    NES_DRV_OPT_DISABLE_INT_MOD      0x00000100
-    NES_DRV_OPT_DISABLE_VIRT_WQ      0x00000200
-    NES_DRV_OPT_DISABLE_LRO          0x00000400
+disable_mpa_crc = 0
+    Disable checking of MPA CRC.
 
-nes_debug_level = 0;
-    "Enable debug output level"
+send_first = 0
+    Send RDMA Message First on Active Connection.
+
+nes_drv_opt = 0x00000100
+    The following options are supported:
+
+    Enable MSI - 0x00000010
+    No Inline Data - 0x00000080
+    Disable Interrupt Moderation - 0x00000100
+    Disable Virtual Work Queue - 0x00000200
+
+nes_debug_level = 0
+    Enable debug output level.
 
 wqm_quanta = 65536
-    "Size of data to be transmitted at a time"
+    Set size of data to be transmitted at a time.
 
 limit_maxrdreqsz = 0
-    "Limit PCI read request size to 256 bytes"
+    Limit PCI read request size to 256 bytes.
 
 
-============================================
-Runtime Module options
-============================================
+===============
+Runtime Options
+===============
 The following options can be used to alter the behavior of the iw_nes module:
+NOTE: The examples assume the NetEffect Ethernet Cluster Server Adapter is eth2.
 
-tso 
-    ethtool -K eth2 tso on  == enables tso
-    ethtool -K eth2 tso off == disables tso
-                  
-jumbo
-    ifconfig eth2 mtu 9000  == largest mtu supported
-
-static interrupt moderation
-    ethtool -C eth2 rx-usecs-irq 128    
-                  
-dynamic interrupt moderation 
-    ethtool -C eth2 adaptive-rx on == enable
-    ethtool -C eth2 adaptive-rx off == disable
-    
-dynamic interrupt moderation 
-    ethtool -C eth2 rx-frames-low 12    == low watermark of rx queue
-    ethtool -C eth2 rx-frames-high 255  == high watermark of rx queue
-    ethtool -C eth2 rx-usecs-low 40     == smallest interrupt moderation timer
-    ethtool -C eth2 rx-usecs-high 1500  == largest interrupt moderation timer
+    ifconfig eth2 mtu 9000  - largest mtu supported
 
+    ethtool -K eth2 tso on  - enables TSO
+    ethtool -K eth2 tso off - disables TSO
 
-============================================
-Recommended setting
-============================================
-RDMA connections use the same TCP port space as the host stack.  To avoid
-conflicts, set rdma_cm module option unify_tcp_port_sapce to 1 by adding 
-the following to /etc/modprobe.conf:
+    ethtool -C eth2 rx-usecs-irq 128 - set static interrupt moderation
 
-    options rdma_cm unify_tcp_port_space=1
+    ethtool -C eth2 adaptive-rx on  - enable dynamic interrupt moderation
+    ethtool -C eth2 adaptive-rx off - disable dynamic interrupt moderation 
+    ethtool -C eth2 rx-frames-low 16 - low watermark of rx queue for dynamic
+                                       interrupt moderation
+    ethtool -C eth2 rx-frames-high 256 - high watermark of rx queue for
+                                         dynamic interrupt moderation
+    ethtool -C eth2 rx-usecs-low 40 - smallest interrupt moderation timer
+                                      for dynamic interrupt moderation
+    ethtool -C eth2 rx-usecs-high 1000 - largest interrupt moderation timer
+                                         for dynamic interrupt moderation
 
 
-============================================
-Known issues
-============================================
-On RHEL4 update 4, we have observed /dev/infiniband/uverbs0 does not
-always get created.  This device file is used for user-mode access to
-accelerated interface.  Current workaround is to change the start order
-for openibd(S05openibd) to after network(S10network).  For systems that
-start at runlevel 3 do the following:
+===================
+uDAPL Configuration
+===================
+The rest of this document assumes the following uDAPL settings in dat.conf:
+
+    OpenIB-cma-nes u1.2 nonthreadsafe default libdaplcma.so.1 dapl.1.2 "eth2 0" ""
+    ofa-v2-nes u2.0 nonthreadsafe default libdaplofa.so.2 dapl.2.0 "eth2 0" ""
+
+
+=======================================
+Recommended Settings for HP MPI 2.2.7
+=======================================
+Add the following to the mpirun command:
+
+    -1sided
+
+Example mpirun command with uDAPL-2.0:
+
+    mpirun -UDAPL -prot -intra=shm 
+           -e MPI_ICLIB_UDAPL=libdaplofa.so.2
+           -e MPI_HASIC_UDAPL=ofa-v2-nes
+           -1sided
+           -f /opt/hpmpi/appfile
+
+Example mpirun command with uDAPL-1.2:
+
+    mpirun -UDAPL -prot -intra=shm 
+           -e MPI_ICLIB_UDAPL=libdaplcma.so.1
+           -e MPI_HASIC_UDAPL=OpenIB-cma-nes
+           -1sided 
+           -f /opt/hpmpi/appfile
+
+
+=======================================
+Recommended Settings for Intel MPI 3.2
+=======================================
+Add the following to the mpiexec command:
+
+    -genv I_MPI_FALLBACK_DEVICE 0
+    -genv I_MPI_DEVICE rdma:OpenIB-cma-nes
+    -genv I_MPI_RENDEZVOUS_RDMA_WRITE 1
+
+Example mpiexec command line for uDAPL-2.0:
+
+    mpiexec -genv I_MPI_FALLBACK_DEVICE 0
+            -genv I_MPI_DEVICE rdma:ofa-v2-nes
+            -genv I_MPI_RENDEZVOUS_RDMA_WRITE 1
+            -ppn 1 -n 2
+            /opt/intel/impi/3.2.0.011/bin64/IMB-MPI1
+
+Example mpiexec command line for uDAPL-1.2:
+
+    mpiexec -genv I_MPI_FALLBACK_DEVICE 0
+            -genv I_MPI_DEVICE rdma:OpenIB-cma-nes
+            -genv I_MPI_RENDEZVOUS_RDMA_WRITE 1
+            -ppn 1 -n 2
+            /opt/intel/impi/3.2.0.011/bin64/IMB-MPI1
+
+
+========================================
+Recommended Setting for MVAPICH2 and OFA
+========================================
+Add the following to the mpirun command:
+
+    -env MV2_USE_RDMA_CM 1
+    -env MV2_USE_IWARP_MODE 1
+
+For a larger number of processes, it is also recommended to set the following:
+
+    -env MV2_MAX_INLINE_SIZE 64
+    -env MV2_USE_SRQ 0
+
+Example mpiexec command line:
+
+    mpiexec -l -n 2
+            -env MV2_USE_RDMA_CM 1
+            -env MV2_USE_IWARP_MODE 1 
+            /usr/mpi/gcc/mvapich2-1.2p1/tests/osu_benchmarks-3.0/osu_latency
+
+
+==========================================
+Recommended Setting for MVAPICH2 and uDAPL
+==========================================
+Add the following to the mpirun command:
+
+    -env MV2_PREPOST_DEPTH 59
+
+Example mpiexec command line:
+
+    mpiexec -l -n 2
+            -env MV2_DAPL_PROVIDER ofa-v2-nes
+            -env MV2_PREPOST_DEPTH 59 
+            /usr/mpi/gcc/mvapich2-1.2p1/tests/osu_benchmarks-3.0/osu_latency
+
+    mpiexec -l -n 2
+            -env MV2_DAPL_PROVIDER OpenIB-cma-nes
+            -env MV2_PREPOST_DEPTH 59 
+            /usr/mpi/gcc/mvapich2-1.2p1/tests/osu_benchmarks-3.0/osu_latency
+
+
+===========================
+Modify Settings in Open MPI
+===========================
+There is more than one way to specify MCA parameters in
+Open MPI.  Please visit this link and use the best method
+for your environment:
+
+http://www.open-mpi.org/faq/?category=tuning#setting-mca-params
+
+
+=======================================
+Recommended Settings for Open MPI 1.3.2
+=======================================
+Pinned memory caching is enabled by default, but it may be necessary
+to limit the size of the cache to avoid running out of memory.  To do
+so, add the following parameter:
+
+    mpool_rdma_rcache_size_limit = <cache size>
+
+The cache size depends on the number of processes and nodes, e.g. for
+64 processes with 8 nodes, limit the pinned cache size to
+104857600 (100 MBytes).
+
+Example mpirun command line:
+
+    mpirun -np 2 -hostfile /opt/mpd.hosts
+           -mca btl openib,self,sm
+           -mca mpool_rdma_rcache_size_limit 104857600 
+           /usr/mpi/gcc/openmpi-1.3.2/tests/IMB-3.1/IMB-MPI1
+
+
+=======================================
+Recommended Settings for Open MPI 1.3.1
+=======================================
+There is a known problem with cached pinned memory.  It is recommended
+that pinned memory caching be disabled.  For more information, see
+https://svn.open-mpi.org/trac/ompi/ticket/1853
+
+To disable pinned memory caching, add the following parameter:
+
+    mpi_leave_pinned = 0
+
+Example mpirun command line:
+
+    mpirun -np 2 -hostfile /opt/mpd.hosts
+           -mca btl openib,self,sm
+           -mca mpi_leave_pinned 0
+           /usr/mpi/gcc/openmpi-1.3.1/tests/IMB-3.1/IMB-MPI1
+
+
+=====================================
+Recommended Settings for Open MPI 1.3
+=====================================
+There is a known problem with cached pinned memory.  It is recommended
+that pinned memory caching be disabled.  For more information, see
+https://svn.open-mpi.org/trac/ompi/ticket/1853
+
+To disable pinned memory caching, add the following parameter:
+
+    mpi_leave_pinned = 0
+
+Receive Queue setting:
+
+    btl_openib_receive_queues = P,65536,256,192,128
+
+Set maximum size of inline data segment to 64:
+
+    btl_openib_max_inline_data = 64
+
+Example mpirun command:
+
+    mpirun -np 2 -hostfile /root/mpd.hosts
+           -mca btl openib,self,sm
+           -mca mpi_leave_pinned 0
+           -mca btl_openib_receive_queues P,65536,256,192,128
+           -mca btl_openib_max_inline_data 64
+           /usr/mpi/gcc/openmpi-1.3/tests/IMB-3.1/IMB-MPI1
+
+
+============
+Known Issues
+============
+The following is a list of known issues with the Linux kernel and
+the OFED 1.4.1 release.
+
+1. We have observed a "__qdisc_run" softlockup crash when running UDP
+   traffic on RHEL5.1 systems with more than 8 cores.  The issue
+   is in the Linux network stack.  The fix is available at the
+   following link:
+
+http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git
+;a=commitdiff;h=2ba2506ca7ca62c56edaa334b0fe61eb5eab6ab0
+;hp=32aced7509cb20ef3ec67c9b56f5b55c41dd4f8d
+
+
+2. Running the Pallas test suite with MVAPICH2 (OFA/uDAPL) on more
+   than 64 processes terminates abnormally.  The workaround is to
+   add the following to the mpirun command:
+
+   -env MV2_ON_DEMAND_THRESHOLD <total processes>
 
-    mv /etc/rc.d/rc3.d/S05openibd /etc/rc.d/rc3.d/S11openibd
+   e.g. For 72 total processes, -env MV2_ON_DEMAND_THRESHOLD 72
 
-For runlevel 5 do:
 
-    mv /etc/rc.d/rc5.d/S05openibd /etc/rc.d/rc5.d/S11openibd
+3. For MVAPICH2 (OFA/uDAPL), the IMB-EXT (part of the Pallas suite) "Window"
+   test may show high latency numbers.  It is recommended to turn off one-sided
+   communication by adding the following to the mpirun command:
 
+   -env MV2_USE_RDMA_ONE_SIDED 0
 
-Some MPIs require the node that initiated the RDMA connection to send
-the first RDMA message.  Enable this feature by adding the following 
-to /etc/modprobe.conf:
 
-    options iw_nes send_first=1
+4. IMB-EXT does not run with Open MPI 1.3.1 or 1.3.  The workaround is
+   to turn off message coalescing by adding the following to the mpirun
+   command:
 
+    -mca btl_openib_use_message_coalescing 0
 
-For Intel MPI, iw_nes currently does not support dynamic connection 
-establishment feature.  Turn it off by setting/exporting the 
-I_MPI_USE_DYNAMIC_CONNECTIONS variable to 0:
 
-    export I_MPI_USE_DYNAMIC_CONNECTIONS=0
+NetEffect is a trademark of Intel Corporation in the U.S. and other countries.
-- 
1.5.3.3



