[ofa-general] [PATCH] OpenSM release notes: Update to 3.1.11

Hal Rosenstock hrosenstock at xsigo.com
Mon Jun 2 07:00:29 PDT 2008


OpenSM release notes: Update to 3.1.11

Please apply to ofed_1_3 and master

Signed-off-by: Hal Rosenstock <hal at xsigo.com>

diff --git a/opensm/doc/opensm_release_notes-3.1.10.txt b/opensm/doc/opensm_release_notes-3.1.10.txt
deleted file mode 100644
index 2b6253d..0000000
--- a/opensm/doc/opensm_release_notes-3.1.10.txt
+++ /dev/null
@@ -1,492 +0,0 @@
-                        OpenSM Release Notes 3.1.10
-                       =============================
-
-Version: OpenFabrics Enterprise Distribution (OFED) 1.3
-Repo:    git://git.openfabrics.org/~ofed_1_3/management.git (release)
-         git://git.openfabrics.org/~sashak/management.git (development)
-Date:    February 2008
-
-1 Overview
-----------
-This document describes the contents of the OpenSM OFED 1.3 release.
-OpenSM is an InfiniBand compliant Subnet Manager and Administration,
-and runs on top of OpenIB. The OpenSM version for this release
-is openib-3.1.10
-
-This document includes the following sections:
-1 This Overview section (describing new features and software
-  dependencies)
-2 Known Issues And Limitations
-3 Unsupported IB compliance statements
-4 Major Bug Fixes
-5 Main Verification Flows
-6 Qualified software stacks and devices
-
-1.1 Major New Features
-
-* QoS manager (experimental)
-  This QoS manager implementation is in accordance with IBA QoS Annex.
-  Highly configurable QoS Policy is parsed from OpenSM QoS policy file.
-  Valid QoS parameters will be reported in SA PathRecord and
-  MultiPathRecord. In addition simple QoS levels per ULPs configuration
-  is supported too.
-
-* Performance Manager
-  When enabled it collects a fabric port counters and able to log it or
-  to pass to external program via event plugin interface. It handles
-  counters overflow, supports LID/QP redirection and is able to work
-  when OpenSM is in master, standby, and inactive states.
-
-* Dimension Order routing (DOR) algorithm
-  DOR Unicast routing algorithm - based on the Min Hop algorithm,  but
-  avoids  port  equalization  except for redundant links between the
-  same two switches.  This provides deadlock free routes for hypercubes
-  when the  fabric  is  cabled  as a hypercube and for meshes when cabled
-  as a mesh (see details in OpenSM man page).
-
-* Routing improvements
-  Speedup the current routing algorithms default MinHops, Up/Down and
-  LASH and lid matrix generation. Fat Tree routing engine is able to work
-  with not pure fat free topology.
-
-* Multiple IB routers support
-  OpenSM now able to keep configurable subnet prefix to router table.
-  SA will report path to this routers when SA PathRecord was issued with
-  non-local DGID.
-
-* Node map
-  This is possible to name nodes in this config file. Those names will be
-  used for logging and by QoS configuration.
-
-* PKey index support
-  Proper support for PKey index in GSI queries.
-
-* Incremental LFTs, PKey, SL2VL, and VLarbitration table updates
-  OpenSM will only fetch those tables in first heavy sweep and then
-  will maintain this internally.
-
-* Fast port and switch detector
-  When port and/or switch was externally reset and it was fast so sweep
-  doesn't find this device as disconnected OpenSM will detect this by
-  changed port states and handle accordingly.
-
-* Duplicated GUIDs/port moving detector
-  OpenSM will be able to detect port moving during a fabric discovery
-  and will not report duplicated GUIDs in this case.
-
-* Multicast rerouting speedup
-  Now OpenSM will calculate and setup multicast forwarding tables for
-  all altered multicast groups and not for each one.
-
-* Event plugin API
-  OpenSM allows to load dynamically various plugin modules.
-
-* Many generic improvements
-
-1.2 Minor New Features:
-
-* Daemon mode can be activated with -B option.
-
-* Support multiple scopes for IPoIB multicast groups in partition config.
-
-* Loopback connection handling
-  Loopback connection is not interpreted as duplicated GUID anymore.
-
-* Connect root nodes option for Up/Down routing engine.
-  When this option is specified Up/Down will create routing paths between
-  its root nodes.
-
-* Dump and log filenames changed from osm* to opensm*.
-
-* Support loopback console
-  Socket console with only local access.
-
-* Configurable config directory (the default value is /etc/opensm) and
-  configurable default values of OpenSM config filenames.
-
-* Add option for force SDR link speed
-  Add option to opensm.opts to force link speed. Currently, only forcing
-  to SDR link speed is supported. This option is not supported as a
-  command line option.
-
-* Better packaging
-  Building and RPM packaging were improved and simplified.
-
-* Handle "babbling" ports
-  When a babbling port (port which causes a frequent trap generation) is
-  detected, OpenSM will disable the port which should terminate the trap
-  storm.
-
-1.3 Library API Changes
-
-  None
-
-1.4 Software Dependencies
-
-OpenSM depends on the installation of either OFED 1.3, OFED 1.2, OFED 1.1,
-OFED 1.0, OpenIB gen2 (e.g. IBG2 distribution), OpenIB gen1 (e.g. IBGD
-distribution), or Mellanox VAPI stacks. The qualified driver versions
-are provided in Table 2, "Qualified IB Stacks".
-
-Also building of QoS manager policy file parser requires flex, and either
-bison or byacc installed.
-
-1.5 Supported Devices Firmware
-
-The main task of OpenSM is to initialize InfiniBand devices. The
-qualified devices and their corresponding firmware versions
-are listed in Table 3.
-
-2 Known Issues And Limitations
-------------------------------
-
-* No Service / Key associations:
-  There is no way to manage Service access by Keys.
-
-* No SM to SM SMDB synchronization:
-  Puts the burden of re-registering services, multicast groups, and
-  inform-info on the client application (or IB access layer core).
-
-3 Unsupported IB Compliance Statements
---------------------------------------
-The following section lists all the IB compliance statements which
-OpenSM does not support. Please refer to the IB specification for detailed
-information regarding each compliance statement.
-
-* C14-22 (Authentication):
-  M_Key M_KeyProtectBits and M_KeyLeasePeriod shall be set in one
-  SubnSet method. As a work-around, an OpenSM option is provided for
-  defining the protect bits.
-
-* C14-67 (Authentication):
-  On SubnGet(SMInfo) and SubnSet(SMInfo) - if M_Key is not zero then
-  the SM shall generate a SubnGetResp if the M_Key matches, or
-  silently drop the packet if M_Key does not match.
-
-* C15-0.1.23.4 (Authentication):
-  InformInfoRecords shall always be provided with the QPN set to 0,
-  except for the case of a trusted request, in which case the actual
-  subscriber QPN shall be returned.
-
-* o13-17.1.2 (Event-FWD):
-  If no permission to forward, the subscription should be removed and
-  no further forwarding should occur.
-
-* C14-24.1.1.5 and C14-62.1.1.22 (Initialization):
-  GUIDInfo - SM should enable assigning Port GUIDInfo.
-
-* C14-44 (Initialization):
-  If the SM discovers that it is missing an M_Key to update CA/RT/SW,
-  it should notify the higher level.
-
-* C14-62.1.1.12 (Initialization):
-  PortInfo:M_Key - Set the M_Key to a node based random value.
-
-* C14-62.1.1.13 (Initialization):
-  PortInfo:P_KeyProtectBits - set according to an optional policy.
-
-* C14-62.1.1.24 (Initialization):
-  SwitchInfo:DefaultPort - should be configured for random FDB.
-
-* C14-62.1.1.32 (Initialization):
-  RandomForwardingTable should be configured.
-
-* o15-0.1.12 (Multicast):
-  If the JoinState is SendOnlyNonMember = 1 (only), then the endport
-  should join as sender only.
-
-* o15-0.1.8 (Multicast):
-  If a request for creating an MCG with fields that cannot be met,
-  return ERR_REQ_INVALID (currently ignores SL and FlowLabelTClass).
-
-* C15-0.1.8.6 (SA-Query):
-  Respond to SubnAdmGetTraceTable - this is an optional attribute.
-
-* C15-0.1.13 Services:
-  Reject ServiceRecord create, modify or delete if the given
-  ServiceP_Key does not match the one included in the ServiceGID port
-  and the port that sent the request.
-
-* C15-0.1.14 (Services):
-  Provide means to associate service name and ServiceKeys.
-
-4 Major Bug Fixes
------------------
-
-The following is a list of bugs that were fixed. Note that other less critical
-or visible bugs were also fixed.
-
-* osm_ucast_ftree.c: do load-leveling of non-CN routes
-
-* osm_ucast_ftree.c: ignore port 0 and loopbacks on switches
-
-* lash: fix possible segfault in osm_get_lash_sl()
-
-* osm_ucast_ftree.c: fixing coredump in fat-tree routing
-
-* osm_sa_slvl_record: fix overflow crash
-
-* Break multicast rerouting requests processing when heavy sweep is
-  scheduled.
-
-* updn: report fallback properly
-
-* Fix incorrect identification of routing engine used
-
-* Don't zero base LID when invalid value is received
-
-* lash: fix wrong allocation size
-
-* Fixing broken logic in 'process world' part of LinkRecord processing
-
-* Fix lmc_mask bit order in osm_sa_link_record.c
-
-* Adding missing comparison by to_lid/from_lid in LinkRecord processing
-
-* Broken logic when scanning subnet for PIR request
-
-* No interactive games in daemon mode
-
-* Fixing memory leak in node description
-
-* Fix PortInfo update issues for switch port 0
-
-* Changed method_mask type in user_mad interface in accordance with
-  kernel ABI
-
-* Use umad_get_issm_path() in osm_vendor_set_sm()
-
-* Report message fix
-
-* Uninitialized variables usage fix
-
-* osm_ucast_ftree.c: Possible NULL ptr seg fault
-
-* osm_mcast_mgr.c: Possible NULL ptr seg fault
-
-* TrapRepress was failing for mkey != 0
-
-* IB_PR_COMPMASK was used in MPR
-
-* Set hop limit when creating ipoib multicast groups
-
-* Fix outstanding mad counters tracking on the error paths.
-
-* Report new ports before handover mastership
-
-* Fix opvls and neighbormtu when remote port invalid.
-
-* Bug in coding trying to set vl_arb_high_limit when PortInfo.base_lid
-  was still zero.
-
-* Protect SMInfo response against port moving issue.
-
-5 Main Verification Flows
--------------------------
-
-OpenSM verification is run using the following activities:
-* osmtest - a stand-alone program
-* ibmgtsim (IB management simulator) based - a set of flows that
-  simulate clusters, inject errors and verify OpenSM capability to
-  respond and bring up the network correctly.
-* small cluster regression testing - where the SM is used on back to
-  back or single switch configurations. The regression includes
-  multiple OpenSM dedicated tests.
-* cluster testing - when we run OpenSM to setup a large cluster, perform
-  hand-off, reboots and reconnects, verify routing correctness and SA
-  responsiveness at the ULP level (IPoIB and SDP).
-
-5.1 osmtest
-
-osmtest is an automated verification tool used for OpenSM
-testing. Its verification flows are described by list below.
-
-* Inventory File: Obtain and verify all port info, node info, link and path
-  records parameters.
-
-* Service Record:
-   - Register new service
-   - Register another service (with a lease period)
-   - Register another service (with service p_key set to zero)
-   - Get all services by name
-   - Delete the first service
-   - Delete the third service
-   - Added bad flows of get/delete  non valid service
-   - Add / Get same service with different data
-   - Add / Get / Delete by different component  mask values (services
-     by Name & Key / Name & Data / Name & Id / Id only )
-
-* Multicast Member Record:
-   - Query of existing Groups (IPoIB)
-   - BAD Join with insufficient comp mask (o15.0.1.3)
-   - Create given MGID=0 (o15.0.1.4)
-   - Create given MGID=0xFF12A01C,FE800000,00000000,12345678 (o15.0.1.4)
-   - Create BAD MGID=0xFA. (o15.0.1.6)
-   - Create BAD MGID=0xFF12A01B w/ link-local not set (o15.0.1.6)
-   - New MGID with invalid join state (o15.0.1.9)
-   - Retry of existing MGID - See JoinState update (o15.0.1.11)
-   - BAD RATE when connecting to existing MGID (o15.0.1.13)
-   - Partial JoinState delete request - removing FullMember (o15.0.1.14)
-   - Full Delete of a group (o15.0.1.14)
-   - Verify Delete by trying to Join deleted group (o15.0.1.14)
-   - BAD Delete of IPoIB membership (no prev join) (o15.0.1.15)
-
-* GUIDInfo Record:
-   - All GUIDInfoRecords in subnet are obtained
-
-* MultiPathRecord:
-   - Perform some compliant and noncompliant MultiPathRecord requests
-   - Validation is via status in responses and IB analyzer
-
-* PKeyTableRecord:
-  - Perform some compliant and noncompliant PKeyTableRecord queries
-  - Validation is via status in responses and IB analyzer
-
-* LinearForwardingTableRecord:
-  - Perform some compliant and noncompliant LinearForwardingTableRecord queries
-  - Validation is via status in responses and IB analyzer
-
-* Event Forwarding: Register for trap forwarding using reports
-   - Send a trap and wait for report
-   - Unregister non-existing
-
-* Trap 64/65 Flow: Register to Trap 64-65, create traps (by
-  disconnecting/connecting ports) and wait for report, then unregister.
-
-* Stress Test: send PortInfoRecord queries, both single and RMPP and
-  check for the rate of responses as well as their validity.
-
-
-5.2 IB Management Simulator OpenSM Test Flows:
-
-The simulator provides ability to simulate the SM handling of virtual
-topologies that are not limited to actual lab equipment availability.
-OpenSM was simulated to bring up clusters of up to 10,000 nodes. Daily
-regressions use smaller (16 and 128 nodes clusters).
-
-The following test flows are run on the IB management simulator:
-
-* Stability:
-  Up to 12 links from the fabric are randomly selected to drop packets
-  at drop rates up to 90%. The SM is required to succeed in bringing the
-  fabric up. The resulting routing is verified to be correct as well.
-
-* LID Manager:
-  Using LMC = 2 the fabric is initialized with LIDs. Faults such as
-  zero LID, Duplicated LID, non-aligned (to LMC) LIDs are
-  randomly assigned to various nodes and other errors are randomly
-  output to the guid2lid cache file. The SM sweep is run 5 times and
-  after each iteration a complete verification is made to ensure that all
-  LIDs that could possibly be maintained are kept, as well as that all nodes
-  were assigned a legal LID range.
-
-* Multicast Routing:
-  Nodes randomly join the 0xc000 group and eventually the
-  resulting routing is verified for completeness and adherence to
-  Up/Down routing rules.
-
-* osmtest:
-  The complete osmtest flow as described in the previous table is run on
-  the simulated fabrics.
-
-* Stress Test:
-  This flow merges fabric, LID and stability issues with continuous
-  PathRecord, ServiceRecord and Multicast Join/Leave activity to
-  stress the SM/SA during continuous sweeps. InformInfo Set/Delete/Get
-  were added to the test such both existing and non existing nodes
-  perform them in random order.
-
-5.3 OpenSM Regression
-
-Using a back-to-back or single switch connection, the following set of
-tests is run nightly on the stacks described in table 2. The included
-tests are:
-
-* Stress Testing: Flood the SA with queries from multiple channel
-  adapters to check the robustness of the entire stack up to the SA.
-
-* Dynamic Changes: Dynamic Topology changes, through randomly
-  dropping SMP packets, used to test OpenSM adaptation to an unstable
-  network & verify DB correctness.
-
-* Trap Injection: This flow injects traps to the SM and verifies that it
-  handles them gracefully.
-
-* SA Query Test: This test exhaustively checks the SA responses to all
-  possible single component mask. To do that the test examines the
-  entire set of records the SA can provide, classifies them by their
-  field values and then selects every field (using component mask and a
-  value) and verifies that the response matches the expected set of records.
-  A random selection using multiple component mask bits is also performed.
-
-5.4 Cluster testing:
-
-Cluster testing is usually run before a distribution release. It
-involves real hardware setups of 16 to 32 nodes (or more if a beta site
-is available). Each test is validated by running all-to-all ping through the IB
-interface. The test procedure includes:
-
-* Cluster bringup
-
-* Hand-off between 2 or 3 SM's while performing:
-  - Node reboots
-  - Switch power cycles (disconnecting the SM's)
-
-* Unresponsive port detection and recovery
-
-* osmtest from multiple nodes
-
-* Trap injection and recovery
-
-
-6 Qualification
-----------------
-
-Table 2 - Qualified IB Stacks
-=============================
-
-Stack                                    | Version
------------------------------------------|--------------------------
-OFED                                     |   1.3
-OFED                                     |   1.2
-OFED                                     |   1.1
-OFED                                     |   1.0
-OpenIB Gen2 (IBG2 distribution)          |   1.0
-OpenIB Gen1 (IBGD distribution)          |   1.8.0
-VAPI (Mellanox InfiniBand HCA Driver)    |   3.2 and later
-
-Table 3 - Qualified Devices and Corresponding Firmware
-======================================================
-
-Mellanox
-Device                              |   FW versions
-------------------------------------|-------------------------------
-InfiniScale                         | fw-43132  5.2.000 (and later)
-InfiniScale III                     | fw-47396  0.5.000 (and later)
-InfiniHost                          | fw-23108  3.5.000 (and later)
-InfiniHost III Lx                   | fw-25204  1.2.000 (and later)
-InfiniHost III Ex (InfiniHost Mode) | fw-25208  4.8.200 (and later)
-InfiniHost III Ex (MemFree Mode)    | fw-25218  5.3.000 (and later)
-ConnectX IB                         | fw-25408  2.3.000 (and later)
-
-QLogic/PathScale
-Device  |   Note
---------|-----------------------------------------------------------
-iPath   | QHT6040 (PathScale InfiniPath HT-460)
-iPath   | QHT6140 (PathScale InfiniPath HT-465)
-iPath   | QLE6140 (PathScale InfiniPath PE-880)
-iPath   | QLE7240
-iPath   | QLE7280
-
-Note 1: OpenSM does not run on an IBM Galaxy (eHCA) as it does not expose
-QP0 and QP1. However, it does support it as a device on the subnet.
-
-Note 2: QoS firmware and Mellanox devices
-
-HCAs: QoS supported by ConnectX. The current FW release
-doesn't support QoS. QoS-enabled FW release (2_5_000) is
-planned for May. If someone wishes to get QoS-enabled FW
-before the official release, they should contact Mellanox FAE.
-
-Switches: QoS supported by InfiniScale III
-Any InfiniScale III FW that is supported by OpenSM supports QoS.
diff --git a/opensm/doc/opensm_release_notes-3.1.11.txt b/opensm/doc/opensm_release_notes-3.1.11.txt
new file mode 100644
index 0000000..c7bbb41
--- /dev/null
+++ b/opensm/doc/opensm_release_notes-3.1.11.txt
@@ -0,0 +1,492 @@
+                        OpenSM Release Notes 3.1.11
+                       =============================
+
+Version: OpenFabrics Enterprise Distribution (OFED) 1.3
+Repo:    git://git.openfabrics.org/~ofed_1_3/management.git (release)
+         git://git.openfabrics.org/~sashak/management.git (development)
+Date:    May 2008
+
+1 Overview
+----------
+This document describes the contents of the OpenSM OFED 1.3 release.
+OpenSM is an InfiniBand compliant Subnet Manager and Administration,
+and runs on top of OpenIB. The OpenSM version for this release
+is openib-3.1.11
+
+This document includes the following sections:
+1 This Overview section (describing new features and software
+  dependencies)
+2 Known Issues And Limitations
+3 Unsupported IB compliance statements
+4 Major Bug Fixes
+5 Main Verification Flows
+6 Qualified software stacks and devices
+
+1.1 Major New Features
+
+* QoS manager (experimental)
+  This QoS manager implementation is in accordance with IBA QoS Annex.
+  Highly configurable QoS Policy is parsed from OpenSM QoS policy file.
+  Valid QoS parameters will be reported in SA PathRecord and
+  MultiPathRecord. In addition simple QoS levels per ULPs configuration
+  is supported too.
+
+* Performance Manager
+  When enabled it collects a fabric port counters and able to log it or
+  to pass to external program via event plugin interface. It handles
+  counters overflow, supports LID/QP redirection and is able to work
+  when OpenSM is in master, standby, and inactive states.
+
+* Dimension Order routing (DOR) algorithm
+  DOR Unicast routing algorithm - based on the Min Hop algorithm,  but
+  avoids  port  equalization  except for redundant links between the
+  same two switches.  This provides deadlock free routes for hypercubes
+  when the  fabric  is  cabled  as a hypercube and for meshes when cabled
+  as a mesh (see details in OpenSM man page).
+
+* Routing improvements
+  Speedup the current routing algorithms default MinHops, Up/Down and
+  LASH and lid matrix generation. Fat Tree routing engine is able to work
+  with not pure fat free topology.
+
+* Multiple IB routers support
+  OpenSM now able to keep configurable subnet prefix to router table.
+  SA will report path to this routers when SA PathRecord was issued with
+  non-local DGID.
+
+* Node map
+  This is possible to name nodes in this config file. Those names will be
+  used for logging and by QoS configuration.
+
+* PKey index support
+  Proper support for PKey index in GSI queries.
+
+* Incremental LFTs, PKey, SL2VL, and VLarbitration table updates
+  OpenSM will only fetch those tables in first heavy sweep and then
+  will maintain this internally.
+
+* Fast port and switch detector
+  When port and/or switch was externally reset and it was fast so sweep
+  doesn't find this device as disconnected OpenSM will detect this by
+  changed port states and handle accordingly.
+
+* Duplicated GUIDs/port moving detector
+  OpenSM will be able to detect port moving during a fabric discovery
+  and will not report duplicated GUIDs in this case.
+
+* Multicast rerouting speedup
+  Now OpenSM will calculate and setup multicast forwarding tables for
+  all altered multicast groups and not for each one.
+
+* Event plugin API
+  OpenSM allows to load dynamically various plugin modules.
+
+* Many generic improvements
+
+1.2 Minor New Features:
+
+* Daemon mode can be activated with -B option.
+
+* Support multiple scopes for IPoIB multicast groups in partition config.
+
+* Loopback connection handling
+  Loopback connection is not interpreted as duplicated GUID anymore.
+
+* Connect root nodes option for Up/Down routing engine.
+  When this option is specified Up/Down will create routing paths between
+  its root nodes.
+
+* Dump and log filenames changed from osm* to opensm*.
+
+* Support loopback console
+  Socket console with only local access.
+
+* Configurable config directory (the default value is /etc/opensm) and
+  configurable default values of OpenSM config filenames.
+
+* Add option for force SDR link speed
+  Add option to opensm.opts to force link speed. Currently, only forcing
+  to SDR link speed is supported. This option is not supported as a
+  command line option.
+
+* Better packaging
+  Building and RPM packaging were improved and simplified.
+
+* Handle "babbling" ports
+  When a babbling port (port which causes a frequent trap generation) is
+  detected, OpenSM will disable the port which should terminate the trap
+  storm.
+
+1.3 Library API Changes
+
+  None
+
+1.4 Software Dependencies
+
+OpenSM depends on the installation of either OFED 1.3, OFED 1.2, OFED 1.1,
+OFED 1.0, OpenIB gen2 (e.g. IBG2 distribution), OpenIB gen1 (e.g. IBGD
+distribution), or Mellanox VAPI stacks. The qualified driver versions
+are provided in Table 2, "Qualified IB Stacks".
+
+Also building of QoS manager policy file parser requires flex and bison
+installed.
+
+1.5 Supported Devices Firmware
+
+The main task of OpenSM is to initialize InfiniBand devices. The
+qualified devices and their corresponding firmware versions
+are listed in Table 3.
+
+2 Known Issues And Limitations
+------------------------------
+
+* No Service / Key associations:
+  There is no way to manage Service access by Keys.
+
+* No SM to SM SMDB synchronization:
+  Puts the burden of re-registering services, multicast groups, and
+  inform-info on the client application (or IB access layer core).
+
+3 Unsupported IB Compliance Statements
+--------------------------------------
+The following section lists all the IB compliance statements which
+OpenSM does not support. Please refer to the IB specification for detailed
+information regarding each compliance statement.
+
+* C14-22 (Authentication):
+  M_Key M_KeyProtectBits and M_KeyLeasePeriod shall be set in one
+  SubnSet method. As a work-around, an OpenSM option is provided for
+  defining the protect bits.
+
+* C14-67 (Authentication):
+  On SubnGet(SMInfo) and SubnSet(SMInfo) - if M_Key is not zero then
+  the SM shall generate a SubnGetResp if the M_Key matches, or
+  silently drop the packet if M_Key does not match.
+
+* C15-0.1.23.4 (Authentication):
+  InformInfoRecords shall always be provided with the QPN set to 0,
+  except for the case of a trusted request, in which case the actual
+  subscriber QPN shall be returned.
+
+* o13-17.1.2 (Event-FWD):
+  If no permission to forward, the subscription should be removed and
+  no further forwarding should occur.
+
+* C14-24.1.1.5 and C14-62.1.1.22 (Initialization):
+  GUIDInfo - SM should enable assigning Port GUIDInfo.
+
+* C14-44 (Initialization):
+  If the SM discovers that it is missing an M_Key to update CA/RT/SW,
+  it should notify the higher level.
+
+* C14-62.1.1.12 (Initialization):
+  PortInfo:M_Key - Set the M_Key to a node based random value.
+
+* C14-62.1.1.13 (Initialization):
+  PortInfo:P_KeyProtectBits - set according to an optional policy.
+
+* C14-62.1.1.24 (Initialization):
+  SwitchInfo:DefaultPort - should be configured for random FDB.
+
+* C14-62.1.1.32 (Initialization):
+  RandomForwardingTable should be configured.
+
+* o15-0.1.12 (Multicast):
+  If the JoinState is SendOnlyNonMember = 1 (only), then the endport
+  should join as sender only.
+
+* o15-0.1.8 (Multicast):
+  If a request for creating an MCG with fields that cannot be met,
+  return ERR_REQ_INVALID (currently ignores SL and FlowLabelTClass).
+
+* C15-0.1.8.6 (SA-Query):
+  Respond to SubnAdmGetTraceTable - this is an optional attribute.
+
+* C15-0.1.13 Services:
+  Reject ServiceRecord create, modify or delete if the given
+  ServiceP_Key does not match the one included in the ServiceGID port
+  and the port that sent the request.
+
+* C15-0.1.14 (Services):
+  Provide means to associate service name and ServiceKeys.
+
+4 Major Bug Fixes
+-----------------
+
+The following is a list of bugs that were fixed. Note that other less critical
+or visible bugs were also fixed.
+
+* osm_ucast_ftree.c: do load-leveling of non-CN routes
+
+* osm_ucast_ftree.c: ignore port 0 and loopbacks on switches
+
+* lash: fix possible segfault in osm_get_lash_sl()
+
+* osm_ucast_ftree.c: fixing coredump in fat-tree routing
+
+* osm_sa_slvl_record: fix overflow crash
+
+* Break multicast rerouting requests processing when heavy sweep is
+  scheduled.
+
+* updn: report fallback properly
+
+* Fix incorrect identification of routing engine used
+
+* Don't zero base LID when invalid value is received
+
+* lash: fix wrong allocation size
+
+* Fixing broken logic in 'process world' part of LinkRecord processing
+
+* Fix lmc_mask bit order in osm_sa_link_record.c
+
+* Adding missing comparison by to_lid/from_lid in LinkRecord processing
+
+* Broken logic when scanning subnet for PIR request
+
+* No interactive games in daemon mode
+
+* Fixing memory leak in node description
+
+* Fix PortInfo update issues for switch port 0
+
+* Changed method_mask type in user_mad interface in accordance with
+  kernel ABI
+
+* Use umad_get_issm_path() in osm_vendor_set_sm()
+
+* Report message fix
+
+* Uninitialized variables usage fix
+
+* osm_ucast_ftree.c: Possible NULL ptr seg fault
+
+* osm_mcast_mgr.c: Possible NULL ptr seg fault
+
+* TrapRepress was failing for mkey != 0
+
+* IB_PR_COMPMASK was used in MPR
+
+* Set hop limit when creating ipoib multicast groups
+
+* Fix outstanding mad counters tracking on the error paths.
+
+* Report new ports before handover mastership
+
+* Fix opvls and neighbormtu when remote port invalid.
+
+* Bug in coding trying to set vl_arb_high_limit when PortInfo.base_lid
+  was still zero.
+
+* Protect SMInfo response against port moving issue.
+
+5 Main Verification Flows
+-------------------------
+
+OpenSM verification is run using the following activities:
+* osmtest - a stand-alone program
+* ibmgtsim (IB management simulator) based - a set of flows that
+  simulate clusters, inject errors and verify OpenSM capability to
+  respond and bring up the network correctly.
+* small cluster regression testing - where the SM is used on back to
+  back or single switch configurations. The regression includes
+  multiple OpenSM dedicated tests.
+* cluster testing - when we run OpenSM to setup a large cluster, perform
+  hand-off, reboots and reconnects, verify routing correctness and SA
+  responsiveness at the ULP level (IPoIB and SDP).
+
+5.1 osmtest
+
+osmtest is an automated verification tool used for OpenSM
+testing. Its verification flows are described by list below.
+
+* Inventory File: Obtain and verify all port info, node info, link and path
+  records parameters.
+
+* Service Record:
+   - Register new service
+   - Register another service (with a lease period)
+   - Register another service (with service p_key set to zero)
+   - Get all services by name
+   - Delete the first service
+   - Delete the third service
+   - Added bad flows of get/delete  non valid service
+   - Add / Get same service with different data
+   - Add / Get / Delete by different component  mask values (services
+     by Name & Key / Name & Data / Name & Id / Id only )
+
+* Multicast Member Record:
+   - Query of existing Groups (IPoIB)
+   - BAD Join with insufficient comp mask (o15.0.1.3)
+   - Create given MGID=0 (o15.0.1.4)
+   - Create given MGID=0xFF12A01C,FE800000,00000000,12345678 (o15.0.1.4)
+   - Create BAD MGID=0xFA. (o15.0.1.6)
+   - Create BAD MGID=0xFF12A01B w/ link-local not set (o15.0.1.6)
+   - New MGID with invalid join state (o15.0.1.9)
+   - Retry of existing MGID - See JoinState update (o15.0.1.11)
+   - BAD RATE when connecting to existing MGID (o15.0.1.13)
+   - Partial JoinState delete request - removing FullMember (o15.0.1.14)
+   - Full Delete of a group (o15.0.1.14)
+   - Verify Delete by trying to Join deleted group (o15.0.1.14)
+   - BAD Delete of IPoIB membership (no prev join) (o15.0.1.15)
+
+* GUIDInfo Record:
+   - All GUIDInfoRecords in subnet are obtained
+
+* MultiPathRecord:
+   - Perform some compliant and noncompliant MultiPathRecord requests
+   - Validation is via status in responses and IB analyzer
+
+* PKeyTableRecord:
+  - Perform some compliant and noncompliant PKeyTableRecord queries
+  - Validation is via status in responses and IB analyzer
+
+* LinearForwardingTableRecord:
+  - Perform some compliant and noncompliant LinearForwardingTableRecord queries
+  - Validation is via status in responses and IB analyzer
+
+* Event Forwarding: Register for trap forwarding using reports
+   - Send a trap and wait for report
+   - Unregister non-existing
+
+* Trap 64/65 Flow: Register to Trap 64-65, create traps (by
+  disconnecting/connecting ports) and wait for report, then unregister.
+
+* Stress Test: send PortInfoRecord queries, both single and RMPP and
+  check for the rate of responses as well as their validity.
+
+
+5.2 IB Management Simulator OpenSM Test Flows:
+
+The simulator provides ability to simulate the SM handling of virtual
+topologies that are not limited to actual lab equipment availability.
+OpenSM was simulated to bring up clusters of up to 10,000 nodes. Daily
+regressions use smaller (16 and 128 nodes clusters).
+
+The following test flows are run on the IB management simulator:
+
+* Stability:
+  Up to 12 links from the fabric are randomly selected to drop packets
+  at drop rates up to 90%. The SM is required to succeed in bringing the
+  fabric up. The resulting routing is verified to be correct as well.
+
+* LID Manager:
+  Using LMC = 2 the fabric is initialized with LIDs. Faults such as
+  zero LID, Duplicated LID, non-aligned (to LMC) LIDs are
+  randomly assigned to various nodes and other errors are randomly
+  output to the guid2lid cache file. The SM sweep is run 5 times and
+  after each iteration a complete verification is made to ensure that all
+  LIDs that could possibly be maintained are kept, as well as that all nodes
+  were assigned a legal LID range.
+
+* Multicast Routing:
+  Nodes randomly join the 0xc000 group and eventually the
+  resulting routing is verified for completeness and adherence to
+  Up/Down routing rules.
+
+* osmtest:
+  The complete osmtest flow as described in the previous table is run on
+  the simulated fabrics.
+
+* Stress Test:
+  This flow merges fabric, LID and stability issues with continuous
+  PathRecord, ServiceRecord and Multicast Join/Leave activity to
+  stress the SM/SA during continuous sweeps. InformInfo Set/Delete/Get
+  were added to the test such both existing and non existing nodes
+  perform them in random order.
+
+5.3 OpenSM Regression
+
+Using a back-to-back or single switch connection, the following set of
+tests is run nightly on the stacks described in table 2. The included
+tests are:
+
+* Stress Testing: Flood the SA with queries from multiple channel
+  adapters to check the robustness of the entire stack up to the SA.
+
+* Dynamic Changes: Dynamic Topology changes, through randomly
+  dropping SMP packets, used to test OpenSM adaptation to an unstable
+  network & verify DB correctness.
+
+* Trap Injection: This flow injects traps to the SM and verifies that it
+  handles them gracefully.
+
+* SA Query Test: This test exhaustively checks the SA responses to all
+  possible single component mask. To do that the test examines the
+  entire set of records the SA can provide, classifies them by their
+  field values and then selects every field (using component mask and a
+  value) and verifies that the response matches the expected set of records.
+  A random selection using multiple component mask bits is also performed.
+
+5.4 Cluster testing:
+
+Cluster testing is usually run before a distribution release. It
+involves real hardware setups of 16 to 32 nodes (or more if a beta site
+is available). Each test is validated by running all-to-all ping through the IB
+interface. The test procedure includes:
+
+* Cluster bringup
+
+* Hand-off between 2 or 3 SM's while performing:
+  - Node reboots
+  - Switch power cycles (disconnecting the SM's)
+
+* Unresponsive port detection and recovery
+
+* osmtest from multiple nodes
+
+* Trap injection and recovery
+
+
+6 Qualification
+----------------
+
+Table 2 - Qualified IB Stacks
+=============================
+
+Stack                                    | Version
+-----------------------------------------|--------------------------
+OFED                                     |   1.3
+OFED                                     |   1.2
+OFED                                     |   1.1
+OFED                                     |   1.0
+OpenIB Gen2 (IBG2 distribution)          |   1.0
+OpenIB Gen1 (IBGD distribution)          |   1.8.0
+VAPI (Mellanox InfiniBand HCA Driver)    |   3.2 and later
+
+Table 3 - Qualified Devices and Corresponding Firmware
+======================================================
+
+Mellanox
+Device                              |   FW versions
+------------------------------------|-------------------------------
+InfiniScale                         | fw-43132  5.2.000 (and later)
+InfiniScale III                     | fw-47396  0.5.000 (and later)
+InfiniHost                          | fw-23108  3.5.000 (and later)
+InfiniHost III Lx                   | fw-25204  1.2.000 (and later)
+InfiniHost III Ex (InfiniHost Mode) | fw-25208  4.8.200 (and later)
+InfiniHost III Ex (MemFree Mode)    | fw-25218  5.3.000 (and later)
+ConnectX IB                         | fw-25408  2.3.000 (and later)
+
+QLogic/PathScale
+Device  |   Note
+--------|-----------------------------------------------------------
+iPath   | QHT6040 (PathScale InfiniPath HT-460)
+iPath   | QHT6140 (PathScale InfiniPath HT-465)
+iPath   | QLE6140 (PathScale InfiniPath PE-880)
+iPath   | QLE7240
+iPath   | QLE7280
+
+Note 1: OpenSM does not run on an IBM Galaxy (eHCA) as it does not expose
+QP0 and QP1. However, it does support it as a device on the subnet.
+
+Note 2: QoS firmware and Mellanox devices
+
+HCAs: QoS supported by ConnectX. The current FW release
+doesn't support QoS. QoS-enabled FW release (2_5_000) is
+planned for May. If someone wishes to get QoS-enabled FW
+before the official release, they should contact Mellanox FAE.
+
+Switches: QoS supported by InfiniScale III
+Any InfiniScale III FW that is supported by OpenSM supports QoS.





More information about the general mailing list