[ofa-general] QoS management in OpenSM - doc

Yevgeny Kliteynik kliteyn at dev.mellanox.co.il
Wed Feb 27 06:42:40 PST 2008


Hi Sasha,

The following doc describes QoS management in OpenSM.
This doc (named QoS_management_in_OpenSM.txt) has been added to
the OFED docs, along with the QoS_in_OFED.txt.

I'd like to add this info to OpenSM man pages as well.

I'm including the text here as is, so it will be easier to follow
possible changes. When those will be done, I'll fix the format to
match the OpenSM man pages and post a patch.

The only problem is that the whole OpenSM man has ~850 lines,
while this QoS management file has ~500 lines... :)

Please review.

-- Yevgeny

-------------------------------


                     QoS Management in OpenSM

==============================================================================
  Table of contents
==============================================================================

1. Overview
2. Full QoS Policy File
3. Simplified QoS Policy Definition
4. Policy File Syntax Guidelines
5. Examples of Full Policy File
6. Simplified QoS Policy - Details and Examples
7. SL2VL Mapping and VL Arbitration


==============================================================================
  1. Overview
==============================================================================

When QoS in OpenSM is enabled (-Q or --qos), OpenSM looks for QoS Policy file.
The default name of OpenSM QoS policy file is
/usr/local/etc/opensm/qos-policy.conf. The default may be changed by using -Y
or --qos_policy_file option with OpenSM.

During fabric initialization and at every heavy sweep OpenSM parses the QoS
policy file, applies its settings to the discovered fabric elements, and
enforces the provided policy on client requests. The overall flow for such
requests is:
  - The request is matched against the defined matching rules such that the
    QoS Level definition is found.
  - Given the QoS Level, path(s) search is performed with the given
    restrictions imposed by that level.

There are two ways to define QoS policy:
  - Full policy, where the policy file syntax provides an administrator
    various ways to match PathRecord/MultiPathRecord (PR/MPR) request and
    enforce various QoS constraints on the requested PR/MPR
  - Simplified QoS policy definition, where an administrator would be able to
    match PR/MPR requests by various ULPs and applications running on top of
    these ULPs.

While the full policy syntax is very flexible, in many cases the simplified
policy definition would be sufficient.


==============================================================================
  2. Full QoS Policy File
==============================================================================

QoS policy file has the following sections:

I) Port Groups (denoted by port-groups).
This section defines zero or more port groups that can be referred later by
matching rules (see below). Port group lists ports by:
   - Port GUID
   - Port name, which is a combination of NodeDescription and IB port number
   - PKey, which means that all the ports in the subnet that belong to
     partition with a given PKey belong to this port group
   - Partition name, which means that all the ports in the subnet that belong
     to partition with a given name belong to this port group
   - Node type, where possible node types are: CA, SWITCH, ROUTER, ALL, and
     SELF (SM's port).

II) QoS Setup (denoted by qos-setup).
This section describes how to set up SL2VL and VL Arbitration tables on
various nodes in the fabric.
However, this is not supported in OFED 1.3.
SL2VL and VLArb tables should be configured in the OpenSM options file
(default location - /var/cache/opensm/opensm.opts).

III) QoS Levels (denoted by qos-levels).
Each QoS Level defines Service Level (SL) and a few optional fields:
   - MTU limit
   - Rate limit
   - PKey
   - Packet lifetime
When path(s) search is performed, it is done with regards to restriction that
these QoS Level parameters impose.
One QoS level that is mandatory to define is a DEFAULT QoS level. It is
applied to a PR/MPR query that does not match any existing match rule.
Similar to any other QoS Level, it can also be explicitly referred by any
match rule.

IV) QoS Matching Rules (denoted by qos-match-rules).
Each PathRecord/MultiPathRecord query that OpenSM receives is matched against
the set of matching rules. Rules are scanned in order of appearance in the QoS
policy file such as the first match takes precedence.
Each rule has a name of QoS level that will be applied to the matching query.
A default QoS level is applied to a query that did not match any rule.
Queries can be matched by:
  - Source port group (whether a source port is a member of a specified group)
  - Destination port group (same as above, only for destination port)
  - PKey
  - QoS class
  - Service ID
To match a certain matching rule, PR/MPR query has to match ALL the rule's
criteria. However, not all the fields of the PR/MPR query have to appear in
the matching rule.
For instance, if the rule has a single criterion - Service ID, it will match
any query that has this Service ID, disregarding rest of the query fields.
However, if a certain query has only Service ID (which means that this is the
only bit in the PR/MPR component mask that is on), it will not match any rule
that has other matching criteria besides Service ID.


==============================================================================
  3. Simplified QoS Policy Definition
==============================================================================

Simplified QoS policy definition comprises of a single section denoted by
qos-ulps. Similar to the full QoS policy, it has a list of match rules and
their QoS Level, but in this case a match rule has only one criterion - its
goal is to match a certain ULP (or a certain application on top of this ULP)
PR/MPR request, and QoS Level has only one constraint - Service Level (SL).
The simplified policy section may appear in the policy file in combine with
the full policy, or as a stand-alone policy definition.
See more details and list of match rule criteria below.


==============================================================================
  4. Policy File Syntax Guidelines
==============================================================================

- Empty lines are ignored.
- Leading and trailing blanks, as well as empty lines, are ignored, so
   the indentation in the example is just for better readability.
- Comments are started with the pound sign (#) and terminated by EOL.
- Any keyword should be the first non-blank in the line, unless it's a
   comment.
- Keywords that denote section/subsection start have matching closing
   keywords.
- Having a QoS Level named "DEFAULT" is a must - it is applied to PR/MPR
   requests that didn't match any of the matching rules.
- Any section/subsection of the policy file is optional.


==============================================================================
  5. Examples of Full Policy File
==============================================================================

As mentioned earlier, any section of the policy file is optional, and
the only mandatory part of the policy file is a default QoS Level.
Here's an example of the shortest policy file:

     qos-levels
         qos-level
             name: DEFAULT
             sl: 0
         end-qos-level
     end-qos-levels

Port groups section is missing because there are no match rules, which means
that port groups are not referred anywhere, and there is no need defining
them. And since this policy file doesn't have any matching rules, PR/MPR query
won't match any rule, and OpenSM will enforce default QoS level.
Essentially, the above example is equivalent to not having QoS policy file
at all.

The following example shows all the possible options and keywords in the
policy file and their syntax:

     #
     # See the comments in the following example.
     # They explain different keywords and their meaning.
     #
     port-groups

         port-group # using port GUIDs
             name: Storage
             # "use" is just a description that is used for logging
             #  Other than that, it is just a comment
             use: SRP Targets
             port-guid: 0x10000000000001, 0x10000000000005-0x1000000000FFFA
             port-guid: 0x1000000000FFFF
         end-port-group

         port-group
             name: Virtual Servers
             # The syntax of the port name is as follows:
             #   "node_description/Pnum".
             # node_description is compared to the NodeDescription of the node,
             # and "Pnum" is a port number on that node.
             port-name: vs1 HCA-1/P1, vs2 HCA-1/P1
         end-port-group

         # using partitions defined in the partition policy
         port-group
             name: Partitions
             partition: Part1
             pkey: 0x1234
         end-port-group

         # using node types: CA, ROUTER, SWITCH, SELF (for node that runs SM)
         # or ALL (for all the nodes in the subnet)
         port-group
             name: CAs and SM
             node-type: CA, SELF
         end-port-group

     end-port-groups

     qos-setup
         # This section of the policy file describes how to set up SL2VL and VL
         # Arbitration tables on various nodes in the fabric.
         # However, this is not supported in OFED 1.3 - the section is parsed
         # and ignored. SL2VL and VLArb tables should be configured in the
         # OpenSM options file (by default - /var/cache/opensm/opensm.opts).
     end-qos-setup

     qos-levels

         # Having a QoS Level named "DEFAULT" is a must - it is applied to
         # PR/MPR requests that didn't match any of the matching rules.
         qos-level
             name: DEFAULT
             use: default QoS Level
             sl: 0
         end-qos-level

         # the whole set: SL, MTU-Limit, Rate-Limit, PKey, Packet Lifetime
         qos-level
             name: WholeSet
             sl: 1
             mtu-limit: 4
             rate-limit: 5
             pkey: 0x1234
             packet-life: 8
         end-qos-level

     end-qos-levels

     # Match rules are scanned in order of their apperance in the policy file.
     # First matched rule takes precedence.
     qos-match-rules

         # matching by single criteria: QoS class
         qos-match-rule
             use: by QoS class
             qos-class: 7-9,11
             # Name of qos-level to apply to the matching PR/MPR
             qos-level-name: WholeSet
         end-qos-match-rule

         # show matching by destination group and service id
         qos-match-rule
             use: Storage targets
             destination: Storage
             service-id: 0x10000000000001, 0x10000000000008-0x10000000000FFF
             qos-level-name: WholeSet
         end-qos-match-rule

         qos-match-rule
             source: Storage
             use: match by source group only
             qos-level-name: DEFAULT
         end-qos-match-rule

         qos-match-rule
             use: match by all parameters
             qos-class: 7-9,11
             source: Virtual Servers
             destination: Storage
             service-id: 0x0000000000010000-0x000000000001FFFF
             pkey: 0x0F00-0x0FFF
             qos-level-name: WholeSet
         end-qos-match-rule

     end-qos-match-rules


==============================================================================
  6. Simplified QoS Policy - Details and Examples
==============================================================================

Simplified QoS policy match rules are tailored for matching ULPs (or some
application on top of a ULP) PR/MPR requests. This section has a list of
per-ULP (or per-application) match rules and the SL that should be enforced
on the matched PR/MPR query.

Match rules include:
  - Default match rule that is applied to PR/MPR query that didn't match any
    of the other match rules
  - SDP
  - SDP application with a specific target TCP/IP port range
  - SRP with a specific target IB port GUID
  - RDS
  - iSER
  - iSER application with a specific target TCP/IP port range
  - IPoIB with a default PKey
  - IPoIB with a specific PKey
  - any ULP/application with a specific Service ID in the PR/MPR query
  - any ULP/application with a specific PKey in the PR/MPR query
  - any ULP/application with a specific target IB port GUID in the PR/MPR query

Since any section of the policy file is optional, as long as basic rules of
the file are kept (such as no referring to nonexisting port group, having
default QoS Level, etc), the simplified policy section (qos-ulps) can serve
as a complete QoS policy file.
The shortest policy file in this case would be as follows:

     qos-ulps
         default  : 0 #default SL
     end-qos-ulps

It is equivalent to the previous example of the shortest policy file, and it
is also equivalent to not having policy file at all.

Below is an example of simplified QoS policy with all the possible keywords:

     qos-ulps
         default                       : 0 # default SL
         sdp, port-num 30000           : 0 # SL for application running on top
                                           # of SDP when a destination
                                           # TCP/IPport is 30000
         sdp, port-num 10000-20000     : 0
         sdp                           : 1 # default SL for any other
                                           # application running on top of SDP
         rds                           : 2 # SL for RDS traffic
         iser, port-num 900            : 0 # SL for iSER with a specific target
                                           # port
         iser                          : 3 # default SL for iSER
         ipoib, pkey 0x0001            : 0 # SL for IPoIB on partition with
                                           # pkey 0x0001
         ipoib                         : 4 # default IPoIB partition,
                                           # pkey=0x7FFF
         any, service-id 0x6234        : 6 # match any PR/MPR query with a
                                           # specific Service ID
         any, pkey 0x0ABC              : 6 # match any PR/MPR query with a
                                           # specific PKey
         srp, target-port-guid 0x1234  : 5 # SRP when SRP Target is located on
                                           # a specified IB port GUID
         any, target-port-guid 0x0ABC-0xFFFFF : 6 # match any PR/MPR query with
                                           # a specific target port GUID
     end-qos-ulps


Similar to the full policy definition, matching of PR/MPR queries is done in
order of appearance in the QoS policy file such as the first match takes
precedence, except for the "default" rule, which is applied only if the query
didn't match any other rule.

All other sections of the QoS policy file take precedence over the qos-ulps
section. That is, if a policy file has both qos-match-rules and qos-ulps
sections, then any query is matched first against the rules in the
qos-match-rules section, and only if there was no match, the query is matched
against the rules in qos-ulps section.

Note that some of these match rules may overlap, so in order to use the
simplified QoS definition effectively, it is important to understand how each
of the ULPs is matched:

6.1  IPoIB
IPoIB query is matched by PKey. Default PKey for IPoIB partition is 0x7fff, so
the following three match rules are equivalent:

     ipoib            : <SL>
     ipoib, 0x7fff    : <SL>
     any, pkey 0x7fff : <SL>

6.2  SDP
SDP PR query is matched by Service ID. The Service-ID for SDP is
0x000000000001PPPP, where PPPP are 4 hex digits holding the remote TCP/IP Port
Number to connect to. The following two match rules are equivalent:

     sdp                                                   : <SL>
     any, service-id 0x0000000000010000-0x000000000001ffff : <SL>

6.3  RDS
Similar to SDP, RDS PR query is matched by Service ID. The Service ID for RDS
is 0x000000000106PPPP, where PPPP are 4 hex digits holding the remote TCP/IP
Port Number to connect to. Default port number for RDS is 0x48CA, which makes
a default Service-ID 0x00000000010648CA. The following two match rules are
equivalent:

     rds                                : <SL>
     any, service-id 0x00000000010648CA : <SL>

6.4  iSER
Similar to RDS, iSER query is matched by Service ID, where the the Service ID
is also 0x000000000106PPPP. Default port number for iSER is 0x035C, which makes
a default Service-ID 0x000000000106035C. The following two match rules are
equivalent:

     iser                               : <SL>
     any, service-id 0x000000000106035C : <SL>

6.5  SRP
Service ID for SRP varies from storage vendor to vendor, thus SRP query is
matched by the target IB port GUID. The following two match rules are
equivalent:

     srp, target-port-guid 0x1234  : <SL>
     any, target-port-guid 0x1234  : <SL>

Note that any of the above ULPs might contain target port GUID in the PR
query, so in order for these queries not to be recognized by the QoS manager
as SRP, the SRP match rule (or any match rule that refers to the target port
guid only) should be placed at the end of the qos-ulps match rules.

6.6  MPI
SL for MPI is manually configured by MPI admin. OpenSM is not forcing any SL
on the MPI traffic, and that's why it is the only ULP that did not appear in
the qos-ulps section.


==============================================================================
  7. SL2VL Mapping and VL Arbitration
==============================================================================

OpenSM cached options file has a set of QoS related configuration parameters,
that are used to configure SL2VL mapping and VL arbitration on IB ports.
These parameters are:
  - Max VLs: the maximum number of VLs that will be on the subnet.
  - High limit: the limit of High Priority component of VL Arbitration
    table (IBA 7.6.9).
  - VLArb low table: Low priority VL Arbitration table (IBA 7.6.9) template.
  - VLArb high table: High priority VL Arbitration table (IBA 7.6.9) template.
  - SL2VL: SL2VL Mapping table (IBA 7.6.6) template. It is a list of VLs
    corresponding to SLs 0-15 (Note that VL15 used here means drop this SL).

There are separate QoS configuration parameters sets for various target types:
CAs, routers, switch external ports, and switch's enhanced port 0. The names
of such parameters are prefixed by "qos_<type>_" string. Here is a full list
of the currently supported sets:

     qos_ca_  - QoS configuration parameters set for CAs.
     qos_rtr_ - parameters set for routers.
     qos_sw0_ - parameters set for switches' port 0.
     qos_swe_ - parameters set for switches' external ports.

Here's the example of typical default values for CAs and switches' external
ports (hard-coded in OpenSM initialization):

     qos_ca_max_vls=15
     qos_ca_high_limit=0
     qos_ca_vlarb_high=0:4,1:0,2:0,3:0,4:0,5:0,6:0,7:0,8:0,9:0,10:0,11:0,12:0,13:0,14:0
     qos_ca_vlarb_low=0:0,1:4,2:4,3:4,4:4,5:4,6:4,7:4,8:4,9:4,10:4,11:4,12:4,13:4,14:4
     qos_ca_sl2vl=0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,7

     qos_swe_max_vls=15
     qos_swe_high_limit=0
     qos_swe_vlarb_high=0:4,1:0,2:0,3:0,4:0,5:0,6:0,7:0,8:0,9:0,10:0,11:0,12:0,13:0,14:0
     qos_swe_vlarb_low=0:0,1:4,2:4,3:4,4:4,5:4,6:4,7:4,8:4,9:4,10:4,11:4,12:4,13:4,14:4
     qos_swe_sl2vl=0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,7

VL arbitration tables (both high and low) are lists of VL/Weight pairs.
Each list entry contains a VL number (values from 0-14), and a weighting value
(values 0-255), indicating the number of 64 byte units (credits) which may be
transmitted from that VL when its turn in the arbitration occurs. A weight
of 0 indicates that this entry should be skipped. If a list entry is
programmed for VL15 or for a VL that is not supported or is not currently
configured by the port, the port may either skip that entry or send from any
supported VL for that entry.

Note, that the same VLs may be listed multiple times in the High or Low
priority arbitration tables, and, further, it can be listed in both tables.

The limit of high-priority VLArb table (qos_<type>_high_limit) indicates the
number of high-priority packets that can be transmitted without an opportunity
to send a low-priority packet. Specifically, the number of bytes that can be
sent is high_limit times 4K bytes.

A high_limit value of 255 indicates that the byte limit is unbounded.
Note: if the 255 value is used, the low priority VLs may be starved.
A value of 0 indicates that only a single packet from the high-priority table
may be sent before an opportunity is given to the low-priority table.

Keep in mind that ports usually transmit packets of size equal to MTU.
For instance, for 4KB MTU a single packet will require 64 credits, so in order
to achieve effective VL arbitration for packets of 4KB MTU, the weighting
values for each VL should be multiples of 64.

Below is an example of SL2VL and VL Arbitration configuration on subnet:

     qos_ca_max_vls=15
     qos_ca_high_limit=6
     qos_ca_vlarb_high=0:4
     qos_ca_vlarb_low=0:0,1:64,2:128,3:192,4:0,5:64,6:64,7:64
     qos_ca_sl2vl=0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,7

     qos_swe_max_vls=15
     qos_swe_high_limit=6
     qos_swe_vlarb_high=0:4
     qos_swe_vlarb_low=0:0,1:64,2:128,3:192,4:0,5:64,6:64,7:64
     qos_swe_sl2vl=0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,7

In this example, there are 8 VLs configured on subnet: VL0 to VL7. VL0 is
defined as a high priority VL, and it is limited to 6 x 4KB = 24KB in a single
transmission burst. Such configuration would suilt VL that needs low latency
and uses small MTU when transmitting packets. Rest of VLs are defined as low
priority VLs with different weights, while VL4 is effectively turned off.







More information about the general mailing list