[openib-general] QoS in OSM

Sasha Khapyorsky sashak at voltaire.com
Mon Jan 29 15:53:25 PST 2007


On 01:10 Tue 30 Jan     , Yevgeny Kliteynik wrote:
> Hi guys.
> 
> I've finished the first implementation of QoS-aware PathRecord.
> The path selection logic itself is implemented in a separate function
> that is called only when QoS in OpenSM is on.
> It causes some code duplication, but as we've discussed, the idea is to
> minimize the changes in the existing logic in OSM.
> Tonight the regression testing is running on this OSM version to make 
> sure that I didn't screw something up.
> Since none of the QoS patches has made its way to the trunk yet, the
> patch series will be pretty long. It will include:
>  - QoS policy file parser (Lex & Yacc files that implement the grammar, 
>    C & H files that implement parser auxiliary functions)
>  - Additional fields in path_record_t (instead of 'reserved' fields)
>  - Additional command line option for OpenSM to specify the QoS 
>    policy file name
>  - QoS-aware selection of PathRecord.
> I'll issue the patch series with all the details in the morning, and then
> I'll start working on MultiPath Record.

And what about integration with the VLArb and SL2VL port setup?

Sasha

> 
> In addition to all the questions that you already have and I haven't answered
> yet, I'm sure you'll have many questions and remarks regarding these patches.
> 
> I suggest that we set up a conference call to discuss all these questions - it
> might save us a lot of time and clear some issues.
> 
> How about tomorrow morning? (I mean Hal's morning). The earlier the better.
> 
> Please let me know what you think about it.
> 
> Thanks,
> 
> -- Yevgeny
> 
> Hal Rosenstock wrote:
> > Hi again Yevgeny,
> > 
> > On Thu, 2007-01-25 at 11:53, Yevgeny Kliteynik wrote: 
> >> Hi Hal.
> >>
> >> Hal Rosenstock wrote:
> >>> Hi Yevgeny,
> >>>
> >>> On Wed, 2007-01-24 at 09:10, Yevgeny Kliteynik wrote:
> >>>> Hi Hal, Sasha.
> >>>>
> >>>> Here's a description of the QoS policy file, and an
> >>>> example of such file (with more comments inside).
> >>> This makes the start of a good document on this. If you add this to
> >>> osm/doc, I will incorporate it into the opensm man page.
> >> OK, I'll do that.
> >>
> >>>> QoS Policy file
> >>>> --
> >>>>
> >>>> The QoS policy file is divided into 4 sub sections:
> >>>>
> >>>> * Node Group: a set of HCAs, Routers or Switches that share the same settings. 
> >>>>   A node group might be a partition defined by the partition manager policy in 
> >>>>   terms of GUIDs.
> >>> Are these Node or Port Groups ? It looks like port groups from the
> >>> below.
> >> Good point - it should be "Port Groups".
> >>
> >>>>  Future implementations might provide support for NodeDescription 
> >>>>   based definition of node groups.
> >>>>
> >>>> * Fabric Setup: 
> >>>>   Defines how the SL2VL and VLArb tables should be setup. This policy definition 
> >>>>   assumes the computation of target behavior should be performed outside of 
> >>>>   OpenSM.
> >>>>
> >>>> * QoS-Levels Definition:
> >>>>   This section defines the possible sets of parameters for QoS that a client might 
> >>>>   be mapped to. Each set holds: SL and optionally: Max MTU, Max Rate, Path Bits 
> >>>>   (in case LMC > 0 is used for QoS) and TClass.
> >>> How does this relate to/interact with partition configuration ? Also,
> >>> what about preexisting QoS ?
> >> As I understand from the osm man or from the partition-config.txt,
> >> partitions definition is intended to be used for IPoIB only.
> >>    [quote]
> >> 	sl=<val> - specifies SL for this IPoIB MC group
> >>         	   (default is 0)
> >>    [/quote]
> >>
> >> I think that QoS policy may only "tighten" the constraints and enforce
> >> lower-than-requested values, both in case of partition and in case of
> >> preexisting QoS settings.
> > 
> > I'm not following you on this specific point. A specific SL is chosen by
> > partition config so how can it be "tightened" ? Does it mean it might be
> > changed to a different SL (in which case this QoS config supersedes the
> > partition config for SL setting) ? Have you tried this to be sure ? 
> > 
> > Are multicast groups handled as part of the QoS definition in the XML syntax ?
> > If not, might this be a future addition ? If it is, how are they
> > specified ?
> > 
> > The other half of the original question was how a QoS request is handled
> > if the original QoS support is enabled rather than this new QoS support
> > in terms of the SA PR and MPR code.
> > 
> >>>> * Matching Rules:
> >>>>   A list of rules that match an incoming PathRecord request to a QoS-Level. The 
> >>>>   rules are processed in order such that the first match is applied. Each rule is 
> >>>>   built out of a set of match expressions which should all match for the rule to 
> >>>>   apply. The matching expressions are defined for the following fields:
> >>>>     - SRC and DST to lists of node groups
> >>>>     - Service-ID to a list of Service-ID or Service-ID ranges
> >>>>     - TClass to a list of TClass values or ranges
> >>>>
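
A minimal sketch of the first-match behaviour described above, assuming the
rules have already been parsed into dictionaries. The field names
(qos_level_sn, class, destination, service) and the request layout are
illustrative only and are not taken from the OpenSM sources.

```python
from typing import Optional

# Hypothetical in-memory form of <qos-match-rule> entries, in file order.
RULES = [
    {"qos_level_sn": 1, "class": {7, 8, 9, 11}},
    {"qos_level_sn": 2, "destination": {"Storage"}, "service": {22, 4719}},
]


def match_qos_level(request: dict, rules: list) -> Optional[int]:
    """Return the qos-level sn of the first rule whose criteria all match."""
    for rule in rules:
        criteria = {k: v for k, v in rule.items() if k != "qos_level_sn"}
        # Every criterion present in the rule must match the request;
        # a field missing from the request never matches.
        if all(request.get(field) in allowed
               for field, allowed in criteria.items()):
            return rule["qos_level_sn"]
    return None  # no rule matched


print(match_qos_level({"class": 8}, RULES))
print(match_qos_level({"destination": "Storage", "service": 22}, RULES))
```

A request that satisfies more than one rule gets the level of the earliest
rule in the list, in the same way a firewall table is scanned.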
> >>>> QoS policy file example
> >>>> --
> >>>>
> >>>> <?xml version="1.0" encoding="ISO-8859-1"?>
> >>>> <qos-policy>
> >>>>     <!-- Port Groups define sets of ports to be used later in the settings -->
> >>>>     <port-groups>
> >>>>         <!-- using port GUIDs -->
> >>>>         <port-group> 
> >>>>             <name>Storage</name> 
> >>>>             <use>our SRP storage targets</use>
> >>> Is the use clause more than commentary ? How is it "used" ?
> >> The 'use' clause is just a description of the port group that
> >> can be used for logging. Other than for logging, it is just a
> >> commentary. 
> >>  
> >>>>             <port-guid>0x1000000000000001</port-guid>
> >>>>             <port-guid>0x1000000000000002</port-guid>
> >>>>         </port-group>
> >>>>         <!-- using names obtained by concatenation of first 2 words of NodeDescription
> >>>>              and port number -->
> >>>>         <port-group> 
> >>>>             <name>Virtual Servers</name> 
> >>>>             <use>node desc and IB port #</use>
> >>>>             <port-name>vs1/HCA-1/P1</port-name>
> >>>>             <port-name>vs3/HCA-1/P1</port-name>
> >>>>             <port-name>vs3/HCA-2/P1</port-name>
> >>> How are port-names used ?
> >> The syntax of the port name is as follows:
> >> "hostname/CA-num/Pnum"
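
A small sketch of splitting a port name of this form into its parts. The
regular expression is an assumption derived only from the
"hostname/CA-num/Pnum" description and the examples in the file, and
parse_port_name is a hypothetical helper, not an OpenSM function.

```python
import re

# Assumed shape: three '/'-separated fields, the last one 'P' plus digits.
_PORT_NAME = re.compile(r"^(?P<host>[^/]+)/(?P<ca>[^/]+)/P(?P<port>\d+)$")


def parse_port_name(name: str):
    """Split 'hostname/CA-num/Pnum' into (hostname, CA name, port number)."""
    m = _PORT_NAME.match(name)
    if m is None:
        raise ValueError("malformed port name: %r" % name)
    return m.group("host"), m.group("ca"), int(m.group("port"))


print(parse_port_name("vs3/HCA-2/P1"))
```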
> > 
> > What's its purpose ? Is it used somewhere else in the syntax ?
> > 
> >>>>         </port-group>
> >>>>         <!-- using partitions defined in the partition policy -->
> >>>>         <port-group> 
> >>>>             <name>Partition 1</name> 
> >>>>             <use>default settings</use>
> >>>>             <partition>Part1</partition> 
> >>>>         </port-group>
> >>>>         <!-- using node types HCA|ROUTER|SWITCH -->
> >>> Is this CA rather than HCA ? (What about TCAs ?)
> >> Sure, it should be 'CA'.
> > 
> > Will this be changed ? If so, when ?
> > 
> >>>>         <port-group> 
> >>>>             <name>Routers</name> 
> >>>>             <use>all routers</use>
> >>>>             <node-type>ROUTER</node-type> 
> >>>>         </port-group>  
> >>>>     </port-groups>
> >>>>     
> >>>>     <qos-setup>
> >>>>     <!-- define all types of SL2VL tables always have 16 VL entries -->
> >>>                                                            ^^
> >>> Actually, it is                                            SL
> >>> assuming the device supports SL2VL mapping as indicated by
> >>> IsSLMappingSupported in the PortInfo:CapabilityMask.
> >>> Will the syntax handle single data VL devices which only implement SL
> >>> filtering ? 
> >> Yes, it should.
> >>
> >>> Will the QoS manager support this (SL2VL without VLArb
> >>> settings) or are these required together ?
> >> Yes, it should support sl2vl w/o vlarb settings as well.
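
As a sketch of the point about the table shape: an <sl2vl-table> value
carries one VL per SL, so 16 entries indexed by SL. The helper below is
hypothetical and only checks the shape; it assumes each entry is a 4-bit
value and does not look at what the device actually supports.

```python
def parse_sl2vl_table(text: str) -> list:
    """Parse a comma-separated SL2VL table into a list indexed by SL."""
    vls = [int(v) for v in text.split(",")]
    if len(vls) != 16:
        raise ValueError("SL2VL table needs 16 entries, got %d" % len(vls))
    if any(not 0 <= vl <= 15 for vl in vls):
        raise ValueError("VL values must fit in 4 bits")
    return vls


table = parse_sl2vl_table("0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,7")
print(table[15])  # VL used for SL 15 in this table
```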
> >>
> >>>>         <sl2vl-tables>
> >>>>         <!-- scope defines the exact devices and in/out ports the tables apply to
> >>>>              if the same port is matching several rules the last one applies -->
> >>>>             <sl2vl-scope> 
> >>>>                 <group>Part1</group> 
> >>>>                 <from>*</from> 
> >>>>                 <to>*</to> 
> >>>>                 <sl2vl-table>0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,7</sl2vl-table>
> >>>>             </sl2vl-scope>
> >>>>             <!-- also the link across port 1 is probably supporting only 2 VLs -->
> >>>>             <sl2vl-scope> 
> >>>>                 <across>Storage</across> 
> >>>>                 <!-- "across-from" means the port just connected to the given group -->
> >>>>                 <across-from>Storage2</across-from>
> >>>>                 <!-- "across-to" means the port just connected *to* the given group -->
> >>>>                 <across-to>Storage3</across-to>
> >>> I don't quite follow across-from/to.
> >> Right, the comments there are garbage. Here's the explanation:
> >> SL2VL table describes VL as function of from-port, to-port, and SL.
> >>
> >> <across-to>group_name</across-to>:
> >>    It defines sl2vl table where 'to-port's belong to group_name
> >> <across-from>group_name</across-from>:
> >>    Same as above, only that this time 'from-port's belong to group_name
> >> <across>group_name</across>:
> >>    sl2vl tables both for 'to-port's and 'from-port's that belong to group_name
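
One way to read "VL as a function of from-port, to-port, and SL", combined
with the comment in the example that the last matching scope applies. This
is only a sketch under those two assumptions: the scope layout is
hypothetical, group membership (group/across/across-from/across-to) is left
out, and '*' is treated as a wildcard for port numbers.

```python
# Hypothetical scopes in file order: port selectors plus a 16-entry table.
SCOPES = [
    {"from": "*", "to": "*",
     "table": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 7]},
    {"from": "*", "to": {1},
     "table": [0] + [1] * 14 + [0]},
]


def _matches(selector, port: int) -> bool:
    """'*' matches any port, otherwise the selector is a set of ports."""
    return selector == "*" or port in selector


def lookup_vl(scopes, from_port: int, to_port: int, sl: int):
    """Return the VL for (from_port, to_port, sl); later scopes override."""
    table = None
    for scope in scopes:
        if _matches(scope["from"], from_port) and _matches(scope["to"], to_port):
            table = scope["table"]
    return None if table is None else table[sl]


print(lookup_vl(SCOPES, 3, 2, 5))
print(lookup_vl(SCOPES, 3, 1, 5))
```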
> > 
> > I'm still not following what is going on here and how this is used.
> > 
> >>>>                 <from>*</from> 
> >>>>                 <to>1</to>
> >>>>                 <sl2vl-table>0,1,1,1,1,1,1,1,1,1,1,1,1,1,1,0</sl2vl-table>
> >>>>             </sl2vl-scope>
> >>>>         </sl2vl-tables>
> >>>>
> >>>>         <!-- define all types of VLArb tables. The length of the tables should 
> >>>>              match the physically supported tables by their target ports -->
> >>>>         <vlarb-tables>
> >>>>             <!-- scope defines the exact ports the VLArb tables apply to -->
> >>>>             <vlarb-scope> 
> >>>>                 <group>Storage</group>
> >>>>                 <!-- VLArb table holds VL and weight pairs -->
> >>>>                 <vlarb-high>0:255,1:127,2:63,3:31,4:15,5:7,6:3,7:1</vlarb-high>
> >>>>                 <vlarb-low>8:255,9:127,10:63,11:31,12:15,13:7,14:3</vlarb-low>
> >>>>                 <vl-high-limit>10</vl-high-limit>
> >>> What happens if the shape of VLArb indicated here does not match the
> >>> device ?
> >> The part that sets up the QoS in SM (I'm not writing this part right now),
> > 
> > What is the plan for this ?
> > 
> >> should issue error message in case VLArb definition doesn't match the device
> >> properties.
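
A sketch of the kind of check meant here, assuming the table is given as
"VL:weight" pairs and the device limit is known as a plain entry count. The
function names and the table_cap parameter are hypothetical; the ranges used
(data VLs 0-14, weights 0-255) are assumptions of this sketch.

```python
def parse_vlarb(text: str) -> list:
    """Parse '0:255,1:127' into [(0, 255), (1, 127)]."""
    entries = []
    for item in text.split(","):
        vl, weight = item.split(":")
        entries.append((int(vl), int(weight)))
    return entries


def check_vlarb(entries: list, table_cap: int) -> list:
    """Return a list of error strings; an empty list means the table fits."""
    errors = []
    if len(entries) > table_cap:
        errors.append("table has %d entries, device supports %d"
                      % (len(entries), table_cap))
    for vl, weight in entries:
        if not 0 <= vl <= 14:
            errors.append("VL %d is not a data VL" % vl)
        if not 0 <= weight <= 255:
            errors.append("weight %d out of range" % weight)
    return errors


high = parse_vlarb("0:255,1:127,2:63,3:31,4:15,5:7,6:3,7:1")
print(check_vlarb(high, table_cap=8))
print(check_vlarb(high, table_cap=4))
```

Returning the errors instead of raising leaves it to the caller to decide
whether a mismatch is only logged or also causes the table to be skipped.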
> > 
> > Aside from the error message, is there any additional error handling for
> > this ?
> > 
> >>>>             </vlarb-scope>
> >>>>         </vlarb-tables>
> >>>>     </qos-setup>
> >>>>
> >>>>     <qos-levels>
> >>>>         <!-- the first one is just setting SL -->
> >>>>         <qos-level> 
> >>>>             <sn>1</sn> 
> >>> What does sn mean ? What is it used for ?
> >> 'sn' is an id of this qos level definition.
> >> It is referenced later by QoS match rules as 'qos-level-sn'
> > 
> > What is 'sn' short for ?
> > 
> >>>>             <use>for the lowest priority comm</use>
> >>>>             <sl>16</sl>
> >>>>         </qos-level>
> >>>>         <!-- the second sets SL and TClass -->
> >>>>         <qos-level> 
> >>>>             <sn>2</sn> 
> >>>>             <use>low latency best bandwidth</use>
> >>>>             <sl>0</sl> 
> >>>>             <class>7</class>
> >>> What is class ? I saw TClass mentioned earlier. Is this TClass or
> >>> something else ?
> >> Instead of "TClass" there should be "QoS Class".
> >> The <class> value is the PathRecord.qos_class value that should be
> >> returned in the path record query response when a certain <qos-level>
> >> is applied to the returned path.
> > 
> > So these names need to change to be more consistent ?
> > 
> >>>>         </qos-level>
> >>>>         <!-- the whole set: SL, TClass, MTU-Limit, Rate-Limit, Path-Bits  -->
> >>> If specified, do MTU limit and rate limit add extra limits to be imposed
> >>> on what is selected (and realizable) ?
> >> Yes
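
In other words, a limit can only lower what the path could otherwise do. A
tiny sketch, assuming both values are plain comparable numbers (for example
MTU in bytes) rather than encoded PathRecord field values, which are not
necessarily ordered numerically. apply_limit is a hypothetical helper.

```python
from typing import Optional


def apply_limit(realizable: int, limit: Optional[int]) -> int:
    """Cap the realizable value by the qos-level limit, if one is set."""
    if limit is None:
        return realizable  # no limit configured for this qos-level
    return min(realizable, limit)


print(apply_limit(2048, 1024))
print(apply_limit(2048, None))
```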
> >>
> >>> Strictly speaking, couldn't packet lifetime limit also be added to this
> >>> syntax here ? I presume it was left out as being not "interesting" as
> >>> yet. Is that correct ?
> >> I can add packet lifetime limit - it's not a big deal
> >>
> >>> Also, how are path bits used ?
> >> For now I don't do anything with them - we'll discuss this issue in the future.
> > 
> > How are they envisioned to be used ? 
> > 
> > Why are they in the syntax now ? Seems inconsistent with PLL.
> > 
> > Should there be a warning if they are specified now since they are not
> > used ? 
> >  
> >>>>         <qos-level> 
> >>>>             <sn>3</sn> 
> >>>>             <use>just an example</use>
> >>>>             <sl>0</sl> 
> >>>>             <class>32</class> 
> >>>>             <mtu-limit>1</mtu-limit> 
> >>>>             <rate-limit>1</rate-limit>
> >>>>         </qos-level>
> >>>>     </qos-levels>
> >>>>
> >>>>     <!-- Match rules are scanned in a first-fit manner (like firewall rules table) -->
> >>>>     <qos-match-rules>
> >>>>         <!-- matching by single criteria: class (list of values and ranges) -->
> >>>>         <qos-match-rule> 
> >>>>             <qos-level-sn>1</qos-level-sn> <!-- defined in <sn> of <qos-level> -->
> >>>>             <use>low latency by class 7-9 or 11</use> <!-- just a description -->
> >>>>             <class>7-9,11</class> <!--  -->
> >>>>             <match-level>1</match-level> <!-- ID of this match rule -->
> >>>>         </qos-match-rule>
> >>>>         <!-- show matching by destination group AND service-ids -->
> >>>>         <qos-match-rule> 
> >>>>             <qos-level-sn>2</qos-level-sn> 
> >>>>             <use>Storage targets connection</use>
> >>>>             <destination>Storage</destination>
> >>>>             <service>22,4719</service>
> >>> What is service ? What does 22.4719 mean ?
> >> The syntax is <service>service_id1,service_id2,...</service>, so in the
> >> example above these are actually two service ids.
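
A sketch of parsing such a list, assuming the same list-and-range form shown
for <class> ("7-9,11") is also accepted for <service>. The helper names are
hypothetical; values are read with automatic base detection so that hex
service IDs written with a 0x prefix would also parse.

```python
def parse_ranges(text: str) -> list:
    """Parse '7-9,11' into [(7, 9), (11, 11)] (inclusive ranges)."""
    ranges = []
    for item in text.split(","):
        lo, sep, hi = item.strip().partition("-")
        low = int(lo, 0)
        high = int(hi, 0) if sep else low  # single value: low == high
        if high < low:
            raise ValueError("bad range: %r" % item)
        ranges.append((low, high))
    return ranges


def in_ranges(value: int, ranges: list) -> bool:
    """True if value falls inside any of the inclusive ranges."""
    return any(lo <= value <= hi for lo, hi in ranges)


services = parse_ranges("22,4719")
print(in_ranges(22, services), in_ranges(23, services))
```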
> > 
> > So you can create arbitrary lists of service IDs. What about ranges ?
> > Does the syntax support that ?
> > 
> >> As for the exact meaning of this, I'm not sure - I need to think about it...
> > 
> > Let me know. I'd really like to understand the syntax.
> > 
> >>>>             <match-level>3</match-level> 
> >>> What are match-levels used for ?
> >> Actually, they are not used - they shouldn't appear here.
> >> Somehow it was copy-pasted here from one of the older versions
> >> of the policy file.
> > 
> > So can this be updated for what is current ?
> > 
> > Thanks.
> > 
> > -- Hal
> > 
> >> -- Yevgeny
> >>
> >>> -- Hal
> >>>
> >>>>         </qos-match-rule>
> >>>>     </qos-match-rules>
> >>>>
> >>>> </qos-policy>
> >>>>
> >>>>
> >>>>
> >>>> -- Yevgeny
> >>>>
> >>>> Yevgeny Kliteynik wrote:
> >>>>> Hi Sasha,
> >>>>>
> >>>>> Sasha Khapyorsky wrote:
> >>>>>> On 10:46 Sun 21 Jan     , Yevgeny Kliteynik wrote:
> >>>>>>> Hi Sasha.
> >>>>>>>
> >>>>>>> Sasha Khapyorsky wrote:
> >>>>>>>> Hi Yevgeny,
> >>>>>>>>
> >>>>>>>> On 17:01 Wed 17 Jan     , Yevgeny Kliteynik wrote:
> >>>>>>>>> Hi Hal
> >>>>>>>>>
> >>>>>>>>> The following series of six patches implements QoS policy file parser:
> >>>>>>>>>
> >>>>>>>>> 1. QoS parser Lex file
> >>>>>>>>> 2. QoS parser Lex-generated c file
> >>>>>>>>> 3. QoS parser grammar (Yacc) file
> >>>>>>>>> 4. QoS parser Yacc-generated grammar c and h file
> >>>>>>>>> 5. QoS parser header file that defines parse tree data structures 
> >>>>>>>>> 6. Changes in makefiles and configure.in file for compiling QoS parser files
> >>>>>>>> Is there any description of proposed format and functionality?
> >>>>>>> The parser is based on QoS RFC sent by Eitan in May 2006, with a few
> >>>>>>> minor modifications. You can find the RFC here:
> >>>>>>> http://openib.org/pipermail/openib-general/2006-May/022336.html
> >>>>>> This was an RFC and a couple of issues were discussed then. Now you are in the
> >>>>>> implementation phase and an exact format description would be desirable. For
> >>>>>> example, what are the "few minor modifications"?
> >>>>> I'll prepare an example file with explanations.
> >>>>>
> >>>>> -- Yevgeny
> >>>>>
> >>>>>>>> Also what about using human readable formats?
> >>>>>>> To me the xml-like format in the RFC looks pretty readable.
> >>>>>>> It has very limited number of keywords (tags), so it's easy 
> >>>>>>> to follow and/or to modify.
> >>>>>> It is your opinion, not everybody will agree with it (AFAIR this was
> >>>>>> discussed too during RFC).
> >>>>>>
> >>>>>> I would not care, but I don't know of any example of really successful
> >>>>>> use of XML for configuration purposes (especially where advanced graphical
> >>>>>> config editors/viewers were not used). Do you know of one?
> >>>>>>
> >>>>>> Sasha
> >>>>>>
> > 