[openib-general] QoS in OSM

Yevgeny Kliteynik kliteyn at dev.mellanox.co.il
Mon Jan 29 15:10:23 PST 2007


Hi guys.

I've finished the first implementation of QoS-aware PathRecord.
The path selection logic itself is implemented in a separate function
that is called only when QoS in OpenSM is on.
It causes some code duplication, but as we've discussed, the idea is to
minimize the changes in the existing logic in OSM.
Tonight the regression testing is running on this OSM version to make 
sure that I didn't screw something up.
Since none of the QoS patches has made its way to the trunk yet, the
patch series will be pretty long. It will include:
 - QoS policy file parser (Lex & Yacc files that implement the grammar, 
   C & H files that implement parser auxiliary functions)
 - Additional fields in path_record_t (instead of 'reserved' fields)
 - Additional command line option for OpenSM to specify the QoS 
   policy file name
 - QoS-aware selection of PathRecord.
I'll issue the patch series with all the details in the morning, and then
I'll start working on MultiPath Record.

In addition to all the questions that you already have and I haven't answered
yet, I'm sure you'll have many questions and remarks regarding these patches.

I suggest that we set up a conference call to discuss all these questions - it
might save us a lot of time and clear some issues.

How about tomorrow morning? (I mean Hal's morning). The earlier the better.

Please let me know what you think about it.

Thanks,

-- Yevgeny

Hal Rosenstock wrote:
> Hi again Yevgeny,
> 
> On Thu, 2007-01-25 at 11:53, Yevgeny Kliteynik wrote: 
>> Hi Hal.
>>
>> Hal Rosenstock wrote:
>>> Hi Yevgeny,
>>>
>>> On Wed, 2007-01-24 at 09:10, Yevgeny Kliteynik wrote:
>>>> Hi Hal, Sasha.
>>>>
>>>> Here's a description of the QoS policy file, and an
>>>> example of such file (with more comments inside).
>>> This makes the start of a good document on this. If you add this to
>>> osm/doc, I will incorporate it into the opensm man page.
>> OK, I'll do that.
>>
>>>> QoS Policy file
>>>> --
>>>>
>>>> The QoS policy file is divided into 4 sub sections:
>>>>
>>>> * Node Group: a set of HCAs, Routers or Switches that share the same settings. 
>>>>   A node group might be a partition defined by the partition manager policy in 
>>>>   terms of GUIDs.
>>> Are these Node or Port Groups ? It looks like port groups from the
>>> below.
>> Good point - it should be "Port Groups".
>>
>>>>  Future implementations might provide support for NodeDescription 
>>>>   based definition of node groups.
>>>>
>>>> * Fabric Setup: 
>>>>   Defines how the SL2VL and VLArb tables should be set up. This policy definition 
>>>>   assumes the computation of target behavior should be performed outside of 
>>>>   OpenSM.
>>>>
>>>> * QoS-Levels Definition:
>>>>   This section defines the possible sets of parameters for QoS that a client might 
>>>>   be mapped to. Each set holds: SL and optionally: Max MTU, Max Rate, Path Bits 
>>>>   (in case LMC > 0 is used for QoS) and TClass.
>>> How does this relate to/interact with partition configuration ? Also,
>>> what about preexisting QoS ?
>> As I understand from the osm man or from the partition-config.txt,
>> partitions definition is intended to be used for IPoIB only.
>>    [quote]
>> 	sl=<val> - specifies SL for this IPoIB MC group
>>         	   (default is 0)
>>    [/quote]
>>
>> I think that QoS policy may only "tighten" the constraints and enforce
>> lower-than-requested values, both in case of partition and in case of
>> preexisting QoS settings.
> 
> I'm not following you on this specific point. A specific SL is chosen by
> partition config so how can it be "tightened" ? Does it mean it might be
> changed to a different SL (in which case this QoS config supersedes the
> partition config for SL setting) ? Have you tried this to be sure ? 
> 
> Are multicast groups handled as part of the QoS definition in the XML syntax ?
> If not, might this be a future addition ? If it is, how are they
> specified ?
> 
> The other half of the original question was how a QoS request is handled
> if the original QoS support is enabled rather than this new QoS support
> in terms of the SA PR and MPR code.
> 
>>>> * Matching Rules:
>>>>   A list of rules that match an incoming PathRecord request to a QoS-Level. The 
>>>>   rules are processed in order such that the first match is applied. Each rule is 
>>>>   built out of a set of match expressions which should all match for the rule to 
>>>>   apply. The matching expressions are defined for the following fields
>>>>     - SRC and DST to lists of node groups
>>>>     - Service-ID to a list of Service-ID or Service-ID ranges
>>>>     - TClass to a list of TClass values or ranges
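
A minimal sketch of the first-match processing described above, in Python for brevity. It assumes that a criterion left out of a rule is simply not checked, and that a request carries the port groups of its source and destination. All names here (MatchRule, select_qos_level, and so on) are hypothetical and are not taken from the OpenSM patches.

```python
from dataclasses import dataclass, field


@dataclass
class MatchRule:
    """One match rule. Empty criteria are treated as 'not checked' (assumption)."""
    qos_level_sn: int
    source: set = field(default_factory=set)         # port-group names
    destination: set = field(default_factory=set)    # port-group names
    service_ids: list = field(default_factory=list)  # inclusive (low, high) ranges
    classes: list = field(default_factory=list)      # inclusive (low, high) ranges


def in_ranges(value, ranges):
    """True if value falls inside any inclusive (low, high) range."""
    return any(low <= value <= high for low, high in ranges)


def rule_matches(rule, src_groups, dst_groups, service_id, qos_class):
    """All criteria present in the rule must match for the rule to apply."""
    if rule.source and not (rule.source & src_groups):
        return False
    if rule.destination and not (rule.destination & dst_groups):
        return False
    if rule.service_ids and (service_id is None
                             or not in_ranges(service_id, rule.service_ids)):
        return False
    if rule.classes and (qos_class is None
                         or not in_ranges(qos_class, rule.classes)):
        return False
    return True


def select_qos_level(rules, src_groups, dst_groups, service_id, qos_class):
    """Scan the rules in file order and return the sn of the first match."""
    for rule in rules:
        if rule_matches(rule, src_groups, dst_groups, service_id, qos_class):
            return rule.qos_level_sn
    return None  # no rule matched


rules = [
    MatchRule(qos_level_sn=1, classes=[(7, 9), (11, 11)]),
    MatchRule(qos_level_sn=2, destination={"Storage"},
              service_ids=[(22, 22), (4719, 4719)]),
]
```

With these two rules, a request to the Storage group with service ID 22 and class 7 is mapped to level 1, not level 2, because the class rule comes first in the file.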
>>>>
>>>> QoS policy file example
>>>> --
>>>>
>>>> <?xml version="1.0" encoding="ISO-8859-1"?>
>>>> <qos-policy>
>>>>     <!-- Port Groups define sets of ports to be used later in the settings -->
>>>>     <port-groups>
>>>>         <!-- using port GUIDs -->
>>>>         <port-group> 
>>>>             <name>Storage</name> 
>>>>             <use>our SRP storage targets</use>
>>> Is the use clause more than commentary ? How is it "used" ?
>> The 'use' clause is just a description of the port group that
>> can be used for logging. Other than for logging, it is just a
>> commentary. 
>>  
>>>>             <port-guid>0x1000000000000001</port-guid>
>>>>             <port-guid>0x1000000000000002</port-guid>
>>>>         </port-group>
>>>>         <!-- using names obtained by concatenation of first 2 words of NodeDescription
>>>>              and port number -->
>>>>         <port-group> 
>>>>             <name>Virtual Servers</name> 
>>>>             <use>node desc and IB port #</use>
>>>>             <port-name>vs1/HCA-1/P1</port-name>
>>>>             <port-name>vs3/HCA-1/P1</port-name>
>>>>             <port-name>vs3/HCA-2/P1</port-name>
>>> How are port-names used ?
>> The syntax of the port name is as follows:
>> "hostname/CA-num/Pnum"
> 
> What is its purpose ? Is it used somewhere else in the syntax ?
> 
>>>>         </port-group>
>>>>         <!-- using partitions defined in the partition policy -->
>>>>         <port-group> 
>>>>             <name>Partition 1</name> 
>>>>             <use>default settings</use>
>>>>             <partition>Part1</partition> 
>>>>         </port-group>
>>>>         <!-- using node types HCA|ROUTER|SWITCH -->
>>> Is this CA rather than HCA ? (What about TCAs ?)
>> Sure, it should be 'CA'.
> 
> Will this be changed ? If so, when ?
> 
>>>>         <port-group> 
>>>>             <name>Routers</name> 
>>>>             <use>all routers</use>
>>>>             <node-type>ROUTER</node-type> 
>>>>         </port-group>  
>>>>     </port-groups>
>>>>     
>>>>     <qos-setup>
>>>>     <!-- define all types of SL2VL tables always have 16 VL entries -->
>>>                                                            ^^
>>> Actually, it is                                            SL
>>> assuming the device supports SL2VL mapping as indicated by
>>> IsSLMappingSupported in the PortInfo:CapabilityMask.
>>> Will the syntax handle single data VL devices which only implement SL
>>> filtering ? 
>> Yes, it should.
>>
>>> Will the QoS manager support this (SL2VL without VLArb
>>> settings) or are these required together ?
>> Yes, it should support sl2vl w/o vlarb settings as well.
>>
>>>>         <sl2vl-tables>
>>>>         <!-- scope defines the exact devices and in/out ports the tables apply to
>>>>              if the same port is matching several rules the last one applies -->
>>>>             <sl2vl-scope> 
>>>>                 <group>Part1</group> 
>>>>                 <from>*</from> 
>>>>                 <to>*</to> 
>>>>                 <sl2vl-table>0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,7</sl2vl-table>
>>>>             </sl2vl-scope>
>>>>             <!-- also the link across port 1 is probably supporting only 2 VLs -->
>>>>             <sl2vl-scope> 
>>>>                 <across>Storage</across> 
>>>>                 <!-- "across-from" means the port just connected to the given group -->
>>>>                 <across-from>Storage2</across-from>
>>>>                 <!-- "across-to" means the port just connected *to* the given group -->
>>>>                 <across-to>Storage3</across-to>
>>> I don't quite follow across-from/to.
>> Right, the comments there are garbage. Here's the explanation:
>> SL2VL table describes VL as function of from-port, to-port, and SL.
>>
>> <across-to>group_name</across-to>:
>>    It defines sl2vl table where 'to-port's belong to group_name
>> <across-from>group_name</across-from>:
>>    Same as above, only that this time 'from-port's belong to group_name
>> <across>group_name</across>:
>>    sl2vl tables both for 'to-port's and 'from-port's that belong to group_name
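
To make "VL as a function of from-port, to-port, and SL" concrete, here is a small Python sketch. It assumes a table is a comma-separated list of 16 VL values indexed by SL, that "*" is a wildcard port, and that the last matching scope wins (as the comment in the example file says). The function names are hypothetical, and group-based scoping (across, across-from, across-to) is deliberately left out.

```python
def parse_sl2vl_table(text):
    """Parse '0,1,2,...' into a list of 16 VL values indexed by SL."""
    vls = [int(token) for token in text.split(",")]
    if len(vls) != 16:
        raise ValueError("an SL2VL table needs 16 entries, got %d" % len(vls))
    for vl in vls:
        if not 0 <= vl <= 15:
            raise ValueError("VL value out of range: %d" % vl)
    return vls


def lookup_vl(scopes, from_port, to_port, sl):
    """Return the VL for (from_port, to_port, sl), or None if no scope matches.

    scopes is a list of (from, to, table) tuples in file order; ports are
    strings and '*' matches any port. Later scopes override earlier ones.
    """
    vl = None
    for scope_from, scope_to, table in scopes:
        if scope_from in ("*", from_port) and scope_to in ("*", to_port):
            vl = table[sl]
    return vl


scopes = [
    ("*", "*", parse_sl2vl_table("0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,7")),
    ("*", "1", parse_sl2vl_table("0,1,1,1,1,1,1,1,1,1,1,1,1,1,1,0")),
]
```

Here traffic leaving through port 1 is squeezed onto VLs 0 and 1 by the second scope, while every other output port falls back to the first table.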
> 
> I'm still not following what is going on here and how this is used.
> 
>>>>                 <from>*</from> 
>>>>                 <to>1</to>
>>>>                 <sl2vl-table>0,1,1,1,1,1,1,1,1,1,1,1,1,1,1,0</sl2vl-table>
>>>>             </sl2vl-scope>
>>>>         </sl2vl-tables>
>>>>
>>>>         <!-- define all types of VLArb tables. The length of the tables should 
>>>>              match the physically supported tables by their target ports -->
>>>>         <vlarb-tables>
>>>>             <!-- scope defines the exact ports the VLArb tables apply to -->
>>>>             <vlarb-scope> 
>>>>                 <group>Storage</group>
>>>>                 <!-- VLArb table holds VL and weight pairs -->
>>>>                 <vlarb-high>0:255,1:127,2:63,3:31,4:15,5:7,6:3,7:1</vlarb-high>
>>>>                 <vlarb-low>8:255,9:127,10:63,11:31,12:15,13:7,14:3</vlarb-low>
>>>>                 <vl-high-limit>10</vl-high-limit>
>>> What happens if the shape of VLArb indicated here does not match the
>>> device ?
>> The part that sets up the QoS in SM (I'm not writing this part right now),
> 
> What is the plan for this ?
> 
>> should issue an error message in case the VLArb definition doesn't match the device
>> properties.
> 
> Aside from the error message, is there any additional error handling for
> this ?
> 
>>>>             </vlarb-scope>
>>>>         </vlarb-tables>
>>>>     </qos-setup>
>>>>
>>>>     <qos-levels>
>>>>         <!-- the first one is just setting SL -->
>>>>         <qos-level> 
>>>>             <sn>1</sn> 
>>> What does sn mean ? What is it used for ?
>> 'sn' is an id of this qos level definition.
>> It is referenced later by QoS match rules as 'qos-level-sn'
> 
> What is 'sn' short for ?
> 
>>>>             <use>for the lowest priority comm</use>
>>>>             <sl>16</sl>
>>>>         </qos-level>
>>>>         <!-- the second sets SL and TClass -->
>>>>         <qos-level> 
>>>>             <sn>2</sn> 
>>>>             <use>low latency best bandwidth</use>
>>>>             <sl>0</sl> 
>>>>             <class>7</class>
>>> What is class ? I saw TClass mentioned earlier. Is this TClass or
>>> something else ?
>> Instead of "TClass" there should be "QoS Class".
>> The <class> value is the PathRecord.qos_class value that should be
>> returned in the path record query response when a certain <qos-level>
>> is applied to the returned path.
> 
> So these names need to change to be more consistent ?
> 
>>>>         </qos-level>
>>>>         <!-- the whole set: SL, TClass, MTU-Limit, Rate-Limit, Path-Bits  -->
>>> If specified, do MTU limit and rate limit add extra limits to be imposed
>>> on what is selected (and realizable) ?
>> Yes
>>
>>> Strictly speaking, couldn't packet lifetime limit also be added to this
>>> syntax here ? I presume it was left out as being not "interesting" as
>>> yet. Is that correct ?
>> I can add packet lifetime limit - it's not a big deal
>>
>>> Also, how are path bits used ?
>> For now I don't do anything with them - we'll discuss this issue in the future.
> 
> How are they envisioned to be used ? 
> 
> Why are they in the syntax now ? Seems inconsistent with PLL.
> 
> Should there be a warning if they are specified now since they are not
> used ? 
>  
>>>>         <qos-level> 
>>>>             <sn>3</sn> 
>>>>             <use>just an example</use>
>>>>             <sl>0</sl> 
>>>>             <class>32</class> 
>>>>             <mtu-limit>1</mtu-limit> 
>>>>             <rate-limit>1</rate-limit>
>>>>         </qos-level>
>>>>     </qos-levels>
>>>>
>>>>     <!-- Match rules are scanned in a first-fit manner (like firewall rules table) -->
>>>>     <qos-match-rules>
>>>>         <!-- matching by single criteria: class (list of values and ranges) -->
>>>>         <qos-match-rule> 
>>>>             <qos-level-sn>1</qos-level-sn> <!-- defined in <sn> of <qos-level> -->
>>>>             <use>low latency by class 7-9 or 11</use> <!-- just a description -->
>>>>             <class>7-9,11</class> <!--  -->
>>>>             <match-level>1</match-level> <!-- ID of this match rule -->
>>>>         </qos-match-rule>
>>>>         <!-- show matching by destination group AND service-ids -->
>>>>         <qos-match-rule> 
>>>>             <qos-level-sn>2</qos-level-sn> 
>>>>             <use>Storage targets connection</use>
>>>>             <destination>Storage</destination>
>>>>             <service>22,4719</service>
>>> What is service ? What does 22.4719 mean ?
>> The syntax is <service>service_id1,service_id2,...</service>, so in the
>> example above these are actually two service ids.
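
A short Python sketch of how such a value list could be split into individual IDs. It assumes that ranges, if supported for <service>, use the same "low-high" form as the <class>7-9,11</class> example, and it accepts decimal or 0x-prefixed hex values. The function name parse_value_list is hypothetical.

```python
def parse_value_list(text):
    """Turn '22,4719' or '7-9,11' into a list of inclusive (low, high) ranges."""
    ranges = []
    for token in text.split(","):
        token = token.strip()
        if not token:
            continue  # tolerate stray commas
        low_text, dash, high_text = token.partition("-")
        low = int(low_text, 0)                  # base 0: decimal or 0x hex
        high = int(high_text, 0) if dash else low
        if low > high:
            raise ValueError("descending range: %s" % token)
        ranges.append((low, high))
    return ranges
```

A single value is stored as a one-element range, so "22,4719" yields two ranges and the matching code only has to deal with one representation.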
> 
> So you can create arbitrary lists of service IDs. What about ranges ?
> Does the syntax support that ?
> 
>> As for the exact meaning of this, I'm not sure - I need to think about it...
> 
> Let me know. I'd really like to understand the syntax.
> 
>>>>             <match-level>3</match-level> 
>>> What are match-levels used for ?
>> Actually, they are not used - they shouldn't appear here.
>> Somehow it was copy-pasted here from one of the older versions
>> of the policy file.
> 
> So can this be updated for what is current ?
> 
> Thanks.
> 
> -- Hal
> 
>> -- Yevgeny
>>
>>> -- Hal
>>>
>>>>         </qos-match-rule>
>>>>     </qos-match-rules>
>>>>
>>>> </qos-policy>
>>>>
>>>>
>>>>
>>>> -- Yevgeny
>>>>
>>>> Yevgeny Kliteynik wrote:
>>>>> Hi Sasha,
>>>>>
>>>>> Sasha Khapyorsky wrote:
>>>>>> On 10:46 Sun 21 Jan     , Yevgeny Kliteynik wrote:
>>>>>>> Hi Sasha.
>>>>>>>
>>>>>>> Sasha Khapyorsky wrote:
>>>>>>>> Hi Yevgeny,
>>>>>>>>
>>>>>>>> On 17:01 Wed 17 Jan     , Yevgeny Kliteynik wrote:
>>>>>>>>> Hi Hal
>>>>>>>>>
>>>>>>>>> The following series of six patches implements QoS policy file parser:
>>>>>>>>>
>>>>>>>>> 1. QoS parser Lex file
>>>>>>>>> 2. QoS parser Lex-generated c file
>>>>>>>>> 3. QoS parser grammar (Yacc) file
>>>>>>>>> 4. QoS parser Yacc-generated grammar c and h file
>>>>>>>>> 5. QoS parser header file that defines parse tree data structures 
>>>>>>>>> 6. Changes in makefiles and configure.in file for compiling QoS parser files
>>>>>>>> Is there any description of proposed format and functionality?
>>>>>>> The parser is based on QoS RFC sent by Eitan in May 2006, with a few
>>>>>>> minor modifications. You can find the RFC here:
>>>>>>> http://openib.org/pipermail/openib-general/2006-May/022336.html
>>>>>> This was an RFC and a couple of issues were discussed then. Now you are in the
>>>>>> implementation phase and an exact format description would be desired. For
>>>>>> example, what are the "few minor modifications"?
>>>>> I'll prepare an example file with explanations.
>>>>>
>>>>> -- Yevgeny
>>>>>
>>>>>>>> Also what about using human readable formats?
>>>>>>> To me the xml-like format in the RFC looks pretty readable.
>>>>>>> It has very limited number of keywords (tags), so it's easy 
>>>>>>> to follow and/or to modify.
>>>>>> It is your opinion, not everybody will agree with it (AFAIR this was
>>>>>> discussed too during RFC).
>>>>>>
>>>>>> I would not care, but I don't know any example of really successful
>>>>>> use of XML for configuration purposes (especially where advanced graphical
>>>>>> config editors/viewers were not used). Do you know of any?
>>>>>>
>>>>>> Sasha
>>>>>>
> 