[ofa-general] Re: [PATCH 3/7 V2] osm: QoS policy C & H files

Hal Rosenstock hal.rosenstock at gmail.com
Tue Aug 28 11:17:48 PDT 2007


Hi Eitan,

On 8/28/07, Eitan Zahavi <eitan at mellanox.co.il> wrote:
> Hi Hal,
> > >
> > > As you pointed out, coming up with a VLArb and SL2VL
> > manager that is
> > > supporting both routing and QoS is not trivial.
> > > But even with a uniform and statically assigned tables
> > (supported by
> > > the current options config file) different QoS classes can
> > be provided
> > > to different clients by the new QoS policy code.
> > >
> > > I agree our OFED 1.3 is somewhat limited as it does not solve the
> > > VLArb and SL2VL setting issue.
> > > But with the time constrains of the development, review and
> > QA this is
> > > the most we could do for 1.3
> >
> > How much of the QoS policy syntax is left "dormant" by this ?
> > How much is actually used ? Can this be detailed somewhere ?
> Simply ignore the "Setup" section.

> > Also, there was a previous question on the list on IPoIB CM
> > relative to QoS which I either missed the answer to or was
> > left unanswered:
> > How is IPoIB CM dealt with in terms of QoS ? Is it like IPoIB UD ?
> IPoIB CM uses a specific ServiceID.

Isn't it a range of Service IDs ?

> It uses CMA so it does not need any change...

Doesn't it use CM rather than CMA ?

-- Hal

> >
> > What documentation will be provided (e.g. man page and
> > perhaps other doc file for this as it is quite complex) ?
> >
> > It would be nice to see the complete picture but it seems
> > like this will be a set of incremental changes for which some
> > looking back over the entirety will be needed when things settle down.
> >
> > -- Hal
> >
> > > EZ
> > >
> > > Eitan Zahavi
> > > Senior Engineering Director, Software Architect Mellanox
> > Technologies
> > > LTD
> > > Tel:+972-4-9097208
> > > Fax:+972-4-9593245
> > > P.O. Box 586 Yokneam 20692 ISRAEL
> > >
> > >
> > >
> > > > -----Original Message-----
> > > > From: general-bounces at lists.openfabrics.org
> > > > [mailto:general-bounces at lists.openfabrics.org] On Behalf Of Hal
> > > > Rosenstock
> > > > Sent: Monday, August 27, 2007 7:33 PM
> > > > To: Yevgeny Kliteynik
> > > > Cc: OpenIB
> > > > Subject: Re: [ofa-general] Re: [PATCH 3/7 V2] osm: QoS
> > policy C & H
> > > > files
> > > >
> > > > On 8/23/07, Yevgeny Kliteynik <kliteyn at dev.mellanox.co.il> wrote:
> > > > > Hi Sasha,
> > > > >
> > > > > Sasha Khapyorsky wrote:
> > > > > > On 11:11 Thu 23 Aug     , Yevgeny Kliteynik wrote:
> > > > > >>>> +/***************************************************/
> > > > > >>>> +
> > > > > >>>> +typedef struct osm_qos_port_group_t_ {
> > > > > >>>> +  char *name;             /* single string (this port
> > > > group name) */
> > > > > >>>> +  char *use;              /* single string
> > (description) */
> > > > > >>>> +  cl_list_t port_name_list;       /* list of port
> > > > names (.../.../...) */
> > > > > >>>> +  uint64_t **guid_range_arr;      /* array of guid
> > > > ranges (pair of 64-bit
> > > > > >>>> guids) */
> > > > > >>>> +  unsigned guid_range_len;        /* num of guid
> > > > ranges in the array */
> > > > > >>>> +  cl_list_t partition_list;       /* list of
> > > > partition names */
> > > > > >>>> +  boolean_t node_type_ca;
> > > > > >>>> +  boolean_t node_type_switch;
> > > > > >>>> +  boolean_t node_type_router;
> > > > > >>>> +  boolean_t node_type_self;
> > > > > >>>> +} osm_qos_port_group_t;
> > > > > >>> I see you are using this in "run-time", not just during
> > > > the parsing.
> > > > > >>> Instead of having all this config features you can just
> > > > > >>> resolve port guids in parse time and keep it here
> > in cl_map()
> > > > for fast searches.
> > > > > >>  By saying "config features", do you mean the four
> > boolean flags?
> > > > > >
> > > > > > No, I mean everything except name and use.
> > > > > >
> > > > > >>  It looks to me that checking the type of node is as
> > fast as it
> > > > > >> gets,  and it won't hurt to leave these booleans instead of
> > > > > >> resolving  all the guids.
> > > > > >
> > > > > > It could be optimization when types are specified,
> > which is not
> > > > > > always the case and then you are going to do linear
> > > > searches over all lists.
> > > > >
> > > > > Right, it would be linear scanning of list of partitions.
> > > > >
> > > > > > And how something like "for each guid in this group"
> > (which is
> > > > > > needed for QoS port parameters setup) should be resolved? By
> > > > > > matching each guid in the subnet against those lists?
> > > > >
> > > > > Good point.
> > > > >
> > > > > >>  Moreover, the guids here are stored in range array,
> > > > which is IMO
> > > > > >> better suited for the policy file syntax, because if a user
> > > > > >> specifies  something like this "0x0-0x0FFF" in
> > guids, it will
> > > > > >> be only one element  of the array, which is efficient both
> > > > in memory and in serch time.
> > > > > >
> > > > > > And what should be there if user specifies ports in
> > the group as:
> > > > > > guid1, guid2, guid3, etc. ?
> > > > >
> > > > > The range array is sorted and "shrinked".
> > > > > That is, if a user specifies guids as "30,1,2,3-20,15",
> > > > eventually you
> > > > > would get two elements in the array: 1-20 and 30-30.
> > > > > cl_map implemented as a binary tree, right?
> > > > > And doing binary search in this kind of array is faster
> > > > than searching
> > > > > a guid in cl_map of all the guids.
> > > > > Worst case - you will get the same performance as in case
> > > > of cl_map if
> > > > > *all* the guids are "discreet" and can't be groupped in ranges.
> > > > >
> > > > > Nevertheless, I agree that there's a problem if there are many
> > > > > partitions in the list (although I'm not sure it's a practical
> > > > > case)
> > > > >
> > > > > I'll work on this.
> > > > >
> > > > > >>  (I probably should mention here that the efficient
> > > > search in the range
> > > > > >>   array is not implemented yet, but it would be a simple
> > > > binary search -
> > > > > >>   there's a "todo" comment in the search function right now)
> > > > > >>
> > > > > >>>> +
> > > > > >>>> +osm_qos_port_group_t *osm_qos_policy_port_group_create();
> > > > > >>>> +void osm_qos_policy_port_group_destroy();
> > > > > >>> Would be nice to have function prototypes in one place.
> > > > > >>>> +
> > > > > >>>> +/***************************************************/
> > > > > >>>> +
> > > > > >>>> +typedef struct osm_qos_vlarb_scope_t_ {
> > > > > >>>> +  cl_list_t group_list;   /* list of group names
> > (strings) */
> > > > > >>>> +  cl_list_t across_list;  /* list of 'across' group
> > > > names (strings) */
> > > > > >>>> +  cl_list_t vlarb_high_list;      /* list of num
> > > > pairs (n:m,...), 32-bit values
> > > > > >>>> */
> > > > > >>>> +  cl_list_t vlarb_low_list;       /* list of num
> > > > pairs (n:m,...), 32-bit values
> > > > > >>>> */
> > > > > >>> Why cl_list for VLArb? it should be short fixed
> > length arrays?
> > > > > >>  Right.
> > > > > >>  Since the actual VLArb setup is not implemented yet, I
> > > > didn't see
> > > > > >> this obvious thing.
> > > > > >
> > > > > > But it should be implemented. Right?
> > > > >
> > > > > Sure, but not for OFED 1.3 - we have a feature freeze
> > in 11 days.
> > > >
> > > > Without VLArb configuration, where does this come from ?
> > What is the
> > > > QoS support then: just SL ?
> > > >
> > > > -- Hal
> > > >
> > > > >
> > > > > >>>> +  uint32_t vl_high_limit; /* single integer */
> > > > > >>>> +  boolean_t vl_high_limit_set; } osm_qos_vlarb_scope_t;
> > > > > >>>> +
> > > > > >>>> +osm_qos_vlarb_scope_t
> > *osm_qos_policy_vlarb_scope_create();
> > > > > >>>> +void osm_qos_policy_vlarb_scope_destroy();
> > > > > >>>> +
> > > > > >>>> +/***************************************************/
> > > > > >>>> +
> > > > > >>>> +typedef struct osm_qos_sl2vl_scope_t_ {
> > > > > >>>> +  cl_list_t group_list;   /* list of strings (port
> > > > group names) */
> > > > > >>>> +  boolean_t from[OSM_QOS_POLICY_MAX_PORTS_ON_SWITCH];
> > > > > >>>> +  boolean_t to[OSM_QOS_POLICY_MAX_PORTS_ON_SWITCH];
> > > > > >>>> +  cl_list_t across_from_list;     /* list of strings
> > > > (port group names) */
> > > > > >>>> +  cl_list_t across_to_list;       /* list of strings
> > > > (port group names) */
> > > > > >>>> +  uint8_t sl2vl_table[16];        /* array of
> > sl2vl values */
> > > > > >>>> +  boolean_t sl2vl_table_set; } osm_qos_sl2vl_scope_t;
> > > > > >>> This will be used for sl2vl setup? Same as above -
> > why not to
> > > > > >>> generate final port guid list just in parse time?
> > > > > >>>> +
> > > > > >>>> +osm_qos_sl2vl_scope_t
> > *osm_qos_policy_sl2vl_scope_create();
> > > > > >>>> +void osm_qos_policy_sl2vl_scope_destroy();
> > > > > >>>> +
> > > > > >>>> +/***************************************************/
> > > > > >>>> +
> > > > > >>>> +typedef struct osm_qos_level_t_ {
> > > > > >>>> +  char *use;
> > > > > >>>> +  char *name;
> > > > > >>>> +  uint8_t sl;
> > > > > >>>> +  boolean_t sl_set;
> > > > > >>>> +  uint8_t mtu_limit;
> > > > > >>>> +  boolean_t mtu_limit_set;
> > > > > >>>> +  uint8_t rate_limit;
> > > > > >>>> +  boolean_t rate_limit_set;
> > > > > >>>> +  uint8_t pkt_life;
> > > > > >>>> +  boolean_t pkt_life_set;
> > > > > >>>> +  uint64_t **path_bits_range_arr; /* array of bit
> > > > ranges (real
> > > > > >>>> +values are
> > > > > >>>> 32bits) */
> > > > > >>>> +  unsigned path_bits_range_len;   /* num of bit
> > > > ranges in the array */
> > > > > >>>> +  uint64_t **pkey_range_arr;      /* array of PKey
> > > > ranges (real values are
> > > > > >>>> 16bits) */
> > > > > >>>> +  unsigned pkey_range_len;
> > > > > >>>> +} osm_qos_level_t;
> > > > > >>>> +
> > > > > >>>> +osm_qos_level_t *osm_qos_policy_qos_level_create();
> > > > > >>>> +void osm_qos_policy_qos_level_destroy();
> > > > > >>>> +
> > > > > >>>> +boolean_t osm_qos_level_has_pkey(IN const
> > > > osm_qos_level_t * p_qos_level,
> > > > > >>>> +                           IN ib_net16_t pkey);
> > > > > >>>> +
> > > > > >>>> +ib_net16_t osm_qos_level_get_shared_pkey(IN const
> > > > > >>>> +osm_qos_level_t *
> > > > > >>>> p_qos_level,
> > > > > >>>> +                                   IN const
> > > > osm_physp_t * p_src_physp,
> > > > > >>>> +                                   IN const osm_physp_t *
> > > > > >>>> + p_dest_physp);
> > > > > >>>> +
> > > > > >>>> +/***************************************************/
> > > > > >>>> +
> > > > > >>>> +typedef struct osm_qos_match_rule_t_ {
> > > > > >>>> +  char *use;
> > > > > >>>> +  cl_list_t source_list;  /* list of strings */
> > > > > >>>> +  cl_list_t source_group_list;    /* list of pointers
> > > > to relevant port-group
> > > > > >>>> */
> > > > > >>>> +  cl_list_t destination_list;     /* list of strings */
> > > > > >>>> +  cl_list_t destination_group_list;       /* list of
> > > > pointers to relevant
> > > > > >>>> port-group */
> > > > > >>> I think you should only keep port guids there (mapped
> > > > for fast searches).
> > > > > >>  Same as above.
> > > > > >>  I think that checking node type and then guid range
> > array is
> > > > > >> essentially faster than checking guid map.
> > > > > >
> > > > > > _Only_ for case when the group is specified by type,
> > > > which is likely
> > > > > > will not be typical.
> > > > >
> > > > > Again, in *worst* case you will get the same performance in
> > > > searching
> > > > > range array as in searching cl_map (which is a binary tree).
> > > > >
> > > > > >>  You might say, of course, that there can be many port
> > > > groups  in
> > > > > >> the same match rule, but I don't see this as a practical
> > > >  example.
> > > > > >
> > > > > > Of course it could (or you must disable multi groups
> > > > here, which is
> > > > > > not good idea I think and not what was presented in the RFC).
> > > > >
> > > > > Can you elaborate on this?
> > > > > Do you think that having multiple groups is not useful at all?
> > > > >
> > > > > > Also each group still have this "linear searchable" lists:
> > > > > >
> > > > > >       cl_list_t port_name_list;       /* list of port
> > > > names (.../.../...) */
> > > > >
> > > > > Port names are not implemented yet :) There's a "todo"
> > comment in
> > > > > the code.
> > > > > This is the only keyword in the parser that is not
> > implemented - I
> > > > > mean it's parsed, but other than creating this list the
> > > > parser doesn't do anything with it.
> > > > > But It definitely won't stay as a pure list.
> > > > >
> > > > > >       uint64_t **guid_range_arr;      /* array of guid
> > > > ranges (pair of 64-bit
> > > > >
> > > > > This is not just a list, as I've explained before.
> > > > >
> > > > > >       cl_list_t partition_list;       /* list of
> > > > partition names */
> > > > >
> > > > > I can hardly believe than there will be more than one or two
> > > > > partitions in this list. Do you think otherwise?
> > > > >
> > > > > Anyway, I can extract all the guids and prepare a map. This is
> > > > > certainly the easiest implementation. And I still think that on
> > > > > average, guid ranges and partition lists are better, but
> > > > who knows -
> > > > > perhaps someone would want to define a bunch of partitions
> > > > in a single
> > > > > port group and ruin my average... :)
> > > > >
> > > > > >>>> +  char *qos_level_name;
> > > > > >>>> +  osm_qos_level_t *p_qos_level;
> > > > > >>> Why do you need qos_level_name if you keep the
> > pointer to this
> > > > > >>> qos_level struct?
> > > > > >>  In policy file the match rule might appear before the
> > > > QoS levels,
> > > > > >> so matching qos level names to the actual qos levels is
> > > > done when
> > > > > >> the parsing is done.
> > > > > >>
> > > > > >>>> +  uint64_t **service_id_range_arr;        /* array of
> > > > SID ranges (64-bit values)
> > > > > >>>> */
> > > > > >>>> +  unsigned service_id_range_len;  uint64_t
> > > > > >>>> + **qos_class_range_arr; /* array of QoS Class ranges (real
> > > > > >>>> values are 16bits) */
> > > > > >>>> +  unsigned qos_class_range_len;
> > > > > >>>> +  uint64_t **pkey_range_arr;      /* array of PKey
> > > > ranges (real values are
> > > > > >>>> 16bits) */
> > > > > >>>> +  unsigned pkey_range_len;
> > > > > >>>> +} osm_qos_match_rule_t;
> > > > > >>>> +
> > > > > >>>> +osm_qos_match_rule_t *osm_qos_policy_match_rule_create();
> > > > > >>>> +void osm_qos_policy_match_rule_destroy();
> > > > > >>>> +
> > > > > >>>> +/***************************************************/
> > > > > >>>> +
> > > > > >>>> +typedef struct osm_qos_policy_t_ {
> > > > > >>>> +  cl_list_t port_groups;  /* list of
> > osm_qos_port_group_t */
> > > > > >>>> +  cl_list_t sl2vl_tables; /* list of
> > osm_qos_sl2vl_scope_t
> > > > > >>>> +*/
> > > > > >>>> +  cl_list_t vlarb_tables; /* list of
> > osm_qos_vlarb_scope_t */
> > > > > >>>> +  cl_list_t qos_levels;   /* list of osm_qos_level_t */
> > > > > >>>> +  cl_list_t qos_match_rules;      /* list of
> > > > osm_qos_match_rule_t */
> > > > > >>>> +  osm_qos_level_t *p_default_qos_level;   /* default
> > > > QoS level */
> > > > > >>>> +} osm_qos_policy_t;
> > > > > >>>> +
> > > > > >>>> +void osm_qos_policy_create(); void
> > osm_qos_policy_destroy();
> > > > > >>>> +int osm_qos_policy_validate();
> > > > > >>>> +
> > > > > >>>> +void osm_qos_policy_get_qos_level_by_pr(IN const
> > > > osm_pr_rcv_t * p_rcv,
> > > > > >>>> +                                  IN const
> > > > ib_path_rec_t * p_pr,
> > > > > >>>> +                                  IN const
> > > > osm_physp_t * p_src_physp,
> > > > > >>>> +                                  IN const
> > > > osm_physp_t * p_dest_physp,
> > > > > >>>> +                                  IN ib_net64_t comp_mask,
> > > > > >>>> +                                  OUT osm_qos_level_t **
> > > > > >>>> +pp_qos_level);
> > > > > >>>> +
> > > > > >>>> +/***************************************************/
> > > > > >>>> +
> > > > > >>>> +int osm_qos_parse_policy_file(IN osm_log_t *
> > p_log, IN const
> > > > > >>>> +char
> > > > > >>>> *policy_file);
> > > > > >>>> +
> > > > > >>>> +/***************************************************/
> > > > > >>>> +
> > > > > >>>> +#endif                            /* ifndef
> > > > OSM_QOS_POLICY_H */
> > > > > >>>> diff --git a/opensm/opensm/osm_qos_policy.c
> > > > > >>>> b/opensm/opensm/osm_qos_policy.c new file mode 100644 index
> > > > > >>>> 0000000..bc2aa68
> > > > > >>>> --- /dev/null
> > > > > >>>> +++ b/opensm/opensm/osm_qos_policy.c
> > > > > >>>> @@ -0,0 +1,901 @@
> > > > > >>>> +/*
> > > > > >>>> + * Copyright (c) 2004-2006 Voltaire, Inc. All
> > rights reserved.
> > > > > >>>> + * Copyright (c) 2002-2005 Mellanox Technologies LTD.
> > > > All rights
> > > > > >>>> reserved.
> > > > > >>>> + * Copyright (c) 1996-2003 Intel Corporation. All
> > > > rights reserved.
> > > > > >>>> + *
> > > > > >>>> + * This software is available to you under a
> > choice of one
> > > > > >>>> + of two
> > > > > >>>> + * licenses.  You may choose to be licensed under the
> > > > terms of
> > > > > >>>> + the GNU
> > > > > >>>> + * General Public License (GPL) Version 2, available from
> > > > > >>>> + the file
> > > > > >>>> + * COPYING in the main directory of this source
> > tree, or the
> > > > > >>>> + * OpenIB.org BSD license below:
> > > > > >>>> + *
> > > > > >>>> + *     Redistribution and use in source and binary
> > > > forms, with or
> > > > > >>>> + *     without modification, are permitted provided
> > > > that the following
> > > > > >>>> + *     conditions are met:
> > > > > >>>> + *
> > > > > >>>> + *      - Redistributions of source code must
> > retain the above
> > > > > >>>> + *        copyright notice, this list of conditions
> > > > and the following
> > > > > >>>> + *        disclaimer.
> > > > > >>>> + *
> > > > > >>>> + *      - Redistributions in binary form must
> > > > reproduce the above
> > > > > >>>> + *        copyright notice, this list of conditions
> > > > and the following
> > > > > >>>> + *        disclaimer in the documentation and/or
> > > > other materials
> > > > > >>>> + *        provided with the distribution.
> > > > > >>>> + *
> > > > > >>>> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT
> > WARRANTY OF ANY
> > > > > >>>> + KIND,
> > > > > >>>> + * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE
> > > > > >>>> + WARRANTIES OF
> > > > > >>>> + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
> > > > > >>>> + * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR
> > > > COPYRIGHT
> > > > > >>>> + HOLDERS
> > > > > >>>> + * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
> > > > LIABILITY, WHETHER
> > > > > >>>> + IN AN
> > > > > >>>> + * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
> > > > FROM, OUT OF
> > > > > >>>> + OR IN
> > > > > >>>> + * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
> > > > DEALINGS IN
> > > > > >>>> + THE
> > > > > >>>> + * SOFTWARE.
> > > > > >>>> + *
> > > > > >>>> + */
> > > > > >>>> +
> > > > > >>>> +/*
> > > > > >>>> + * Abstract:
> > > > > >>>> + *    OSM QoS Policy functions.
> > > > > >>>> + *
> > > > > >>>> + * Environment:
> > > > > >>>> + *    Linux User Mode
> > > > > >>>> + *
> > > > > >>>> + * Author:
> > > > > >>>> + *    Yevgeny Kliteynik, Mellanox
> > > > > >>>> + */
> > > > > >>>> +
> > > > > >>>> +#include <opensm/osm_qos_policy.h> #include
> > > > > >>>> +<opensm/osm_qos_parser_y.h> #include
> > > > > >>>> +<opensm/osm_partition.h>
> > > > > >>>> +
> > > > > >>>> +extern void yyerror(char *s); osm_log_t
> > > > > >>>> +*p_qos_parser_osm_log = NULL; osm_qos_policy_t
> > *p_qos_policy
> > > > > >>>> += NULL;
> > > > > >>> Please try to avoid globals - keep it as part of
> > > > osm_opensm_t or
> > > > > >>> osm_subn_t structures.
> > > > > >>  I thought about it, but didn't want to "condaminate" the
> > > > > >> osm_opensm_t  or osm_subn_t structures untill the QoS
> > > > functionality is ready.
> > > > > >
> > > > > > How globals are better in this sense?
> > > > >
> > > > > They're not :)
> > > > > The difference is that less files are modified.
> > > > > But I agree that the QoS policy should be part of
> > > > osm_opensm_t or osm_subn_t.
> > > > >
> > > > > -- Yevgeny
> > > > >
> > > > > > Sasha
> > > > > >
> > > > >
> > > > > _______________________________________________
> > > > > general mailing list
> > > > > general at lists.openfabrics.org
> > > > > http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general
> > > > >
> > > > > To unsubscribe, please visit
> > > > > http://openib.org/mailman/listinfo/openib-general
> > > > >
> > > > _______________________________________________
> > > > general mailing list
> > > > general at lists.openfabrics.org
> > > > http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general
> > > >
> > > > To unsubscribe, please visit
> > > > http://openib.org/mailman/listinfo/openib-general
> > > >
> > >
> >
>



More information about the general mailing list