[ofa-general] Re: [PATCH 3/7 V2] osm: QoS policy C & H files

Hal Rosenstock hal.rosenstock at gmail.com
Tue Aug 28 08:48:14 PDT 2007


Hi Eitan,

On 8/28/07, Eitan Zahavi <eitan at mellanox.co.il> wrote:
> Hi Hal,
>
> As you pointed out, coming up with a VLArb and SL2VL manager that is
> supporting both routing and QoS is not trivial.
> But even with a uniform and statically assigned tables (supported by the
> current options config file) different QoS classes can be provided to
> different clients by the new QoS policy code.
>
> I agree our OFED 1.3 is somewhat limited as it does not solve the VLArb
> and SL2VL setting issue.
> But with the time constrains of the development, review and QA this is
> the most we could do for 1.3

How much of the QoS policy syntax is left "dormant" by this ? How much
is actually used ? Can this be detailed somewhere ?

Also, there was a previous question on the list on IPoIB CM relative
to QoS which I either missed the answer to or was left unanswered:
How is IPoIB CM dealt with in terms of QoS ? Is it like IPoIB UD ?

What documentation will be provided (e.g. man page and perhaps other
doc file for this as it is quite complex) ?

It would be nice to see the complete picture but it seems like this
will be a set of incremental changes for which some looking back over
the entirety will be needed when things settle down.

-- Hal

> EZ
>
> Eitan Zahavi
> Senior Engineering Director, Software Architect
> Mellanox Technologies LTD
> Tel:+972-4-9097208
> Fax:+972-4-9593245
> P.O. Box 586 Yokneam 20692 ISRAEL
>
>
>
> > -----Original Message-----
> > From: general-bounces at lists.openfabrics.org
> > [mailto:general-bounces at lists.openfabrics.org] On Behalf Of
> > Hal Rosenstock
> > Sent: Monday, August 27, 2007 7:33 PM
> > To: Yevgeny Kliteynik
> > Cc: OpenIB
> > Subject: Re: [ofa-general] Re: [PATCH 3/7 V2] osm: QoS policy
> > C & H files
> >
> > On 8/23/07, Yevgeny Kliteynik <kliteyn at dev.mellanox.co.il> wrote:
> > > Hi Sasha,
> > >
> > > Sasha Khapyorsky wrote:
> > > > On 11:11 Thu 23 Aug     , Yevgeny Kliteynik wrote:
> > > >>>> +/***************************************************/
> > > >>>> +
> > > >>>> +typedef struct osm_qos_port_group_t_ {
> > > >>>> +  char *name;             /* single string (this port
> > group name) */
> > > >>>> +  char *use;              /* single string (description) */
> > > >>>> +  cl_list_t port_name_list;       /* list of port
> > names (.../.../...) */
> > > >>>> +  uint64_t **guid_range_arr;      /* array of guid
> > ranges (pair of 64-bit
> > > >>>> guids) */
> > > >>>> +  unsigned guid_range_len;        /* num of guid
> > ranges in the array */
> > > >>>> +  cl_list_t partition_list;       /* list of
> > partition names */
> > > >>>> +  boolean_t node_type_ca;
> > > >>>> +  boolean_t node_type_switch;
> > > >>>> +  boolean_t node_type_router;
> > > >>>> +  boolean_t node_type_self;
> > > >>>> +} osm_qos_port_group_t;
> > > >>> I see you are using this in "run-time", not just during
> > the parsing.
> > > >>> Instead of having all this config features you can just resolve
> > > >>> port guids in parse time and keep it here in cl_map()
> > for fast searches.
> > > >>  By saying "config features", do you mean the four boolean flags?
> > > >
> > > > No, I mean everything except name and use.
> > > >
> > > >>  It looks to me that checking the type of node is as fast as it
> > > >> gets,  and it won't hurt to leave these booleans instead of
> > > >> resolving  all the guids.
> > > >
> > > > It could be optimization when types are specified, which is not
> > > > always the case and then you are going to do linear
> > searches over all lists.
> > >
> > > Right, it would be linear scanning of list of partitions.
> > >
> > > > And how something like "for each guid in this group" (which is
> > > > needed for QoS port parameters setup) should be resolved? By
> > > > matching each guid in the subnet against those lists?
> > >
> > > Good point.
> > >
> > > >>  Moreover, the guids here are stored in range array,
> > which is IMO
> > > >> better suited for the policy file syntax, because if a user
> > > >> specifies  something like this "0x0-0x0FFF" in guids, it will be
> > > >> only one element  of the array, which is efficient both
> > in memory and in serch time.
> > > >
> > > > And what should be there if user specifies ports in the group as:
> > > > guid1, guid2, guid3, etc. ?
> > >
> > > The range array is sorted and "shrinked".
> > > That is, if a user specifies guids as "30,1,2,3-20,15",
> > eventually you
> > > would get two elements in the array: 1-20 and 30-30.
> > > cl_map implemented as a binary tree, right?
> > > And doing binary search in this kind of array is faster
> > than searching
> > > a guid in cl_map of all the guids.
> > > Worst case - you will get the same performance as in case
> > of cl_map if
> > > *all* the guids are "discreet" and can't be groupped in ranges.
> > >
> > > Nevertheless, I agree that there's a problem if there are many
> > > partitions in the list (although I'm not sure it's a practical case)
> > >
> > > I'll work on this.
> > >
> > > >>  (I probably should mention here that the efficient
> > search in the range
> > > >>   array is not implemented yet, but it would be a simple
> > binary search -
> > > >>   there's a "todo" comment in the search function right now)
> > > >>
> > > >>>> +
> > > >>>> +osm_qos_port_group_t *osm_qos_policy_port_group_create();
> > > >>>> +void osm_qos_policy_port_group_destroy();
> > > >>> Would be nice to have function prototypes in one place.
> > > >>>> +
> > > >>>> +/***************************************************/
> > > >>>> +
> > > >>>> +typedef struct osm_qos_vlarb_scope_t_ {
> > > >>>> +  cl_list_t group_list;   /* list of group names (strings) */
> > > >>>> +  cl_list_t across_list;  /* list of 'across' group
> > names (strings) */
> > > >>>> +  cl_list_t vlarb_high_list;      /* list of num
> > pairs (n:m,...), 32-bit values
> > > >>>> */
> > > >>>> +  cl_list_t vlarb_low_list;       /* list of num
> > pairs (n:m,...), 32-bit values
> > > >>>> */
> > > >>> Why cl_list for VLArb? it should be short fixed length arrays?
> > > >>  Right.
> > > >>  Since the actual VLArb setup is not implemented yet, I
> > didn't see
> > > >> this obvious thing.
> > > >
> > > > But it should be implemented. Right?
> > >
> > > Sure, but not for OFED 1.3 - we have a feature freeze in 11 days.
> >
> > Without VLArb configuration, where does this come from ? What
> > is the QoS support then: just SL ?
> >
> > -- Hal
> >
> > >
> > > >>>> +  uint32_t vl_high_limit; /* single integer */
> > > >>>> +  boolean_t vl_high_limit_set;
> > > >>>> +} osm_qos_vlarb_scope_t;
> > > >>>> +
> > > >>>> +osm_qos_vlarb_scope_t *osm_qos_policy_vlarb_scope_create();
> > > >>>> +void osm_qos_policy_vlarb_scope_destroy();
> > > >>>> +
> > > >>>> +/***************************************************/
> > > >>>> +
> > > >>>> +typedef struct osm_qos_sl2vl_scope_t_ {
> > > >>>> +  cl_list_t group_list;   /* list of strings (port
> > group names) */
> > > >>>> +  boolean_t from[OSM_QOS_POLICY_MAX_PORTS_ON_SWITCH];
> > > >>>> +  boolean_t to[OSM_QOS_POLICY_MAX_PORTS_ON_SWITCH];
> > > >>>> +  cl_list_t across_from_list;     /* list of strings
> > (port group names) */
> > > >>>> +  cl_list_t across_to_list;       /* list of strings
> > (port group names) */
> > > >>>> +  uint8_t sl2vl_table[16];        /* array of sl2vl values */
> > > >>>> +  boolean_t sl2vl_table_set;
> > > >>>> +} osm_qos_sl2vl_scope_t;
> > > >>> This will be used for sl2vl setup? Same as above - why not to
> > > >>> generate final port guid list just in parse time?
> > > >>>> +
> > > >>>> +osm_qos_sl2vl_scope_t *osm_qos_policy_sl2vl_scope_create();
> > > >>>> +void osm_qos_policy_sl2vl_scope_destroy();
> > > >>>> +
> > > >>>> +/***************************************************/
> > > >>>> +
> > > >>>> +typedef struct osm_qos_level_t_ {
> > > >>>> +  char *use;
> > > >>>> +  char *name;
> > > >>>> +  uint8_t sl;
> > > >>>> +  boolean_t sl_set;
> > > >>>> +  uint8_t mtu_limit;
> > > >>>> +  boolean_t mtu_limit_set;
> > > >>>> +  uint8_t rate_limit;
> > > >>>> +  boolean_t rate_limit_set;
> > > >>>> +  uint8_t pkt_life;
> > > >>>> +  boolean_t pkt_life_set;
> > > >>>> +  uint64_t **path_bits_range_arr; /* array of bit
> > ranges (real
> > > >>>> +values are
> > > >>>> 32bits) */
> > > >>>> +  unsigned path_bits_range_len;   /* num of bit
> > ranges in the array */
> > > >>>> +  uint64_t **pkey_range_arr;      /* array of PKey
> > ranges (real values are
> > > >>>> 16bits) */
> > > >>>> +  unsigned pkey_range_len;
> > > >>>> +} osm_qos_level_t;
> > > >>>> +
> > > >>>> +osm_qos_level_t *osm_qos_policy_qos_level_create();
> > > >>>> +void osm_qos_policy_qos_level_destroy();
> > > >>>> +
> > > >>>> +boolean_t osm_qos_level_has_pkey(IN const
> > osm_qos_level_t * p_qos_level,
> > > >>>> +                           IN ib_net16_t pkey);
> > > >>>> +
> > > >>>> +ib_net16_t osm_qos_level_get_shared_pkey(IN const
> > > >>>> +osm_qos_level_t *
> > > >>>> p_qos_level,
> > > >>>> +                                   IN const
> > osm_physp_t * p_src_physp,
> > > >>>> +                                   IN const osm_physp_t *
> > > >>>> + p_dest_physp);
> > > >>>> +
> > > >>>> +/***************************************************/
> > > >>>> +
> > > >>>> +typedef struct osm_qos_match_rule_t_ {
> > > >>>> +  char *use;
> > > >>>> +  cl_list_t source_list;  /* list of strings */
> > > >>>> +  cl_list_t source_group_list;    /* list of pointers
> > to relevant port-group
> > > >>>> */
> > > >>>> +  cl_list_t destination_list;     /* list of strings */
> > > >>>> +  cl_list_t destination_group_list;       /* list of
> > pointers to relevant
> > > >>>> port-group */
> > > >>> I think you should only keep port guids there (mapped
> > for fast searches).
> > > >>  Same as above.
> > > >>  I think that checking node type and then guid range array is
> > > >> essentially faster than checking guid map.
> > > >
> > > > _Only_ for case when the group is specified by type,
> > which is likely
> > > > will not be typical.
> > >
> > > Again, in *worst* case you will get the same performance in
> > searching
> > > range array as in searching cl_map (which is a binary tree).
> > >
> > > >>  You might say, of course, that there can be many port
> > groups  in
> > > >> the same match rule, but I don't see this as a practical
> >  example.
> > > >
> > > > Of course it could (or you must disable multi groups
> > here, which is
> > > > not good idea I think and not what was presented in the RFC).
> > >
> > > Can you elaborate on this?
> > > Do you think that having multiple groups is not useful at all?
> > >
> > > > Also each group still have this "linear searchable" lists:
> > > >
> > > >       cl_list_t port_name_list;       /* list of port
> > names (.../.../...) */
> > >
> > > Port names are not implemented yet :)
> > > There's a "todo" comment in the code.
> > > This is the only keyword in the parser that is not implemented - I
> > > mean it's parsed, but other than creating this list the
> > parser doesn't do anything with it.
> > > But It definitely won't stay as a pure list.
> > >
> > > >       uint64_t **guid_range_arr;      /* array of guid
> > ranges (pair of 64-bit
> > >
> > > This is not just a list, as I've explained before.
> > >
> > > >       cl_list_t partition_list;       /* list of
> > partition names */
> > >
> > > I can hardly believe than there will be more than one or two
> > > partitions in this list. Do you think otherwise?
> > >
> > > Anyway, I can extract all the guids and prepare a map. This is
> > > certainly the easiest implementation. And I still think that on
> > > average, guid ranges and partition lists are better, but
> > who knows -
> > > perhaps someone would want to define a bunch of partitions
> > in a single
> > > port group and ruin my average... :)
> > >
> > > >>>> +  char *qos_level_name;
> > > >>>> +  osm_qos_level_t *p_qos_level;
> > > >>> Why do you need qos_level_name if you keep the pointer to this
> > > >>> qos_level struct?
> > > >>  In policy file the match rule might appear before the
> > QoS levels,
> > > >> so matching qos level names to the actual qos levels is
> > done when
> > > >> the parsing is done.
> > > >>
> > > >>>> +  uint64_t **service_id_range_arr;        /* array of
> > SID ranges (64-bit values)
> > > >>>> */
> > > >>>> +  unsigned service_id_range_len;  uint64_t
> > > >>>> + **qos_class_range_arr; /* array of QoS Class ranges (real
> > > >>>> values are 16bits) */
> > > >>>> +  unsigned qos_class_range_len;
> > > >>>> +  uint64_t **pkey_range_arr;      /* array of PKey
> > ranges (real values are
> > > >>>> 16bits) */
> > > >>>> +  unsigned pkey_range_len;
> > > >>>> +} osm_qos_match_rule_t;
> > > >>>> +
> > > >>>> +osm_qos_match_rule_t *osm_qos_policy_match_rule_create();
> > > >>>> +void osm_qos_policy_match_rule_destroy();
> > > >>>> +
> > > >>>> +/***************************************************/
> > > >>>> +
> > > >>>> +typedef struct osm_qos_policy_t_ {
> > > >>>> +  cl_list_t port_groups;  /* list of osm_qos_port_group_t */
> > > >>>> +  cl_list_t sl2vl_tables; /* list of osm_qos_sl2vl_scope_t */
> > > >>>> +  cl_list_t vlarb_tables; /* list of osm_qos_vlarb_scope_t */
> > > >>>> +  cl_list_t qos_levels;   /* list of osm_qos_level_t */
> > > >>>> +  cl_list_t qos_match_rules;      /* list of
> > osm_qos_match_rule_t */
> > > >>>> +  osm_qos_level_t *p_default_qos_level;   /* default
> > QoS level */
> > > >>>> +} osm_qos_policy_t;
> > > >>>> +
> > > >>>> +void osm_qos_policy_create();
> > > >>>> +void osm_qos_policy_destroy();
> > > >>>> +int osm_qos_policy_validate();
> > > >>>> +
> > > >>>> +void osm_qos_policy_get_qos_level_by_pr(IN const
> > osm_pr_rcv_t * p_rcv,
> > > >>>> +                                  IN const
> > ib_path_rec_t * p_pr,
> > > >>>> +                                  IN const
> > osm_physp_t * p_src_physp,
> > > >>>> +                                  IN const
> > osm_physp_t * p_dest_physp,
> > > >>>> +                                  IN ib_net64_t comp_mask,
> > > >>>> +                                  OUT osm_qos_level_t **
> > > >>>> +pp_qos_level);
> > > >>>> +
> > > >>>> +/***************************************************/
> > > >>>> +
> > > >>>> +int osm_qos_parse_policy_file(IN osm_log_t * p_log, IN const
> > > >>>> +char
> > > >>>> *policy_file);
> > > >>>> +
> > > >>>> +/***************************************************/
> > > >>>> +
> > > >>>> +#endif                            /* ifndef
> > OSM_QOS_POLICY_H */
> > > >>>> diff --git a/opensm/opensm/osm_qos_policy.c
> > > >>>> b/opensm/opensm/osm_qos_policy.c new file mode 100644 index
> > > >>>> 0000000..bc2aa68
> > > >>>> --- /dev/null
> > > >>>> +++ b/opensm/opensm/osm_qos_policy.c
> > > >>>> @@ -0,0 +1,901 @@
> > > >>>> +/*
> > > >>>> + * Copyright (c) 2004-2006 Voltaire, Inc. All rights reserved.
> > > >>>> + * Copyright (c) 2002-2005 Mellanox Technologies LTD.
> > All rights
> > > >>>> reserved.
> > > >>>> + * Copyright (c) 1996-2003 Intel Corporation. All
> > rights reserved.
> > > >>>> + *
> > > >>>> + * This software is available to you under a choice of one of
> > > >>>> + two
> > > >>>> + * licenses.  You may choose to be licensed under the
> > terms of
> > > >>>> + the GNU
> > > >>>> + * General Public License (GPL) Version 2, available from the
> > > >>>> + file
> > > >>>> + * COPYING in the main directory of this source tree, or the
> > > >>>> + * OpenIB.org BSD license below:
> > > >>>> + *
> > > >>>> + *     Redistribution and use in source and binary
> > forms, with or
> > > >>>> + *     without modification, are permitted provided
> > that the following
> > > >>>> + *     conditions are met:
> > > >>>> + *
> > > >>>> + *      - Redistributions of source code must retain the above
> > > >>>> + *        copyright notice, this list of conditions
> > and the following
> > > >>>> + *        disclaimer.
> > > >>>> + *
> > > >>>> + *      - Redistributions in binary form must
> > reproduce the above
> > > >>>> + *        copyright notice, this list of conditions
> > and the following
> > > >>>> + *        disclaimer in the documentation and/or
> > other materials
> > > >>>> + *        provided with the distribution.
> > > >>>> + *
> > > >>>> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY
> > > >>>> + KIND,
> > > >>>> + * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE
> > > >>>> + WARRANTIES OF
> > > >>>> + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
> > > >>>> + * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR
> > COPYRIGHT
> > > >>>> + HOLDERS
> > > >>>> + * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
> > LIABILITY, WHETHER
> > > >>>> + IN AN
> > > >>>> + * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
> > FROM, OUT OF
> > > >>>> + OR IN
> > > >>>> + * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
> > DEALINGS IN
> > > >>>> + THE
> > > >>>> + * SOFTWARE.
> > > >>>> + *
> > > >>>> + */
> > > >>>> +
> > > >>>> +/*
> > > >>>> + * Abstract:
> > > >>>> + *    OSM QoS Policy functions.
> > > >>>> + *
> > > >>>> + * Environment:
> > > >>>> + *    Linux User Mode
> > > >>>> + *
> > > >>>> + * Author:
> > > >>>> + *    Yevgeny Kliteynik, Mellanox
> > > >>>> + */
> > > >>>> +
> > > >>>> +#include <opensm/osm_qos_policy.h> #include
> > > >>>> +<opensm/osm_qos_parser_y.h> #include <opensm/osm_partition.h>
> > > >>>> +
> > > >>>> +extern void yyerror(char *s);
> > > >>>> +osm_log_t *p_qos_parser_osm_log = NULL; osm_qos_policy_t
> > > >>>> +*p_qos_policy = NULL;
> > > >>> Please try to avoid globals - keep it as part of
> > osm_opensm_t or
> > > >>> osm_subn_t structures.
> > > >>  I thought about it, but didn't want to "condaminate" the
> > > >> osm_opensm_t  or osm_subn_t structures untill the QoS
> > functionality is ready.
> > > >
> > > > How globals are better in this sense?
> > >
> > > They're not :)
> > > The difference is that less files are modified.
> > > But I agree that the QoS policy should be part of
> > osm_opensm_t or osm_subn_t.
> > >
> > > -- Yevgeny
> > >
> > > > Sasha
> > > >
> > >
> > > _______________________________________________
> > > general mailing list
> > > general at lists.openfabrics.org
> > > http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general
> > >
> > > To unsubscribe, please visit
> > > http://openib.org/mailman/listinfo/openib-general
> > >
> > _______________________________________________
> > general mailing list
> > general at lists.openfabrics.org
> > http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general
> >
> > To unsubscribe, please visit
> > http://openib.org/mailman/listinfo/openib-general
> >
>



More information about the general mailing list