[ofa-general] Re: [PATCH 3/7 V2] osm: QoS policy C & H files
Eitan Zahavi
eitan at mellanox.co.il
Tue Aug 28 00:30:57 PDT 2007
Hi Hal,
As you pointed out, coming up with a VLArb and SL2VL manager that is
supporting both routing and QoS is not trivial.
But even with a uniform and statically assigned tables (supported by the
current options config file) different QoS classes can be provided to
different clients by the new QoS policy code.
I agree our OFED 1.3 is somewhat limited as it does not solve the VLArb
and SL2VL setting issue.
But with the time constrains of the development, review and QA this is
the most we could do for 1.3
EZ
Eitan Zahavi
Senior Engineering Director, Software Architect
Mellanox Technologies LTD
Tel:+972-4-9097208
Fax:+972-4-9593245
P.O. Box 586 Yokneam 20692 ISRAEL
> -----Original Message-----
> From: general-bounces at lists.openfabrics.org
> [mailto:general-bounces at lists.openfabrics.org] On Behalf Of
> Hal Rosenstock
> Sent: Monday, August 27, 2007 7:33 PM
> To: Yevgeny Kliteynik
> Cc: OpenIB
> Subject: Re: [ofa-general] Re: [PATCH 3/7 V2] osm: QoS policy
> C & H files
>
> On 8/23/07, Yevgeny Kliteynik <kliteyn at dev.mellanox.co.il> wrote:
> > Hi Sasha,
> >
> > Sasha Khapyorsky wrote:
> > > On 11:11 Thu 23 Aug , Yevgeny Kliteynik wrote:
> > >>>> +/***************************************************/
> > >>>> +
> > >>>> +typedef struct osm_qos_port_group_t_ {
> > >>>> + char *name; /* single string (this port
> group name) */
> > >>>> + char *use; /* single string (description) */
> > >>>> + cl_list_t port_name_list; /* list of port
> names (.../.../...) */
> > >>>> + uint64_t **guid_range_arr; /* array of guid
> ranges (pair of 64-bit
> > >>>> guids) */
> > >>>> + unsigned guid_range_len; /* num of guid
> ranges in the array */
> > >>>> + cl_list_t partition_list; /* list of
> partition names */
> > >>>> + boolean_t node_type_ca;
> > >>>> + boolean_t node_type_switch;
> > >>>> + boolean_t node_type_router;
> > >>>> + boolean_t node_type_self;
> > >>>> +} osm_qos_port_group_t;
> > >>> I see you are using this in "run-time", not just during
> the parsing.
> > >>> Instead of having all this config features you can just resolve
> > >>> port guids in parse time and keep it here in cl_map()
> for fast searches.
> > >> By saying "config features", do you mean the four boolean flags?
> > >
> > > No, I mean everything except name and use.
> > >
> > >> It looks to me that checking the type of node is as fast as it
> > >> gets, and it won't hurt to leave these booleans instead of
> > >> resolving all the guids.
> > >
> > > It could be optimization when types are specified, which is not
> > > always the case and then you are going to do linear
> searches over all lists.
> >
> > Right, it would be linear scanning of list of partitions.
> >
> > > And how something like "for each guid in this group" (which is
> > > needed for QoS port parameters setup) should be resolved? By
> > > matching each guid in the subnet against those lists?
> >
> > Good point.
> >
> > >> Moreover, the guids here are stored in range array,
> which is IMO
> > >> better suited for the policy file syntax, because if a user
> > >> specifies something like this "0x0-0x0FFF" in guids, it will be
> > >> only one element of the array, which is efficient both
> in memory and in serch time.
> > >
> > > And what should be there if user specifies ports in the group as:
> > > guid1, guid2, guid3, etc. ?
> >
> > The range array is sorted and "shrinked".
> > That is, if a user specifies guids as "30,1,2,3-20,15",
> eventually you
> > would get two elements in the array: 1-20 and 30-30.
> > cl_map implemented as a binary tree, right?
> > And doing binary search in this kind of array is faster
> than searching
> > a guid in cl_map of all the guids.
> > Worst case - you will get the same performance as in case
> of cl_map if
> > *all* the guids are "discreet" and can't be groupped in ranges.
> >
> > Nevertheless, I agree that there's a problem if there are many
> > partitions in the list (although I'm not sure it's a practical case)
> >
> > I'll work on this.
> >
> > >> (I probably should mention here that the efficient
> search in the range
> > >> array is not implemented yet, but it would be a simple
> binary search -
> > >> there's a "todo" comment in the search function right now)
> > >>
> > >>>> +
> > >>>> +osm_qos_port_group_t *osm_qos_policy_port_group_create();
> > >>>> +void osm_qos_policy_port_group_destroy();
> > >>> Would be nice to have function prototypes in one place.
> > >>>> +
> > >>>> +/***************************************************/
> > >>>> +
> > >>>> +typedef struct osm_qos_vlarb_scope_t_ {
> > >>>> + cl_list_t group_list; /* list of group names (strings) */
> > >>>> + cl_list_t across_list; /* list of 'across' group
> names (strings) */
> > >>>> + cl_list_t vlarb_high_list; /* list of num
> pairs (n:m,...), 32-bit values
> > >>>> */
> > >>>> + cl_list_t vlarb_low_list; /* list of num
> pairs (n:m,...), 32-bit values
> > >>>> */
> > >>> Why cl_list for VLArb? it should be short fixed length arrays?
> > >> Right.
> > >> Since the actual VLArb setup is not implemented yet, I
> didn't see
> > >> this obvious thing.
> > >
> > > But it should be implemented. Right?
> >
> > Sure, but not for OFED 1.3 - we have a feature freeze in 11 days.
>
> Without VLArb configuration, where does this come from ? What
> is the QoS support then: just SL ?
>
> -- Hal
>
> >
> > >>>> + uint32_t vl_high_limit; /* single integer */
> > >>>> + boolean_t vl_high_limit_set;
> > >>>> +} osm_qos_vlarb_scope_t;
> > >>>> +
> > >>>> +osm_qos_vlarb_scope_t *osm_qos_policy_vlarb_scope_create();
> > >>>> +void osm_qos_policy_vlarb_scope_destroy();
> > >>>> +
> > >>>> +/***************************************************/
> > >>>> +
> > >>>> +typedef struct osm_qos_sl2vl_scope_t_ {
> > >>>> + cl_list_t group_list; /* list of strings (port
> group names) */
> > >>>> + boolean_t from[OSM_QOS_POLICY_MAX_PORTS_ON_SWITCH];
> > >>>> + boolean_t to[OSM_QOS_POLICY_MAX_PORTS_ON_SWITCH];
> > >>>> + cl_list_t across_from_list; /* list of strings
> (port group names) */
> > >>>> + cl_list_t across_to_list; /* list of strings
> (port group names) */
> > >>>> + uint8_t sl2vl_table[16]; /* array of sl2vl values */
> > >>>> + boolean_t sl2vl_table_set;
> > >>>> +} osm_qos_sl2vl_scope_t;
> > >>> This will be used for sl2vl setup? Same as above - why not to
> > >>> generate final port guid list just in parse time?
> > >>>> +
> > >>>> +osm_qos_sl2vl_scope_t *osm_qos_policy_sl2vl_scope_create();
> > >>>> +void osm_qos_policy_sl2vl_scope_destroy();
> > >>>> +
> > >>>> +/***************************************************/
> > >>>> +
> > >>>> +typedef struct osm_qos_level_t_ {
> > >>>> + char *use;
> > >>>> + char *name;
> > >>>> + uint8_t sl;
> > >>>> + boolean_t sl_set;
> > >>>> + uint8_t mtu_limit;
> > >>>> + boolean_t mtu_limit_set;
> > >>>> + uint8_t rate_limit;
> > >>>> + boolean_t rate_limit_set;
> > >>>> + uint8_t pkt_life;
> > >>>> + boolean_t pkt_life_set;
> > >>>> + uint64_t **path_bits_range_arr; /* array of bit
> ranges (real
> > >>>> +values are
> > >>>> 32bits) */
> > >>>> + unsigned path_bits_range_len; /* num of bit
> ranges in the array */
> > >>>> + uint64_t **pkey_range_arr; /* array of PKey
> ranges (real values are
> > >>>> 16bits) */
> > >>>> + unsigned pkey_range_len;
> > >>>> +} osm_qos_level_t;
> > >>>> +
> > >>>> +osm_qos_level_t *osm_qos_policy_qos_level_create();
> > >>>> +void osm_qos_policy_qos_level_destroy();
> > >>>> +
> > >>>> +boolean_t osm_qos_level_has_pkey(IN const
> osm_qos_level_t * p_qos_level,
> > >>>> + IN ib_net16_t pkey);
> > >>>> +
> > >>>> +ib_net16_t osm_qos_level_get_shared_pkey(IN const
> > >>>> +osm_qos_level_t *
> > >>>> p_qos_level,
> > >>>> + IN const
> osm_physp_t * p_src_physp,
> > >>>> + IN const osm_physp_t *
> > >>>> + p_dest_physp);
> > >>>> +
> > >>>> +/***************************************************/
> > >>>> +
> > >>>> +typedef struct osm_qos_match_rule_t_ {
> > >>>> + char *use;
> > >>>> + cl_list_t source_list; /* list of strings */
> > >>>> + cl_list_t source_group_list; /* list of pointers
> to relevant port-group
> > >>>> */
> > >>>> + cl_list_t destination_list; /* list of strings */
> > >>>> + cl_list_t destination_group_list; /* list of
> pointers to relevant
> > >>>> port-group */
> > >>> I think you should only keep port guids there (mapped
> for fast searches).
> > >> Same as above.
> > >> I think that checking node type and then guid range array is
> > >> essentially faster than checking guid map.
> > >
> > > _Only_ for case when the group is specified by type,
> which is likely
> > > will not be typical.
> >
> > Again, in *worst* case you will get the same performance in
> searching
> > range array as in searching cl_map (which is a binary tree).
> >
> > >> You might say, of course, that there can be many port
> groups in
> > >> the same match rule, but I don't see this as a practical
> example.
> > >
> > > Of course it could (or you must disable multi groups
> here, which is
> > > not good idea I think and not what was presented in the RFC).
> >
> > Can you elaborate on this?
> > Do you think that having multiple groups is not useful at all?
> >
> > > Also each group still have this "linear searchable" lists:
> > >
> > > cl_list_t port_name_list; /* list of port
> names (.../.../...) */
> >
> > Port names are not implemented yet :)
> > There's a "todo" comment in the code.
> > This is the only keyword in the parser that is not implemented - I
> > mean it's parsed, but other than creating this list the
> parser doesn't do anything with it.
> > But It definitely won't stay as a pure list.
> >
> > > uint64_t **guid_range_arr; /* array of guid
> ranges (pair of 64-bit
> >
> > This is not just a list, as I've explained before.
> >
> > > cl_list_t partition_list; /* list of
> partition names */
> >
> > I can hardly believe than there will be more than one or two
> > partitions in this list. Do you think otherwise?
> >
> > Anyway, I can extract all the guids and prepare a map. This is
> > certainly the easiest implementation. And I still think that on
> > average, guid ranges and partition lists are better, but
> who knows -
> > perhaps someone would want to define a bunch of partitions
> in a single
> > port group and ruin my average... :)
> >
> > >>>> + char *qos_level_name;
> > >>>> + osm_qos_level_t *p_qos_level;
> > >>> Why do you need qos_level_name if you keep the pointer to this
> > >>> qos_level struct?
> > >> In policy file the match rule might appear before the
> QoS levels,
> > >> so matching qos level names to the actual qos levels is
> done when
> > >> the parsing is done.
> > >>
> > >>>> + uint64_t **service_id_range_arr; /* array of
> SID ranges (64-bit values)
> > >>>> */
> > >>>> + unsigned service_id_range_len; uint64_t
> > >>>> + **qos_class_range_arr; /* array of QoS Class ranges (real
> > >>>> values are 16bits) */
> > >>>> + unsigned qos_class_range_len;
> > >>>> + uint64_t **pkey_range_arr; /* array of PKey
> ranges (real values are
> > >>>> 16bits) */
> > >>>> + unsigned pkey_range_len;
> > >>>> +} osm_qos_match_rule_t;
> > >>>> +
> > >>>> +osm_qos_match_rule_t *osm_qos_policy_match_rule_create();
> > >>>> +void osm_qos_policy_match_rule_destroy();
> > >>>> +
> > >>>> +/***************************************************/
> > >>>> +
> > >>>> +typedef struct osm_qos_policy_t_ {
> > >>>> + cl_list_t port_groups; /* list of osm_qos_port_group_t */
> > >>>> + cl_list_t sl2vl_tables; /* list of osm_qos_sl2vl_scope_t */
> > >>>> + cl_list_t vlarb_tables; /* list of osm_qos_vlarb_scope_t */
> > >>>> + cl_list_t qos_levels; /* list of osm_qos_level_t */
> > >>>> + cl_list_t qos_match_rules; /* list of
> osm_qos_match_rule_t */
> > >>>> + osm_qos_level_t *p_default_qos_level; /* default
> QoS level */
> > >>>> +} osm_qos_policy_t;
> > >>>> +
> > >>>> +void osm_qos_policy_create();
> > >>>> +void osm_qos_policy_destroy();
> > >>>> +int osm_qos_policy_validate();
> > >>>> +
> > >>>> +void osm_qos_policy_get_qos_level_by_pr(IN const
> osm_pr_rcv_t * p_rcv,
> > >>>> + IN const
> ib_path_rec_t * p_pr,
> > >>>> + IN const
> osm_physp_t * p_src_physp,
> > >>>> + IN const
> osm_physp_t * p_dest_physp,
> > >>>> + IN ib_net64_t comp_mask,
> > >>>> + OUT osm_qos_level_t **
> > >>>> +pp_qos_level);
> > >>>> +
> > >>>> +/***************************************************/
> > >>>> +
> > >>>> +int osm_qos_parse_policy_file(IN osm_log_t * p_log, IN const
> > >>>> +char
> > >>>> *policy_file);
> > >>>> +
> > >>>> +/***************************************************/
> > >>>> +
> > >>>> +#endif /* ifndef
> OSM_QOS_POLICY_H */
> > >>>> diff --git a/opensm/opensm/osm_qos_policy.c
> > >>>> b/opensm/opensm/osm_qos_policy.c new file mode 100644 index
> > >>>> 0000000..bc2aa68
> > >>>> --- /dev/null
> > >>>> +++ b/opensm/opensm/osm_qos_policy.c
> > >>>> @@ -0,0 +1,901 @@
> > >>>> +/*
> > >>>> + * Copyright (c) 2004-2006 Voltaire, Inc. All rights reserved.
> > >>>> + * Copyright (c) 2002-2005 Mellanox Technologies LTD.
> All rights
> > >>>> reserved.
> > >>>> + * Copyright (c) 1996-2003 Intel Corporation. All
> rights reserved.
> > >>>> + *
> > >>>> + * This software is available to you under a choice of one of
> > >>>> + two
> > >>>> + * licenses. You may choose to be licensed under the
> terms of
> > >>>> + the GNU
> > >>>> + * General Public License (GPL) Version 2, available from the
> > >>>> + file
> > >>>> + * COPYING in the main directory of this source tree, or the
> > >>>> + * OpenIB.org BSD license below:
> > >>>> + *
> > >>>> + * Redistribution and use in source and binary
> forms, with or
> > >>>> + * without modification, are permitted provided
> that the following
> > >>>> + * conditions are met:
> > >>>> + *
> > >>>> + * - Redistributions of source code must retain the above
> > >>>> + * copyright notice, this list of conditions
> and the following
> > >>>> + * disclaimer.
> > >>>> + *
> > >>>> + * - Redistributions in binary form must
> reproduce the above
> > >>>> + * copyright notice, this list of conditions
> and the following
> > >>>> + * disclaimer in the documentation and/or
> other materials
> > >>>> + * provided with the distribution.
> > >>>> + *
> > >>>> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY
> > >>>> + KIND,
> > >>>> + * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE
> > >>>> + WARRANTIES OF
> > >>>> + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
> > >>>> + * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR
> COPYRIGHT
> > >>>> + HOLDERS
> > >>>> + * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
> LIABILITY, WHETHER
> > >>>> + IN AN
> > >>>> + * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
> FROM, OUT OF
> > >>>> + OR IN
> > >>>> + * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
> DEALINGS IN
> > >>>> + THE
> > >>>> + * SOFTWARE.
> > >>>> + *
> > >>>> + */
> > >>>> +
> > >>>> +/*
> > >>>> + * Abstract:
> > >>>> + * OSM QoS Policy functions.
> > >>>> + *
> > >>>> + * Environment:
> > >>>> + * Linux User Mode
> > >>>> + *
> > >>>> + * Author:
> > >>>> + * Yevgeny Kliteynik, Mellanox
> > >>>> + */
> > >>>> +
> > >>>> +#include <opensm/osm_qos_policy.h> #include
> > >>>> +<opensm/osm_qos_parser_y.h> #include <opensm/osm_partition.h>
> > >>>> +
> > >>>> +extern void yyerror(char *s);
> > >>>> +osm_log_t *p_qos_parser_osm_log = NULL; osm_qos_policy_t
> > >>>> +*p_qos_policy = NULL;
> > >>> Please try to avoid globals - keep it as part of
> osm_opensm_t or
> > >>> osm_subn_t structures.
> > >> I thought about it, but didn't want to "condaminate" the
> > >> osm_opensm_t or osm_subn_t structures untill the QoS
> functionality is ready.
> > >
> > > How globals are better in this sense?
> >
> > They're not :)
> > The difference is that less files are modified.
> > But I agree that the QoS policy should be part of
> osm_opensm_t or osm_subn_t.
> >
> > -- Yevgeny
> >
> > > Sasha
> > >
> >
> > _______________________________________________
> > general mailing list
> > general at lists.openfabrics.org
> > http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general
> >
> > To unsubscribe, please visit
> > http://openib.org/mailman/listinfo/openib-general
> >
> _______________________________________________
> general mailing list
> general at lists.openfabrics.org
> http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general
>
> To unsubscribe, please visit
> http://openib.org/mailman/listinfo/openib-general
>
More information about the general
mailing list