[ofa-general] Re: [PATCH v2] osm: QoS: selecting PathRecord according to QoS policy
Yevgeny Kliteynik
kliteyn at dev.mellanox.co.il
Thu Sep 6 16:32:50 PDT 2007
Sasha Khapyorsky wrote:
> On 16:31 Thu 06 Sep , Yevgeny Kliteynik wrote:
>> Hi Sasha,
>>
>> Sasha Khapyorsky wrote:
>>> Hi Yevgeny,
>>> On 15:22 Wed 05 Sep , Yevgeny Kliteynik wrote:
>>>> Selecting path according to QoS policy level that
>>>> the PathRecord query matches.
>>>>
>>>> Signed-off-by: Yevgeny Kliteynik <kliteyn at dev.mellanox.co.il>
>>>> ---
>>>> opensm/opensm/osm_sa_path_record.c | 374
>>>> ++++++++++++++++++++++++++----------
>>>> 1 files changed, 276 insertions(+), 98 deletions(-)
>>>>
>>>> diff --git a/opensm/opensm/osm_sa_path_record.c
>>>> b/opensm/opensm/osm_sa_path_record.c
>>>> index 1b781f0..15bd7e2 100644
>>>> --- a/opensm/opensm/osm_sa_path_record.c
>>>> +++ b/opensm/opensm/osm_sa_path_record.c
>>>> @@ -67,6 +67,7 @@
>>>> #include <opensm/osm_multicast.h>
>>>> #include <opensm/osm_partition.h>
>>>> #include <opensm/osm_opensm.h>
>>>> +#include <opensm/osm_qos_policy.h>
>>>> #ifdef ROUTER_EXP
>>>> #include <opensm/osm_router.h>
>>>> #include <opensm/osm_sa_mcmember_record.h>
>>>> @@ -236,8 +237,9 @@ __osm_pr_rcv_get_path_parms(IN osm_pr_rcv_t * const
>>>> p_rcv,
>>>> {
>>>> const osm_node_t *p_node;
>>>> const osm_physp_t *p_physp;
>>>> + const osm_physp_t *p_src_physp;
>>>> const osm_physp_t *p_dest_physp;
>>>> - const osm_prtn_t *p_prtn;
>>>> + const osm_prtn_t *p_prtn = NULL;
>>>> const ib_port_info_t *p_pi;
>>>> ib_api_status_t status = IB_SUCCESS;
>>>> ib_net16_t pkey;
>>>> @@ -248,14 +250,24 @@ __osm_pr_rcv_get_path_parms(IN osm_pr_rcv_t * const
>>>> p_rcv,
>>>> uint8_t required_rate;
>>>> uint8_t required_pkt_life;
>>>> uint8_t sl;
>>>> + uint8_t in_port_num;
>>>> ib_net16_t dest_lid;
>>>> + uint8_t i;
>>>> + uint8_t vl;
>>>> + ib_slvl_table_t *p_slvl_tbl = NULL;
>>>> + boolean_t valid_sls[IB_MAX_NUM_VLS];
>>>> + boolean_t sl2vl_valid_path;
>>>> + uint8_t first_valid_sl;
>>>> + osm_qos_level_t *p_qos_level = NULL;
>>>>
>>>> OSM_LOG_ENTER(p_rcv->p_log, __osm_pr_rcv_get_path_parms);
>>>>
>>>> + memset(valid_sls, TRUE, IB_MAX_NUM_VLS);
>>>> dest_lid = cl_hton16(dest_lid_ho);
>>>>
>>>> p_dest_physp = p_dest_port->p_physp;
>>>> p_physp = p_src_port->p_physp;
>>>> + p_src_physp = p_physp;
>>>> p_pi = &p_physp->port_info;
>>>>
>>>> mtu = ib_port_info_get_mtu_cap(p_pi);
>>>> @@ -288,13 +300,16 @@ __osm_pr_rcv_get_path_parms(IN osm_pr_rcv_t * const
>>>> p_rcv,
>>>> p_node = osm_physp_get_node_ptr(p_physp);
>>>>
>>>> if (p_node->sw) {
>>>> + /* source node is a switch */
>>>> + in_port_num = osm_physp_get_port_num(p_physp);
>>> Hmm, could in_port_num be != 0?
>> Well...
>> The physical port object is obtained from port object, which in turn,
>> was obtained from the subnet port_guid_tbl through osm_get_port_by_guid().
>> Since there can be one port per guid in this table, I think we store there
>> only ports 0 of the switches (correct me if I'm wrong).
>> So looks like you're right - in this case in_port_num can be only 0.
>>
>> In any case, osm_physp_get_port_num() is just an inline function that
>> returns p_physp->port_num.
>
> And look where this in_port_num is used later:
>
>>>> + if (p_node->sw)
>>>> + p_slvl_tbl = osm_physp_get_slvl_tbl(p_physp, in_port_num);
>>>> + else
>>>> + p_slvl_tbl = osm_physp_get_slvl_tbl(p_physp, 0);
>>>> +
>
> Since for switches in_port_num is always 0 just
>
> p_slvl_tbl = osm_physp_get_slvl_tbl(p_physp, 0);
>
> will be sufficient for all node types.
OK
>>>> +
>>>> /*
>>>> * If the dest_lid_ho is equal to the lid of the switch pointed by
>>>> * p_sw then p_physp will be the physical port of the switch port zero.
>>> I know it is not your code, but do you understand this part of the
>>> comment?
>> Nope :)
>> The two lines I've added may very well replace these first two lines,
>> so I think I can remove the old comment.
>
> Ok.
>
>>>> + * Make sure that p_physp points to the out port of the
>>>> + * switch that routes to the destination lid (dest_lid_ho)
>>>> */
>>>> - p_physp =
>>>> - osm_switch_get_route_by_lid(p_node->sw,
>>>> - cl_ntoh16(dest_lid_ho));
>>>> + p_physp = osm_switch_get_route_by_lid(p_node->sw, dest_lid);
>>>> if (p_physp == 0) {
>>>> osm_log(p_rcv->p_log, OSM_LOG_ERROR,
>>>> "__osm_pr_rcv_get_path_parms: ERR 1F02: "
>>>> @@ -306,15 +321,32 @@ __osm_pr_rcv_get_path_parms(IN osm_pr_rcv_t * const
>>>> p_rcv,
>>>> }
>>>> }
>>>>
>>>> + if (!p_rcv->p_subn->opt.no_qos) {
>>> Would you prefer to change opt.no_qos to opt.qos? For me it looks things
>>> will be clear this way.
>> I wanted to do it since I started working on QoS!
>
> Feel free :)
>
>>>> + if (p_node->sw)
>>>> + p_slvl_tbl = osm_physp_get_slvl_tbl(p_physp, in_port_num);
>>>> + else
>>>> + p_slvl_tbl = osm_physp_get_slvl_tbl(p_physp, 0);
>>>> +
>>>> + /* update valid SLs that still exist on this route */
>>>> + for (i = 0; i < IB_MAX_NUM_VLS; i++) {
>>>> + if (valid_sls[i]) {
>>>> + vl = ib_slvl_table_get(p_slvl_tbl, i);
>>>> + if (vl == IB_DROP_VL)
>>>> + valid_sls[i] = FALSE;
>>>> + }
>>>> + }
>>>> + }
>>>> +
>>>> /*
>>>> * Same as above
>>>> */
>>>> p_node = osm_physp_get_node_ptr(p_dest_physp);
>>>>
>>>> if (p_node->sw) {
>>>> - p_dest_physp =
>>>> - osm_switch_get_route_by_lid(p_node->sw,
>>>> - cl_ntoh16(dest_lid_ho));
>>>> + /*
>>>> + * if destination is switch, we want p_dest_physp to point to port 0
>>>> + */
>>>> + p_dest_physp = osm_switch_get_route_by_lid(p_node->sw, dest_lid);
>>>>
>>>> if (p_dest_physp == 0) {
>>>> osm_log(p_rcv->p_log, OSM_LOG_ERROR,
>>>> @@ -328,6 +360,10 @@ __osm_pr_rcv_get_path_parms(IN osm_pr_rcv_t * const
>>>> p_rcv,
>>>>
>>>> }
>>>>
>>>> + /*
>>>> + * Now go through the path step by step
>>>> + */
>>>> +
>>>> while (p_physp != p_dest_physp) {
>>>> p_physp = osm_physp_get_remote(p_physp);
>>>>
>>>> @@ -341,6 +377,8 @@ __osm_pr_rcv_get_path_parms(IN osm_pr_rcv_t * const
>>>> p_rcv,
>>>> goto Exit;
>>>> }
>>>>
>>>> + in_port_num = osm_physp_get_port_num(p_physp);
>>>> +
>>>> /*
>>>> This is point to point case (no switch in between)
>>>> */
>>>> @@ -367,29 +405,11 @@ __osm_pr_rcv_get_path_parms(IN osm_pr_rcv_t * const
>>>> p_rcv,
>>>> */
>>>> p_pi = &p_physp->port_info;
>>>>
>>>> - if (mtu > ib_port_info_get_mtu_cap(p_pi)) {
>>>> + if (mtu > ib_port_info_get_mtu_cap(p_pi))
>>>> mtu = ib_port_info_get_mtu_cap(p_pi);
>>>> - if (osm_log_is_active(p_rcv->p_log, OSM_LOG_DEBUG))
>>>> - osm_log(p_rcv->p_log, OSM_LOG_DEBUG,
>>>> - "__osm_pr_rcv_get_path_parms: "
>>>> - "New smallest MTU = %u at intervening port 0x%016"
>>>> - PRIx64 " port num 0x%X\n", mtu,
>>>> - cl_ntoh64(osm_physp_get_port_guid
>>>> - (p_physp)),
>>>> - osm_physp_get_port_num(p_physp));
>>>> - }
>>>>
>>>> - if (rate > ib_port_info_compute_rate(p_pi)) {
>>>> + if (rate > ib_port_info_compute_rate(p_pi))
>>>> rate = ib_port_info_compute_rate(p_pi);
>>>> - if (osm_log_is_active(p_rcv->p_log, OSM_LOG_DEBUG))
>>>> - osm_log(p_rcv->p_log, OSM_LOG_DEBUG,
>>>> - "__osm_pr_rcv_get_path_parms: "
>>>> - "New smallest rate = %u at intervening port 0x%016"
>>>> - PRIx64 " port num 0x%X\n", rate,
>>>> - cl_ntoh64(osm_physp_get_port_guid
>>>> - (p_physp)),
>>>> - osm_physp_get_port_num(p_physp));
>>>> - }
>>>>
>>>> /*
>>>> Continue with the egress port on this switch.
>>>> @@ -409,32 +429,41 @@ __osm_pr_rcv_get_path_parms(IN osm_pr_rcv_t * const
>>>> p_rcv,
>>>> CL_ASSERT(p_physp);
>>> It is not needed, run-time check is done right above. (I know it is not
>>> your code)
>> Sure - removed.
>>
>>>> CL_ASSERT(osm_physp_is_valid(p_physp));
>>>>
>>>> + p_node = osm_physp_get_node_ptr(p_physp);
>>>> + if (!p_node->sw) {
>>> Actually this !p_node->sw check duplicates the one above, where
>>> !p_node->sw is verified for ergess port of this switch. Right?
Well, it's not exactly the same check - one check is for egress port,
the other is for ingress port, but we certainly can live with checking
only one of the two ports.
>>>> + /*
>>>> + * There is some sort of problem in the subnet object!
>>>> + * If this isn't a switch, we should have reached
>>>> + * the destination by now!
>>>> + */
>>>> + osm_log(p_rcv->p_log, OSM_LOG_ERROR,
>>>> + "__osm_pr_rcv_get_path_parms: ERR 1F04: "
>>>> + "Internal error, bad path\n");
>>>> + status = IB_ERROR;
>>>> + goto Exit;
>>>> + }
>>>> +
>>>> p_pi = &p_physp->port_info;
>>>>
>>>> - if (mtu > ib_port_info_get_mtu_cap(p_pi)) {
>>>> + if (mtu > ib_port_info_get_mtu_cap(p_pi))
>>>> mtu = ib_port_info_get_mtu_cap(p_pi);
>>>> - if (osm_log_is_active(p_rcv->p_log, OSM_LOG_DEBUG))
>>>> - osm_log(p_rcv->p_log, OSM_LOG_DEBUG,
>>>> - "__osm_pr_rcv_get_path_parms: "
>>>> - "New smallest MTU = %u at intervening port 0x%016"
>>>> - PRIx64 " port num 0x%X\n", mtu,
>>>> - cl_ntoh64(osm_physp_get_port_guid
>>>> - (p_physp)),
>>>> - osm_physp_get_port_num(p_physp));
>>>> - }
>>>>
>>>> - if (rate > ib_port_info_compute_rate(p_pi)) {
>>>> + if (rate > ib_port_info_compute_rate(p_pi))
>>>> rate = ib_port_info_compute_rate(p_pi);
>>>> - if (osm_log_is_active(p_rcv->p_log, OSM_LOG_DEBUG))
>>>> - osm_log(p_rcv->p_log, OSM_LOG_DEBUG,
>>>> - "__osm_pr_rcv_get_path_parms: "
>>>> - "New smallest rate = %u at intervening port 0x%016"
>>>> - PRIx64 " port num 0x%X\n", rate,
>>>> - cl_ntoh64(osm_physp_get_port_guid
>>>> - (p_physp)),
>>>> - osm_physp_get_port_num(p_physp));
>>>> - }
>>>>
>>>> + if (!p_rcv->p_subn->opt.no_qos) {
>>>> + /*
>>>> + * Check SL2VL table of the switch and update valid SLs
>>>> + */
>>>> + p_slvl_tbl = osm_physp_get_slvl_tbl(p_physp, in_port_num);
>>>> + for (i = 0; i < IB_MAX_NUM_VLS; i++) {
>>>> + if (valid_sls[i]) {
>>>> + vl = ib_slvl_table_get(p_slvl_tbl, i);
>>>> + if (vl == IB_DROP_VL)
>>>> + valid_sls[i] = FALSE;
>>>> + }
>>>> + }
>>>> + }
>>>> }
>>>>
>>>> /*
>>>> @@ -442,30 +471,104 @@ __osm_pr_rcv_get_path_parms(IN osm_pr_rcv_t * const
>>>> p_rcv,
>>>> */
>>>> p_pi = &p_physp->port_info;
>>>>
>>>> - if (mtu > ib_port_info_get_mtu_cap(p_pi)) {
>>>> + if (mtu > ib_port_info_get_mtu_cap(p_pi))
>>>> mtu = ib_port_info_get_mtu_cap(p_pi);
>>>> - if (osm_log_is_active(p_rcv->p_log, OSM_LOG_DEBUG))
>>>> +
>>>> + if (rate > ib_port_info_compute_rate(p_pi))
>>>> + rate = ib_port_info_compute_rate(p_pi);
>>>> +
>>>> + if (osm_log_is_active(p_rcv->p_log, OSM_LOG_DEBUG))
>>>> + osm_log(p_rcv->p_log, OSM_LOG_DEBUG,
>>>> + "__osm_pr_rcv_get_path_parms: "
>>>> + "Path min MTU = %u, min rate = %u\n",
>>>> + mtu, rate);
>>>> +
>>>> + if (!p_rcv->p_subn->opt.no_qos) {
>>>> + /*
>>>> + * check whether there is some SL
>>>> + * that won't lead to VL15 eventually
>>>> + */
>>>> + sl2vl_valid_path = FALSE;
>>>> + for (i = 0; i < IB_MAX_NUM_VLS; i++) {
>>>> + if (valid_sls[i]) {
>>>> + sl2vl_valid_path = TRUE;
>>>> + first_valid_sl = i;
>>>> + break;
>>>> + }
>>>> + }
>>>> +
>>>> + if (!sl2vl_valid_path) {
>>>> + if (osm_log_is_active(p_rcv->p_log, OSM_LOG_DEBUG)) {
>>>> + osm_log(p_rcv->p_log, OSM_LOG_DEBUG,
>>>> + "__osm_pr_rcv_get_path_parms: "
>>>> + "All the SLs lead to VL15 on this path\n");
>>>> + }
>>>> + status = IB_NOT_FOUND;
>>>> + goto Exit;
>>>> + }
>>>> + }
>>>> +
>>>> + if (!p_rcv->p_subn->opt.no_qos && p_rcv->p_subn->p_qos_policy) {
>>>> + /* Get QoS Level object according to the path request */
>>>> + osm_qos_policy_get_qos_level_by_pr(p_rcv->p_subn->p_qos_policy,
>>>> + p_rcv, p_pr,
>>>> + p_src_physp, p_dest_physp,
>>>> + comp_mask, &p_qos_level);
>>>> +
>>>> + if (p_qos_level
>>>> + && osm_log_is_active(p_rcv->p_log, OSM_LOG_DEBUG)) {
>>>> osm_log(p_rcv->p_log, OSM_LOG_DEBUG,
>>>> "__osm_pr_rcv_get_path_parms: "
>>>> - "New smallest MTU = %u at destination port 0x%016"
>>>> - PRIx64 "\n", mtu,
>>>> - cl_ntoh64(osm_physp_get_port_guid(p_physp)));
>>>> + "PathRecord request matches QoS Level '%s' (%s)\n",
>>>> + p_qos_level->name,
>>>> + (p_qos_level->use) ? p_qos_level->
>>>> + use : "no description");
>>>> + }
>>>> }
>>>>
>>>> - if (rate > ib_port_info_compute_rate(p_pi)) {
>>>> - rate = ib_port_info_compute_rate(p_pi);
>>>> + /* Adjust path parameters according to QoS settings */
>>>> +
>>>> + if (p_qos_level) {
>>> Why to not make osm_qos_policy_get_qos_level_by_pr() returning pointer
>>> to p_qos_level? Then you could simply merge both conditions (this and
>>> one above), something like:
>>> if (!p_rcv->p_subn->opt.no_qos &&
>>> p_rcv->p_subn->p_qos_policy &&
>>> (p_qos_level = osm_qos_policy_get_qos_level_by_pr(..)) {
>> Done.
>>
>>>> + if (p_qos_level->mtu_limit_set
>>>> + && (mtu > p_qos_level->mtu_limit))
>>>> + mtu = p_qos_level->mtu_limit;
>>>> +
>>>> + if (p_qos_level->rate_limit_set
>>>> + && (rate > p_qos_level->rate_limit))
>>>> + rate = p_qos_level->rate_limit;
>>>> +
>>>> + if (p_qos_level->pkt_life_set
>>>> + && (pkt_life > p_qos_level->pkt_life))
>>>> + pkt_life = p_qos_level->pkt_life;
>>>> +
>>>> + if (p_qos_level->sl_set) {
>>>> + if (!valid_sls[p_qos_level->sl]) {
>>>> + status = IB_NOT_FOUND;
>>>> + goto Exit;
>>>> + }
>>>> + sl = p_qos_level->sl;
>>>> + }
>>>> +
>>>> if (osm_log_is_active(p_rcv->p_log, OSM_LOG_DEBUG))
>>>> osm_log(p_rcv->p_log, OSM_LOG_DEBUG,
>>>> "__osm_pr_rcv_get_path_parms: "
>>>> - "New smallest rate = %u at destination port 0x%016"
>>>> - PRIx64 "\n", rate,
>>>> - cl_ntoh64(osm_physp_get_port_guid(p_physp)));
>>>> + "Path params with QoS constaraints: "
>>>> + "min MTU = %u, min rate = %u, "
>>>> + "packet lifetime = %u, sl = %u\n",
>>>> + mtu, rate, pkt_life, sl);
>>>> }
>>>>
>>>> - if (osm_log_is_active(p_rcv->p_log, OSM_LOG_DEBUG))
>>>> - osm_log(p_rcv->p_log, OSM_LOG_DEBUG,
>>>> - "__osm_pr_rcv_get_path_parms: "
>>>> - "Path min MTU = %u, min rate = %u\n", mtu, rate);
>>>> + /*
>>>> + * Set packet lifetime.
>>>> + * According to spec definition IBA 1.2 Table 205
>>>> + * PacketLifeTime description, for loopback paths,
>>>> + * packetLifeTime shall be zero.
>>>> + */
>>>> + if (p_src_port == p_dest_port)
>>>> + pkt_life = 0;
>>>> + else if ( !(p_qos_level && p_qos_level->pkt_life_set) )
>>>> + pkt_life = OSM_DEFAULT_SUBNET_TIMEOUT;
>>>> +
>>>>
>>>> /*
>>>> Determine if these values meet the user criteria
>>>> @@ -511,6 +614,8 @@ __osm_pr_rcv_get_path_parms(IN osm_pr_rcv_t * const
>>>> p_rcv,
>>>> break;
>>>> }
>>>> }
>>>> + if (status != IB_SUCCESS)
>>>> + goto Exit;
>>>>
>>>> /* we silently ignore cases where only the Rate selector is defined */
>>>> if ((comp_mask & IB_PR_COMPMASK_RATESELEC) &&
>>>> @@ -551,14 +656,8 @@ __osm_pr_rcv_get_path_parms(IN osm_pr_rcv_t * const
>>>> p_rcv,
>>>> break;
>>>> }
>>>> }
>>>> -
>>>> - /* Verify the pkt_life_time */
>>>> - /* According to spec definition IBA 1.2 Table 205 PacketLifeTime
>>>> description,
>>>> - for loopback paths, packetLifeTime shall be zero. */
>>>> - if (p_src_port == p_dest_port)
>>>> - pkt_life = 0; /* loopback */
>>>> - else
>>>> - pkt_life = OSM_DEFAULT_SUBNET_TIMEOUT;
>>>> + if (status != IB_SUCCESS)
>>>> + goto Exit;
>>>>
>>>> /* we silently ignore cases where only the PktLife selector is defined
>>>> */
>>>> if ((comp_mask & IB_PR_COMPMASK_PKTLIFETIMESELEC) &&
>>>> @@ -603,12 +702,24 @@ __osm_pr_rcv_get_path_parms(IN osm_pr_rcv_t * const
>>>> p_rcv,
>>>> if (status != IB_SUCCESS)
>>>> goto Exit;
>>>>
>>>> - if (comp_mask & IB_PR_COMPMASK_RAWTRAFFIC &&
>>>> - cl_ntoh32(p_pr->hop_flow_raw) & (1 << 31))
>>>> - pkey = osm_physp_find_common_pkey(p_physp, p_dest_physp);
>>>> + /*
>>>> + * set Pkey for this path record request
>>>> + */
>>>> +
>>>> + if ((comp_mask & IB_PR_COMPMASK_RAWTRAFFIC) &&
>>>> + (cl_ntoh32(p_pr->hop_flow_raw) & (1 << 31)))
>>>> + pkey = osm_physp_find_common_pkey(p_src_physp, p_dest_physp);
>>> So is it was bug (not related to QoS) when p_physp instead of
>>> p_src_physp was used for pkey finding?
>> I think so.
>
> Nice finding!
>
>>>> +
>>>> else if (comp_mask & IB_PR_COMPMASK_PKEY) {
>>>> + /*
>>>> + * PR request has a specific pkey:
>>>> + * Check that source and destination share this pkey.
>>>> + * If QoS level has pkeys, check that this pkey exists
>>>> + * in the QoS level pkeys.
>>>> + * PR returned pkey is the requested pkey.
>>>> + */
>>>> pkey = p_pr->pkey;
>>>> - if (!osm_physp_share_this_pkey(p_physp, p_dest_physp, pkey)) {
>>>> + if (!osm_physp_share_this_pkey(p_src_physp, p_dest_physp, pkey)) {
>>>> osm_log(p_rcv->p_log, OSM_LOG_ERROR,
>>>> "__osm_pr_rcv_get_path_parms: ERR 1F1A: "
>>>> "Ports do not share specified PKey 0x%04x\n",
>>>> @@ -616,8 +727,37 @@ __osm_pr_rcv_get_path_parms(IN osm_pr_rcv_t * const
>>>> p_rcv,
>>>> status = IB_NOT_FOUND;
>>>> goto Exit;
>>>> }
>>>> + if (p_qos_level && p_qos_level->pkey_range_len &&
>>>> + !osm_qos_level_has_pkey(p_qos_level, pkey)) {
>>>> + osm_log(p_rcv->p_log, OSM_LOG_ERROR,
>>>> + "__osm_pr_rcv_get_path_parms: ERR 1F1D: "
>>>> + "Ports do not share PKeys defined by QoS level\n");
>>>> + status = IB_NOT_FOUND;
>>>> + goto Exit;
>>>> + }
>>>> +
>>>> + } else if (p_qos_level && p_qos_level->pkey_range_len) {
>>>> + /*
>>>> + * PR request doesn't have a specific pkey, but QoS level
>>>> + * has pkeys - get shared pkey from QoS level pkeys
>>>> + */
>>>> + pkey = osm_qos_level_get_shared_pkey(p_qos_level,
>>>> + p_src_physp,
>>>> + p_dest_physp);
>>>> + if (!pkey) {
>>>> + osm_log(p_rcv->p_log, OSM_LOG_ERROR,
>>>> + "__osm_pr_rcv_get_path_parms: ERR 1F1E: "
>>>> + "Ports do not share PKeys defined by QoS level\n");
>>>> + status = IB_NOT_FOUND;
>>>> + goto Exit;
>>>> + }
>>>> } else {
>>>> - pkey = osm_physp_find_common_pkey(p_physp, p_dest_physp);
>>>> + /*
>>>> + * Neither PR request nor QoS level have pkey.
>>>> + * Just get any shared pkey.
>>>> + */
>>>> + pkey = osm_physp_find_common_pkey(p_src_physp,
>>>> + p_dest_physp);
>>>> if (!pkey) {
>>>> osm_log(p_rcv->p_log, OSM_LOG_ERROR,
>>>> "__osm_pr_rcv_get_path_parms: ERR 1F1B: "
>>>> @@ -627,14 +767,6 @@ __osm_pr_rcv_get_path_parms(IN osm_pr_rcv_t * const
>>>> p_rcv,
>>>> }
>>>> }
>>>>
>>>> - if (p_rcv->p_subn->opt.routing_engine_name &&
>>>> - strcmp(p_rcv->p_subn->opt.routing_engine_name, "lash") == 0)
>>>> - /* slid and dest_lid are stored in network in lash */
>>>> - sl = osm_get_lash_sl(p_rcv->p_subn->p_osm, p_src_port,
>>>> - p_dest_port);
>>>> - else
>>>> - sl = OSM_DEFAULT_SL;
>>>> -
>>>> if (pkey) {
>>>> p_prtn =
>>>> (osm_prtn_t *) cl_qmap_get(&p_rcv->p_subn->prtn_pkey_tbl,
>>>> @@ -642,34 +774,80 @@ __osm_pr_rcv_get_path_parms(IN osm_pr_rcv_t * const
>>>> p_rcv,
>>>> 0x8000));
>>>> if (p_prtn ==
>>>> (osm_prtn_t *) cl_qmap_end(&p_rcv->p_subn->prtn_pkey_tbl))
>>>> + p_prtn = NULL;
>>>> + }
>>>> +
>>>> + /*
>>>> + * Set PathRecord SL.
>>>> + *
>>>> + * ToDo: What about QoS and LASH routing? How can they coexist?
>>>> + * And what happens when there's a pkey, hence there is a
>>>> + * partition with a certain SL, and this SL doesn't match
>>>> + * the one that's defined by LASH?
>>>> + */
>>>> +
>>>> + if (comp_mask & IB_PR_COMPMASK_SL) {
>>>> + /*
>>>> + * Specific SL was requested
>>>> + */
>>>> + sl = ib_path_rec_sl(p_pr);
>>>> + if (p_qos_level && p_qos_level->sl_set && (p_qos_level->sl != sl)) {
>>>> + osm_log(p_rcv->p_log, OSM_LOG_ERROR,
>>>> + "__osm_pr_rcv_get_path_parms: ERR 1F1F: "
>>>> + "QoS constaraints: required PR SL (%u) "
>>>> + "doesn't match QoS SL (%u)\n",
>>>> + sl, p_qos_level->sl);
>>>> + status = IB_NOT_FOUND;
>>>> + goto Exit;
>>>> + }
>>>> + } else if (p_qos_level && p_qos_level->sl_set) {
>>>> + /*
>>>> + * No specific SL was requested,
>>>> + * but there is an SL in QoS level
>>>> + */
>>>> + sl = p_qos_level->sl;
>>>> + if (pkey && p_prtn && p_prtn->sl != p_qos_level->sl)
>>>> + osm_log(p_rcv->p_log, OSM_LOG_DEBUG,
>>>> + "__osm_pr_rcv_get_path_parms: "
>>>> + "QoS level SL (%u) overrides partition SL (%u)\n",
>>>> + p_qos_level->sl, p_prtn->sl);
>>>> + } else if (pkey) {
>>>> + /*
>>>> + * No specific SL in request or in QoS level - use partition SL
>>>> + */
>>>> + if (!p_prtn) {
>>>> /* this may be possible when pkey tables are created somehow in
>>>> previous runs or things are going wrong here */
>>>> osm_log(p_rcv->p_log, OSM_LOG_ERROR,
>>>> "__osm_pr_rcv_get_path_parms: ERR 1F1C: "
>>>> "No partition found for PKey 0x%04x - using default SL %d\n",
>>>> cl_ntoh16(pkey), sl);
>>>> - else {
>>>> - if (p_rcv->p_subn->opt.routing_engine_name &&
>>>> - strcmp(p_rcv->p_subn->opt.routing_engine_name,
>>>> - "lash") == 0)
>>>> - /* slid and dest_lid are stored in network in lash */
>>>> - sl = osm_get_lash_sl(p_rcv->p_subn->p_osm,
>>>> - p_src_port, p_dest_port);
>>>> - else
>>>> - sl = p_prtn->sl;
>>>> - }
>>>> -
>>>> - /* reset pkey when raw traffic */
>>>> - if (comp_mask & IB_PR_COMPMASK_RAWTRAFFIC &&
>>>> - cl_ntoh32(p_pr->hop_flow_raw) & (1 << 31))
>>>> - pkey = 0;
>>>> + } else
>>>> + sl = p_prtn->sl;
>>>> + } else if (p_rcv->p_subn->opt.routing_engine_name &&
>>>> + strcmp(p_rcv->p_subn->opt.routing_engine_name, "lash") == 0) {
>>> It seems that in original code LASH was "higher" priority in SL
>>> selection than partition configuration? If so, any reason why it is
>>> changed?
>> No particular reason - it just seemed right at the moment.
>> I'll rework it so that the relative priorities of partition
>> and lash routing will remain as they were before.
>> In any case, is there any particular reason why lash SL
>> should have higher priority than partition's SL?
>
> I think so, LASH can be turn on or off just by using command line
> option, in order to prevent conflicting with partitions it may be
> needed to rewrite partitions config file each time when we want to run
> LASH. I think original "priorities" were fine.
OK
-- Yevgeny
>> Regardless what the answer is, there'll be a conflict when a
>> specific pkey was requested in PathRecord and this partition
>> has SL different from what lash defines.
>
> Yes, of course - LASH requires better integration, not just with
> partitions, with QoS too. Want to fix this as well? :)
>
> Sasha
>
More information about the general
mailing list