[openib-general] [PATCH 10/10] osm: QoS in OpenSM
Yevgeny Kliteynik
kliteyn at dev.mellanox.co.il
Thu Feb 1 07:57:42 PST 2007
Hi Hal,
Hal Rosenstock wrote:
> Hi again Yevgeny,
>
> On Wed, 2007-01-31 at 17:19, Yevgeny Kliteynik wrote:
>
> [snip...]
>
>>>> + for (i = 0; i < IB_MAX_NUM_VLS; i++)
>>>> + {
>>>> + if (valid_sls[i])
>>>> + {
>>>> + vl = ib_slvl_table_get(p_slvl_tbl,i);
>>>> + if (vl == IB_DROP_VL)
>>> Does vl > Operational VLs need checking here or is it never set this way
>>> ?
>> I think that it would be better if the "setup" part would check it when
>> configuring sl2vl tables, and when VL > Operational VL it should set
>> some default value instead (VL15 looks as a good option).
>
> OK; but why scan all VLs if they are not supported ?
Agree, adding it to my ToDo list of improvements in QoS.
>>>> + valid_sls[i] = FALSE;
>>>> + }
>>>> + }
>>>> +
>>>> + /*
>>>> + * now get pointer to the destination port (same as above)
>>>> + */
>>>> + p_node = osm_physp_get_node_ptr( p_dest_physp );
>>>> +
>>>> + if( p_node->sw )
>>>> + {
>>>> + p_dest_physp = osm_switch_get_route_by_lid( p_node->sw, cl_ntoh16( dest_lid_ho ) );
>>>> + if ( p_dest_physp == 0 )
>>>> + {
>>>> + osm_log( p_rcv->p_log, OSM_LOG_ERROR,
>>>> + "__osm_pr_rcv_get_path_parms_qos: ERR 1F03: "
>>>> + "Cannot find routing to LID 0x%X from switch for GUID 0x%016" PRIx64 "\n",
>>>> + dest_lid_ho,
>>>> + cl_ntoh64( osm_node_get_node_guid( p_node ) ) );
>>>> + status = IB_ERROR;
>>>> + goto Exit;
>>>> + }
>>>> + }
>>>> +
>>>> + /*
>>>> + * Now go through the path step by step
>>>> + */
>>>> +
>>>> + while( p_physp != p_dest_physp )
>>>> + {
>>>> + p_physp = osm_physp_get_remote( p_physp );
>>>> + if ( p_physp == 0 )
>>>> + {
>>>> + osm_log( p_rcv->p_log, OSM_LOG_ERROR,
>>>> + "__osm_pr_rcv_get_path_parms_qos: ERR 1F04: "
>>>> + "Cannot find remote phys port when routing to LID 0x%X from node GUID 0x%016" PRIx64 "\n",
>>>> + dest_lid_ho,
>>>> + cl_ntoh64( osm_node_get_node_guid( p_node ) ) );
>>>> + status = IB_ERROR;
>>>> + goto Exit;
>>>> + }
>>>> +
>>>> + in_port_num = osm_physp_get_port_num(p_physp);
>>>> +
>>>> + /* this is point to point case (no switch in between) */
>>>> + if( p_physp == p_dest_physp )
>>>> + break;
>>>
>>> Ordering of check for switch and point to point case are different here
>>> and original routine. Should they be the same ? If so, which should
>>> change ? (Any reason why this was moved in this routine ?)
>> Not sure I'm following.
>> The order of check for switch and point to point case looks the same
>> to me (am I missing something?). The difference that I see is that
>> the mtu and rate in the original function are adjusted after the
>> check for switch, and in the new function they are adjusted before the
>> check, which I think is the same.
>
> That could have been what I was seeing. Shouldn't the two functions be
> indentical in order (assuming these are to be separated) ? I wouldn't
> want to see them diverge further.
The order in the new function can be changed to match the order in the
old one - I have no problem with that.
> [snip...]
>
>>>> +/**********************************************************************
>>>> + **********************************************************************/
>>>> static void
>>>> __osm_pr_rcv_build_pr(
>>>> IN osm_pr_rcv_t* const p_rcv,
>>>> @@ -774,7 +1569,8 @@ __osm_pr_rcv_build_pr(
>>>> #endif
>>>>
>>>> p_pr->pkey = p_parms->pkey;
>>>> - p_pr->sl = cl_hton16(p_parms->sl);
>>>> + ib_path_rec_set_qos_class(p_pr,p_parms->class);
>>>> + ib_path_rec_set_sl(p_pr,p_parms->sl);
>>>> p_pr->mtu = (uint8_t)(p_parms->mtu | 0x80);
>>>> p_pr->rate = (uint8_t)(p_parms->rate | 0x80);
>>>>
>>>> @@ -832,10 +1628,14 @@ __osm_pr_rcv_get_lid_pair_path(
>>>> goto Exit;
>>>> }
>>>>
>>>> - status = __osm_pr_rcv_get_path_parms( p_rcv, p_pr, p_src_port,
>>>> - p_dest_port, dest_lid_ho,
>>>> - comp_mask, &path_parms );
>>>> -
>>>> + if (p_rcv->p_subn->opt.no_qos)
>>> Shouldn't this be based on p_rcv->p_subn.opt.qos_policy_file rather than
>>> no_qos ? I think there are cases where the QoS will be used without the
>>> QoS policy (higher level QoS support).
>> By totally ignoring sl2vl tables the original function may return
>> path that isn't a "real" path - it may lead to VL15 at some point.
>> So the new function takes care of this problem.
>
> So it's a bug fix (missing functionality) in the existing QoS support.
Right. Hopefully, the new function will replace the old one, and there
won't be a need to add this functionality to the old function as a separate
task.
>> When there's no policy file, the policy parse tree is empty, and then
>> the ports would not have any qos-level to be applied on the examined path.
>> In that case the new function does whatever the old one did, plus checking
>> the path for sl2vl "consistency".
>
> Got it. Thanks.
>
> -- Hal
>
>> -- Yevgeny
>
>
More information about the general
mailing list