[openib-general] [PATCH 10/10] osm: QoS in OpenSM

Yevgeny Kliteynik kliteyn at dev.mellanox.co.il
Thu Feb 1 07:57:42 PST 2007


Hi Hal,

Hal Rosenstock wrote:
> Hi again Yevgeny,
> 
> On Wed, 2007-01-31 at 17:19, Yevgeny Kliteynik wrote:
> 
> [snip...]
> 
>>>> +   for (i = 0; i < IB_MAX_NUM_VLS; i++)
>>>> +   {
>>>> +      if (valid_sls[i])
>>>> +      {
>>>> +         vl = ib_slvl_table_get(p_slvl_tbl,i);
>>>> +         if (vl == IB_DROP_VL)
>>> Does vl > Operational VLs need checking here or is it never set this way
>>> ?
>> I think that it would be better if the "setup" part would check it when
>> configuring sl2vl tables, and when VL > Operational VL it should set
>> some default value instead (VL15 looks as a good option).
> 
> OK; but why scan all VLs if they are not supported ?

Agree, adding it to my ToDo list of improvements in QoS.
 
>>>> +            valid_sls[i] = FALSE;
>>>> +      }
>>>> +   }
>>>> +
>>>> +   /*
>>>> +    * now get pointer to the destination port (same as above)
>>>> +    */
>>>> +   p_node = osm_physp_get_node_ptr( p_dest_physp );
>>>> +
>>>> +   if( p_node->sw )
>>>> +   {
>>>> +      p_dest_physp = osm_switch_get_route_by_lid( p_node->sw, cl_ntoh16( dest_lid_ho ) );
>>>> +      if ( p_dest_physp == 0 )
>>>> +      {
>>>> +         osm_log( p_rcv->p_log, OSM_LOG_ERROR,
>>>> +                  "__osm_pr_rcv_get_path_parms_qos: ERR 1F03: "
>>>> +                  "Cannot find routing to LID 0x%X from switch for GUID 0x%016" PRIx64 "\n",
>>>> +                  dest_lid_ho,
>>>> +                  cl_ntoh64( osm_node_get_node_guid( p_node ) ) );
>>>> +         status = IB_ERROR;
>>>> +         goto Exit;
>>>> +      }
>>>> +   }
>>>> +
>>>> +   /*
>>>> +    * Now go through the path step by step
>>>> +    */
>>>> +
>>>> +   while( p_physp != p_dest_physp )
>>>> +   {
>>>> +      p_physp = osm_physp_get_remote( p_physp );
>>>> +      if ( p_physp == 0 )
>>>> +      {
>>>> +         osm_log( p_rcv->p_log, OSM_LOG_ERROR,
>>>> +                  "__osm_pr_rcv_get_path_parms_qos: ERR 1F04: "
>>>> +                  "Cannot find remote phys port when routing to LID 0x%X from node GUID 0x%016" PRIx64 "\n",
>>>> +                  dest_lid_ho,
>>>> +                  cl_ntoh64( osm_node_get_node_guid( p_node ) ) );
>>>> +         status = IB_ERROR;
>>>> +         goto Exit;
>>>> +      }
>>>> +      
>>>> +      in_port_num = osm_physp_get_port_num(p_physp);
>>>> +
>>>> +      /* this is point to point case (no switch in between) */
>>>> +      if( p_physp == p_dest_physp )
>>>> +         break;
>>>
>>> Ordering of check for switch and point to point case are different here
>>> and original routine. Should they be the same ? If so, which should
>>> change ? (Any reason why this was moved in this routine ?)
>> Not sure I'm following.
>> The order of check for switch and point to point case looks the same
>> to me (am I missing something?). The difference that I see is that 
>> the mtu and rate in the original function are adjusted after the 
>> check for switch, and in the new function they are adjusted before the
>> check, which I think is the same.
> 
> That could have been what I was seeing. Shouldn't the two functions be
> indentical in order (assuming these are to be separated) ? I wouldn't
> want to see them diverge further.

The order in the new function can be changed to match the order in the 
old one - I have no problem with that.

> [snip...]
> 
>>>> +/**********************************************************************
>>>> + **********************************************************************/
>>>>  static void
>>>>  __osm_pr_rcv_build_pr(
>>>>    IN osm_pr_rcv_t*         const p_rcv,
>>>> @@ -774,7 +1569,8 @@ __osm_pr_rcv_build_pr(
>>>>  #endif
>>>>  
>>>>    p_pr->pkey = p_parms->pkey;
>>>> -  p_pr->sl = cl_hton16(p_parms->sl);
>>>> +  ib_path_rec_set_qos_class(p_pr,p_parms->class);
>>>> +  ib_path_rec_set_sl(p_pr,p_parms->sl);
>>>>    p_pr->mtu = (uint8_t)(p_parms->mtu | 0x80);
>>>>    p_pr->rate = (uint8_t)(p_parms->rate | 0x80);
>>>>  
>>>> @@ -832,10 +1628,14 @@ __osm_pr_rcv_get_lid_pair_path(
>>>>      goto Exit;
>>>>    }
>>>>  
>>>> -  status = __osm_pr_rcv_get_path_parms( p_rcv, p_pr, p_src_port,
>>>> -                                        p_dest_port, dest_lid_ho,
>>>> -                                        comp_mask, &path_parms );
>>>> -
>>>> +  if (p_rcv->p_subn->opt.no_qos)
>>> Shouldn't this be based on p_rcv->p_subn.opt.qos_policy_file rather than
>>> no_qos ? I think there are cases where the QoS will be used without the
>>> QoS policy (higher level QoS support).
>> By totally ignoring sl2vl tables the original function may return
>> path that isn't a "real" path - it may lead to VL15 at some point.
>> So the new function takes care of this problem.
> 
> So it's a bug fix (missing functionality) in the existing QoS support.

Right. Hopefully, the new function will replace the old one, and there
won't be a need to add this functionality to the old function as a separate
task.
 
>> When there's no policy file, the policy parse tree is empty, and then 
>> the ports would not have any qos-level to be applied on the examined path.
>> In that case the new function does whatever the old one did, plus checking
>> the path for sl2vl "consistency".
> 
> Got it. Thanks.
> 
> -- Hal
> 
>> -- Yevgeny
> 
> 




More information about the general mailing list