[openib-general] [PATCH 10/10] osm: QoS in OpenSM

Hal Rosenstock halr at voltaire.com
Thu Feb 1 07:45:34 PST 2007


Hi again Yevgeny,

On Wed, 2007-01-31 at 17:19, Yevgeny Kliteynik wrote:

[snip...]

> >> +   for (i = 0; i < IB_MAX_NUM_VLS; i++)
> >> +   {
> >> +      if (valid_sls[i])
> >> +      {
> >> +         vl = ib_slvl_table_get(p_slvl_tbl,i);
> >> +         if (vl == IB_DROP_VL)
> > 
> > Does vl > Operational VLs need checking here or is it never set this way
> > ?
> I think that it would be better if the "setup" part would check it when
> configuring sl2vl tables, and when VL > Operational VL it should set
> some default value instead (VL15 looks as a good option).

OK; but why scan all VLs if they are not supported ?

> >> +            valid_sls[i] = FALSE;
> >> +      }
> >> +   }
> >> +
> >> +   /*
> >> +    * now get pointer to the destination port (same as above)
> >> +    */
> >> +   p_node = osm_physp_get_node_ptr( p_dest_physp );
> >> +
> >> +   if( p_node->sw )
> >> +   {
> >> +      p_dest_physp = osm_switch_get_route_by_lid( p_node->sw, cl_ntoh16( dest_lid_ho ) );
> >> +      if ( p_dest_physp == 0 )
> >> +      {
> >> +         osm_log( p_rcv->p_log, OSM_LOG_ERROR,
> >> +                  "__osm_pr_rcv_get_path_parms_qos: ERR 1F03: "
> >> +                  "Cannot find routing to LID 0x%X from switch for GUID 0x%016" PRIx64 "\n",
> >> +                  dest_lid_ho,
> >> +                  cl_ntoh64( osm_node_get_node_guid( p_node ) ) );
> >> +         status = IB_ERROR;
> >> +         goto Exit;
> >> +      }
> >> +   }
> >> +
> >> +   /*
> >> +    * Now go through the path step by step
> >> +    */
> >> +
> >> +   while( p_physp != p_dest_physp )
> >> +   {
> >> +      p_physp = osm_physp_get_remote( p_physp );
> >> +      if ( p_physp == 0 )
> >> +      {
> >> +         osm_log( p_rcv->p_log, OSM_LOG_ERROR,
> >> +                  "__osm_pr_rcv_get_path_parms_qos: ERR 1F04: "
> >> +                  "Cannot find remote phys port when routing to LID 0x%X from node GUID 0x%016" PRIx64 "\n",
> >> +                  dest_lid_ho,
> >> +                  cl_ntoh64( osm_node_get_node_guid( p_node ) ) );
> >> +         status = IB_ERROR;
> >> +         goto Exit;
> >> +      }
> >> +      
> >> +      in_port_num = osm_physp_get_port_num(p_physp);
> >> +
> >> +      /* this is point to point case (no switch in between) */
> >> +      if( p_physp == p_dest_physp )
> >> +         break;
> > 
> > 
> > Ordering of check for switch and point to point case are different here
> > and original routine. Should they be the same ? If so, which should
> > change ? (Any reason why this was moved in this routine ?)
> Not sure I'm following.
> The order of check for switch and point to point case looks the same
> to me (am I missing something?). The difference that I see is that 
> the mtu and rate in the original function are adjusted after the 
> check for switch, and in the new function they are adjusted before the
> check, which I think is the same.

That could have been what I was seeing. Shouldn't the two functions be
indentical in order (assuming these are to be separated) ? I wouldn't
want to see them diverge further.

[snip...]

> >> +/**********************************************************************
> >> + **********************************************************************/
> >>  static void
> >>  __osm_pr_rcv_build_pr(
> >>    IN osm_pr_rcv_t*         const p_rcv,
> >> @@ -774,7 +1569,8 @@ __osm_pr_rcv_build_pr(
> >>  #endif
> >>  
> >>    p_pr->pkey = p_parms->pkey;
> >> -  p_pr->sl = cl_hton16(p_parms->sl);
> >> +  ib_path_rec_set_qos_class(p_pr,p_parms->class);
> >> +  ib_path_rec_set_sl(p_pr,p_parms->sl);
> >>    p_pr->mtu = (uint8_t)(p_parms->mtu | 0x80);
> >>    p_pr->rate = (uint8_t)(p_parms->rate | 0x80);
> >>  
> >> @@ -832,10 +1628,14 @@ __osm_pr_rcv_get_lid_pair_path(
> >>      goto Exit;
> >>    }
> >>  
> >> -  status = __osm_pr_rcv_get_path_parms( p_rcv, p_pr, p_src_port,
> >> -                                        p_dest_port, dest_lid_ho,
> >> -                                        comp_mask, &path_parms );
> >> -
> >> +  if (p_rcv->p_subn->opt.no_qos)
> > 
> > Shouldn't this be based on p_rcv->p_subn.opt.qos_policy_file rather than
> > no_qos ? I think there are cases where the QoS will be used without the
> > QoS policy (higher level QoS support).
> 
> By totally ignoring sl2vl tables the original function may return
> path that isn't a "real" path - it may lead to VL15 at some point.
> So the new function takes care of this problem.

So it's a bug fix (missing functionality) in the existing QoS support.

> When there's no policy file, the policy parse tree is empty, and then 
> the ports would not have any qos-level to be applied on the examined path.
> In that case the new function does whatever the old one did, plus checking
> the path for sl2vl "consistency".

Got it. Thanks.

-- Hal

> -- Yevgeny






More information about the general mailing list