[ofa-general] Re: [PATCH] osm: bug in dumping opensm.fdbs

Sasha Khapyorsky sashak at voltaire.com
Sat Jul 7 05:56:53 PDT 2007


On 00:28 Sat 07 Jul     , Yevgeny Kliteynik wrote:
>  Hi Sasha,
> 
>  Sasha Khapyorsky wrote:
> > Hi Yevgeny,
> > On 10:43 Thu 05 Jul     , Yevgeny Kliteynik wrote:
> >>  Hi Hal,
> >>
> >>  The adaptation of the opensm.fdbs dump function to the recent changes in
> >>  min hop tables broke fat-tree routing (or any other future routing that
> >>  may not use the same min hop table creation functions).
> > Could you please explain how this dump function breaks the routing for
> > fat-tree? Thanks.
> 
>  Example:
>    - We're dumping table for switch SW_A, and the target is CA.
>    - To get to CA from SW_A, there are at least two options:
>        1. SW_A->...->SW_X->...->SW_B->CA
>        2. SW_A->...->SW_Y->...->SW_B->CA
>    - Fat-tree may choose to go through SW_X when routing from SW_A to CA,
>      and through SW_Y when routing from SW_A to SW_B, hence it might choose
>      different ports on SW_A.

OK, so you are referring to incorrect dump info, and not to the routing
itself?

>  In the recent optimization for MinHop and Up/Dn, min hop table creation is
>  done only for switches,

BTW, is such an optimization suitable for the fat-tree engine?

Sasha

> and in order to go from SW_A to CA the algorithm checks which switch is
>  connected to CA (SW_B in this case), and chooses the port on SW_A that
>  routes to SW_B, hence the routes to SW_B and CA have to be the same (except
>  for the last SW_B->CA hop).
> 
>  -- Yevgeny
> 
> > Sasha
> >>  -- Yevgeny
> >>
> >>  Signed-off-by: Yevgeny Kliteynik <kliteyn at dev.mellanox.co.il>
> >>  ---
> >>   opensm/opensm/osm_ucast_mgr.c |   33 ++++++++++++++++++++++++---------
> >>   1 files changed, 24 insertions(+), 9 deletions(-)
> >>
> >>  diff --git a/opensm/opensm/osm_ucast_mgr.c b/opensm/opensm/osm_ucast_mgr.c
> >>  index 5bcb655..cab272e 100644
> >>  --- a/opensm/opensm/osm_ucast_mgr.c
> >>  +++ b/opensm/opensm/osm_ucast_mgr.c
> >>  @@ -242,6 +242,7 @@ __osm_ucast_mgr_dump_path_distribution(
> >>
> >>   /**********************************************************************
> >>    **********************************************************************/
> >>  +
> >>   static void
> >>   __osm_ucast_mgr_dump_ucast_routes(
> >>     IN cl_map_item_t *p_map_item,
> >>  @@ -255,6 +256,7 @@ __osm_ucast_mgr_dump_ucast_routes(
> >>     uint8_t                  best_port;
> >>     uint16_t                 max_lid_ho;
> >>     uint16_t                 lid_ho, base_lid;
> >>  +  boolean_t                direct_route_exists = FALSE;
> >>     osm_switch_t* p_sw = (osm_switch_t *)p_map_item;
> >>     osm_ucast_mgr_t* p_mgr = ((struct ucast_mgr_dump_context *)cxt)->p_mgr;
> >>     FILE *file = ((struct ucast_mgr_dump_context *)cxt)->file;
> >>  @@ -300,22 +302,35 @@ __osm_ucast_mgr_dump_ucast_routes(
> >>       */
> >>       if( p_port->p_node->sw )
> >>       {
> >>  +      /* Target LID is switch.
> >>  +         Get its base lid and check hop count for this base LID only.*/
> >>         base_lid = osm_node_get_base_lid(p_port->p_node, 0);
> >>         base_lid = cl_ntoh16(base_lid);
> >>         num_hops = osm_switch_get_hop_count( p_sw, base_lid, port_num );
> >>       }
> >>       else
> >>       {
> >>  -      osm_physp_t *p_physp = p_port->p_physp;
> >>  -      if( !p_physp || !p_physp->p_remote_physp ||
> >>  -          !p_physp->p_remote_physp->p_node->sw )
> >>  -        num_hops = OSM_NO_PATH;
> >>  +      /* Target LID is not switch (CA or router).
> >>  +         Check if we have route to this target from current switch.*/
> >>  +      num_hops = osm_switch_get_hop_count( p_sw, lid_ho, port_num );
> >>  +      if (num_hops != OSM_NO_PATH)
> >>  +      {
> >>  +          direct_route_exists = TRUE;
> >>  +          base_lid = lid_ho;
> >>  +      }
> >>         else
> >>         {
> >>  -        base_lid = osm_node_get_base_lid(p_physp->p_remote_physp->p_node, 0);
> >>  -        base_lid = cl_ntoh16(base_lid);
> >>  -        num_hops = p_physp->p_remote_physp->p_node->sw == p_sw ?
> >>  -                   0 : osm_switch_get_hop_count( p_sw, base_lid, port_num );
> >>  +        osm_physp_t *p_physp = p_port->p_physp;
> >>  +        if( !p_physp || !p_physp->p_remote_physp ||
> >>  +            !p_physp->p_remote_physp->p_node->sw )
> >>  +          num_hops = OSM_NO_PATH;
> >>  +        else
> >>  +        {
> >>  +          base_lid = osm_node_get_base_lid(p_physp->p_remote_physp->p_node, 0);
> >>  +          base_lid = cl_ntoh16(base_lid);
> >>  +          num_hops = p_physp->p_remote_physp->p_node->sw == p_sw ?
> >>  +                     0 : osm_switch_get_hop_count( p_sw, base_lid, port_num );
> >>  +        }
> >>         }
> >>       }
> >>
> >>  @@ -326,7 +341,7 @@ __osm_ucast_mgr_dump_ucast_routes(
> >>       }
> >>
> >>       best_hops = osm_switch_get_least_hops( p_sw, base_lid );
> >>  -    if (!p_port->p_node->sw)
> >>  +    if (!p_port->p_node->sw && !direct_route_exists)
> >>       {
> >>         best_hops++;
> >>         num_hops++;
> >>  --  1.5.1.4
> >>
> >>
> 
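
For readers following the patch, the hop-count selection it adds to the dump
function can be sketched in isolation. This is a minimal sketch under stated
assumptions, not the OpenSM code: hop_table_t, target_t, hop_table_init,
get_hops and dump_hops are hypothetical names, NO_PATH stands in for
OSM_NO_PATH, and the table sizes are arbitrary. The sketch applies the extra
last hop inside the fallback branch (the patch adds it later, together with
best_hops), and it omits the case where the CA hangs directly off the dumping
switch, which the patch reports as 0 hops.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define NO_PATH  0xFF  /* assumption: stand-in for OSM_NO_PATH */
#define MAX_LID  8     /* arbitrary sizes for the sketch */
#define MAX_PORT 4

/* Hypothetical per-switch min hop table: hops[lid][port]. */
typedef struct {
    uint8_t hops[MAX_LID][MAX_PORT];
} hop_table_t;

/* Hypothetical description of the LID being dumped. */
typedef struct {
    int      is_switch;          /* non-zero if the target is a switch */
    uint16_t lid;                /* target LID, host order */
    uint16_t remote_switch_lid;  /* switch the CA is attached to, 0 if none */
} target_t;

/* Mark every (lid, port) entry as unreachable. */
static void hop_table_init(hop_table_t *t)
{
    memset(t->hops, NO_PATH, sizeof t->hops);
}

static uint8_t get_hops(const hop_table_t *t, uint16_t lid, uint8_t port)
{
    assert(lid < MAX_LID && port < MAX_PORT);
    return t->hops[lid][port];
}

/*
 * Hop count to 'tgt' through 'port'.
 * *direct is set to 1 when the table holds an entry for the target LID
 * itself, so no extra last hop must be added by the caller.
 */
static uint8_t dump_hops(const hop_table_t *t, const target_t *tgt,
                         uint8_t port, int *direct)
{
    uint8_t n;

    *direct = 0;
    if (tgt->is_switch)
        return get_hops(t, tgt->lid, port);

    /* CA or router: prefer a route recorded for the target LID itself. */
    n = get_hops(t, tgt->lid, port);
    if (n != NO_PATH) {
        *direct = 1;
        return n;
    }

    /* Fall back to the attached switch and add the last switch->CA hop. */
    if (tgt->remote_switch_lid == 0)
        return NO_PATH;
    n = get_hops(t, tgt->remote_switch_lid, port);
    return n == NO_PATH ? NO_PATH : (uint8_t)(n + 1);
}
```

With a table that only has switch entries, a CA lookup takes the fallback
branch and follows the port chosen for SW_B. Once the routing engine records
an entry for the CA LID itself, that entry is used and the port may differ
from the one used for SW_B, which is the fat-tree case described above.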


