[openib-general] [PATCH] osm: Routing Tables are full of UNREACHABLE instead of real route

Sasha Khapyorsky sashak at voltaire.com
Sat Dec 9 09:46:07 PST 2006


On 16:13 Sat 09 Dec     , Eitan Zahavi wrote:
> Hi Sasha,
> 
> Your proposal for moving all "dump" files generation to end of sweep - 
> just before "SUBNET UP" is reported - makes perfect sense to me.
> 
> But it is a bit lower in priority than the rest of the stuff.
> Not sure if it is worth tackling right now.

OK, I may do this. This should not be a big deal.

Sasha

> 
> Eitan
> 
> Eitan Zahavi
> Senior Engineering Director, Software Architect
> Mellanox Technologies LTD
> Tel:+972-4-9097208
> Fax:+972-4-9593245
> P.O. Box 586 Yokneam 20692 ISRAEL
> 
> > -----Original Message-----
> > From: Sasha Khapyorsky [mailto:sashak at voltaire.com]
> > Sent: Friday, December 08, 2006 11:55 PM
> > To: Eitan Zahavi
> > Cc: Hal Rosenstock; Yevgeny Kliteynik; OPENIB GENERAL
> > Subject: Re: [PATCH] osm: Routing Tables are full of UNREACHABLE instead of
> > real route
> > 
> > Hi Eitan,
> > 
> > On 17:12 Thu 07 Dec     , Eitan Zahavi wrote:
> > > Hi Hal,
> > >
> > > I resolved the mystery behind the osm.fdbs that is now full of
> > > UNREACHABLE instead of correct out ports.
> > >
> > > The problem is a consequence of the new code that does not use the
> > > switch LFT blocks for the intermediate LFT assignments:
> > > The idea of having incremental updates only relies on a temporary
> > > buffer that the routing algorithm fills.
> > > Then it is sent to the wire only if there is a diff between the
> > > switch LFT tables (from the SMDB) and the temporary buffer.
> > >
> > > So the switch LFT tables are not being directly updated by the
> > > routing algorithm - but only by the GetResp obtained as reply to the
> > > setting.
> > > Until this stage of the description - everything looks right.
> > >
> > > But what is wrong is that the dump of LFT tables is invoked before
> > > the GetResp is obtained.
> > > So if only a single sweep is invoked the resulting osm.fdbs shows the
> > > original state of the SMDB tables, which is full of 0xFF =
> > > UNREACHABLE.
> > 
> > Right.
> > 
> > >
> > > The patch below is taking the easy way and should probably be revisited.
> > > Instead of having a separate algorithm step for dumping out the
> > > resulting GetResp data after all LFT responses were obtained it just
> > > copies the sent LFT blocks to the SMDB.
> > 
> > Would it not be better to just move all dumps to the end of the OpenSM
> > heavy sweep? This should be simple, right?
> > 
> > Sasha
> > 
> > >
> > > I think we need to have at least this simple patch until we have the
> > > dump moved to a new algorithm step.
> > >
> > > Thanks
> > > Eitan
> > >
> > > Signed-off-by:  Eitan Zahavi <eitan at mellanox.co.il>
> > >
> > > =====================================================================
> > >
> > > diff --git a/osm/opensm/osm_ucast_mgr.c b/osm/opensm/osm_ucast_mgr.c
> > > index 5a55da8..3a62c7f 100644
> > > --- a/osm/opensm/osm_ucast_mgr.c
> > > +++ b/osm/opensm/osm_ucast_mgr.c
> > > @@ -982,7 +982,15 @@ osm_ucast_mgr_set_fwd_table(
> > >                "osm_ucast_mgr_set_fwd_table: ERR 3A05: "
> > >                "Sending linear fwd. tbl. block failed (%s)\n",
> > >                ib_get_err_str( status ) );
> > > -    }
> > > +    } else {
> > > +       /*
> > > +         HACK: for now we will assume we succeeded to send
> > > +         and set the local DB based on it. This should allow
> > > +         us to immediately dump out our routing
> > > +       */
> > > +       osm_switch_set_ft_block(
> > > +          p_sw, p_mgr->lft_buf + block_id_ho * 64, block_id_ho);
> > > +        }
> > >   }
> > >
> > >   OSM_LOG_EXIT( p_mgr->p_log );
> > >

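To make the ordering problem discussed above concrete, here is a minimal,
self-contained sketch in C. It is not OpenSM code: the struct toy_switch, the
toy_* functions, the pending/in_flight fields and the two-block table size are
all hypothetical and exist only for illustration. The only details taken from
the thread are the 64-entry block size (from "block_id_ho * 64" in the patch),
the 0xFF = UNREACHABLE initial state, and the rule that a block is sent only
when the temporary buffer differs from the SMDB copy.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define LFT_BLOCK_SIZE 64   /* entries per block, as in "block_id_ho * 64" */
#define LFT_BLOCKS     2    /* toy table size, chosen only for illustration */
#define LFT_SIZE       (LFT_BLOCK_SIZE * LFT_BLOCKS)
#define UNREACHABLE    0xFF

/* Hypothetical stand-in for a switch record in the SMDB. */
struct toy_switch {
    uint8_t lft[LFT_SIZE];         /* SMDB copy: what a dump would print */
    uint8_t pending[LFT_SIZE];     /* models blocks that are "on the wire" */
    int     in_flight[LFT_BLOCKS]; /* 1 while a Set has no response yet */
};

/* A freshly discovered switch: every LID is UNREACHABLE. */
void toy_switch_init(struct toy_switch *sw)
{
    memset(sw->lft, UNREACHABLE, sizeof sw->lft);
    memset(sw->pending, UNREACHABLE, sizeof sw->pending);
    memset(sw->in_flight, 0, sizeof sw->in_flight);
}

/*
 * Send one block from the routing algorithm's temporary buffer, but only
 * if it differs from the SMDB copy. Returns 1 if a Set was "sent".
 * With copy_on_send != 0 the block is also copied into the SMDB right
 * away, which mimics what the patch in this thread does.
 */
int toy_set_fwd_block(struct toy_switch *sw, const uint8_t *lft_buf,
                      unsigned block_id, int copy_on_send)
{
    assert(block_id < LFT_BLOCKS);
    size_t off = (size_t)block_id * LFT_BLOCK_SIZE;

    if (memcmp(lft_buf + off, sw->lft + off, LFT_BLOCK_SIZE) == 0)
        return 0;                  /* no diff, nothing goes to the wire */

    memcpy(sw->pending + off, lft_buf + off, LFT_BLOCK_SIZE);
    sw->in_flight[block_id] = 1;
    if (copy_on_send)
        memcpy(sw->lft + off, lft_buf + off, LFT_BLOCK_SIZE);
    return 1;
}

/* Models the response handler: only here does the SMDB copy change. */
void toy_get_resp(struct toy_switch *sw, unsigned block_id)
{
    assert(block_id < LFT_BLOCKS);
    size_t off = (size_t)block_id * LFT_BLOCK_SIZE;

    if (!sw->in_flight[block_id])
        return;
    memcpy(sw->lft + off, sw->pending + off, LFT_BLOCK_SIZE);
    sw->in_flight[block_id] = 0;
}

/* What a dump taken at this moment would report for one LID. */
uint8_t toy_dump_entry(const struct toy_switch *sw, unsigned lid)
{
    assert(lid < LFT_SIZE);
    return sw->lft[lid];
}
```

In this toy model, calling toy_dump_entry() between toy_set_fwd_block(...,
0) and toy_get_resp() still reports UNREACHABLE for a LID that was just
routed, which is the symptom described for a single sweep. Either passing
copy_on_send = 1 (the approach of the patch) or dumping only after all
responses have been handled (the "move the dumps to the end of the sweep"
proposal) gives the real out port.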


