[openib-general] [PATCH] osm: bug that caused ucast manager to 'hang'

Hal Rosenstock halr at voltaire.com
Fri Dec 15 13:31:57 PST 2006


Hi Eitan,

On Fri, 2006-12-15 at 16:26, Eitan Zahavi wrote:
> Hi Hal,
> 
> Every osm manager (step in the algorithm) shall return
> OSM_SIGNAL_DONE_PENDING iff there are outstanding packets on the wire,
> or OSM_SIGNAL_DONE if there are none.
> The state manager uses these values to determine whether it needs to wait
> for all these SMPs to finish or can progress to the next step.
> 
> This is a quote from osm_ucast_mgr.c:
>   /*
>     For now don't bother checking if the switch forwarding tables
>     actually needed updating.  The current code will always update
>     them, and thus leave transactions pending on the wire.
>     Therefore, return OSM_SIGNAL_DONE_PENDING.
>   */
>   signal = OSM_SIGNAL_DONE_PENDING;
> 
> This assumption was broken by the change that avoids sending Set(LFT)
> when the tables did not change.
> 
> So the osm_state_mgr was stuck at the stage
> OSM_SM_STATE_SET_UCAST_TABLES_WAIT
> and never got an OSM_SIGNAL_NO_PENDING_TRANSACTIONS to exit it (since
> there are no outstanding SMPs).
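
The rule described above can be sketched as follows. This is a minimal illustration of the idea, not the actual patch: signal_t, switch_t, lft_changed and ucast_mgr_process are hypothetical names, and the real code would issue Set(LFT) where the comment indicates. The point is only that DONE_PENDING is returned when at least one request was actually sent.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the signal values named in this thread. */
typedef enum {
    SIGNAL_DONE,         /* nothing on the wire; the state manager may proceed */
    SIGNAL_DONE_PENDING  /* SMPs outstanding; the state manager must wait */
} signal_t;

/* Hypothetical per-switch record: true when the newly computed LFT
 * differs from the one already programmed into the switch. */
typedef struct {
    bool lft_changed;
} switch_t;

/* Walk all switches and report whether any Set(LFT) was (notionally) sent. */
static signal_t ucast_mgr_process(const switch_t *switches, size_t count)
{
    bool any_sent = false;

    for (size_t i = 0; i < count; i++) {
        if (switches[i].lft_changed) {
            /* Real code would send Set(LFT) for this switch here. */
            any_sent = true;
        }
    }

    /* Only claim pending transactions if something was actually sent. */
    return any_sent ? SIGNAL_DONE_PENDING : SIGNAL_DONE;
}
```

With no changed tables the function returns SIGNAL_DONE, so a caller waiting for pending transactions to drain is never left waiting on requests that were not sent.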

Got it. Thanks.

-- Hal

> EZ
> 
> > -----Original Message-----
> > From: Hal Rosenstock [mailto:halr at voltaire.com]
> > Sent: Friday, December 15, 2006 11:15 PM
> > To: Eitan Zahavi
> > Cc: OPENIB
> > Subject: Re: [openib-general] [PATCH] osm: bug that caused ucast manager to 'hang'
> > 
> > On Fri, 2006-12-15 at 15:03, Eitan Zahavi wrote:
> > > Hal Rosenstock wrote:
> > > > On Fri, 2006-12-15 at 12:04, Eitan Zahavi wrote:
> > > >
> > > >> Hal Rosenstock wrote:
> > > >>
> > > >>> Hi again Yevgeny,
> > > >>>
> > > >>> On Thu, 2006-12-14 at 14:58, Yevgeny Kliteynik wrote:
> > > >>>
> > > >>>
> > > >>>> Hi Hal
> > > >>>>
> > > > >>>> This patch fixes a bug that caused ucast manager to return
> > > > >>>> OSM_SIGNAL_DONE_PENDING even if there are no pending transactions.
> > > > >>>> Added a boolean flag that marks whether there was some change or
> > > > >>>> not (in which case OSM_SIGNAL_DONE should be returned).
> > > >>>>
> > > >>>> --
> > > >>>> Yevgeny
> > > >>>>
> > > >>>> Signed-off-by: Yevgeny Kliteynik <kliteyn at dev.mellanox.co.il>
> > > >>>>
> > > >>>>
> > > >>> Good catch!
> > > >>>
> > > >>> Thanks. Applied.
> > > >>>
> > > > >>> Is this issue (and patch or a similar one) also applicable to OFED 1.1 ?
> > > >>>
> > > >>>
> > > >> I think OFED 1.1 does not have the "incremental" routing patch.
> > > >>
> > > >
> > > > Right; it doesn't.
> > > >
> > > >
> > > >> So it does not have this bug.
> > > >>
> > > >
> > > > > Are you sure that the incremental routing caused this to be needed ?
> > > > > By any chance, are you confusing this with a different patch ? Just
> > > > > want to be clear on this...
> > > >
> > > Yes I am sure. Without the new incremental feature, all LFT tables
> > > were set on every sweep.
> > 
> > That sounds like a different bug to me. Yevgeny's patch was for a hang
> > which involved issuing OSM_SIGNAL_DONE_PENDING rather than
> > OSM_SIGNAL_DONE. Is this related to incremental routing ?
> > 
> > -- Hal
> > 
> > > EZ
> > > > -- Hal
> > > >
> > > >
> > > >> EZ
> > > >>
> > > >>> -- Hal
> > > >>>
> > > >>>
> > > >>> _______________________________________________
> > > >>> openib-general mailing list
> > > >>> openib-general at openib.org
> > > >>> http://openib.org/mailman/listinfo/openib-general
> > > >>>
> > > >>> To unsubscribe, please visit
> > > >>> http://openib.org/mailman/listinfo/openib-general
> > > >>>
> > > >>>
> > > >
> > > >
> > >
> > 
> 




