[openib-general] [PATCH] osm: bug that caused ucast manager to 'hang'

Eitan Zahavi eitan at mellanox.co.il
Fri Dec 15 13:26:23 PST 2006


Hi Hal,

Every osm manager (a step in the algorithm) shall return
OSM_SIGNAL_DONE_PENDING iff there are outstanding packets on the wire,
or OSM_SIGNAL_DONE if there are none.
The state manager uses these values to determine whether it needs to
wait for all these SMPs to finish or can progress to the next step.
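
As a minimal sketch of that contract (not the actual OpenSM code): the
enum below is a local stand-in that reuses the two signal names from this
mail, and sketch_ucast_step() with its 'changed' array is a hypothetical
helper made up for illustration.

```c
#include <stddef.h>

/* Local stand-ins for the two signal names used in this mail; these are
   NOT the definitions from the OpenSM headers. */
typedef enum {
    OSM_SIGNAL_DONE,
    OSM_SIGNAL_DONE_PENDING
} sketch_signal_t;

/* Hypothetical manager step.  changed[i] is non-zero when switch i needs
   a new forwarding table; one Set(LFT) is counted per changed switch. */
static sketch_signal_t sketch_ucast_step(const int *changed,
                                         size_t n_switches,
                                         size_t *p_outstanding)
{
    size_t i;
    size_t sent = 0;

    for (i = 0; i < n_switches; i++) {
        if (changed[i])
            sent++;            /* stands in for sending one Set(LFT) SMP */
    }

    *p_outstanding = sent;

    /* Report "pending" only if something was really put on the wire. */
    return sent ? OSM_SIGNAL_DONE_PENDING : OSM_SIGNAL_DONE;
}
```

The point of the sketch is only that the return value is derived from
what was actually sent, not assumed.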

This is a quote from the osm_ucast_mgr.c:
  /*
    For now don't bother checking if the switch forwarding tables
    actually needed updating.  The current code will always update
    them, and thus leave transactions pending on the wire.
    Therefore, return OSM_SIGNAL_DONE_PENDING.
  */
  signal = OSM_SIGNAL_DONE_PENDING;

This assumption was broken by the change that avoids sending Set(LFT)
when the tables did not change.

So the osm_state_mgr was stuck in the
OSM_SM_STATE_SET_UCAST_TABLES_WAIT state and never got an
OSM_SIGNAL_NO_PENDING_TRANSACTIONS to exit it (since there were no
outstanding SMPs).
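
A toy model of the hang, under simplifying assumptions: the SK_* names
and sk_reaches_next_step() are hypothetical and only mirror the states
and signals named above; the real osm_state_mgr is event driven and far
more involved.

```c
#include <stdbool.h>

/* Hypothetical stand-ins, not the OpenSM definitions. */
typedef enum { SK_SIGNAL_DONE, SK_SIGNAL_DONE_PENDING } sk_signal_t;
typedef enum { SK_STATE_UCAST_WAIT, SK_STATE_NEXT_STEP } sk_state_t;

/* Returns true if the state machine gets past the ucast step.
   'signal' is what the ucast manager returned and 'smps_sent' is how
   many SMPs it really put on the wire. */
static bool sk_reaches_next_step(sk_signal_t signal, unsigned smps_sent)
{
    /* DONE_PENDING parks the state machine in the wait state. */
    sk_state_t state = (signal == SK_SIGNAL_DONE_PENDING)
                           ? SK_STATE_UCAST_WAIT
                           : SK_STATE_NEXT_STEP;
    unsigned outstanding = smps_sent;

    /* Each response retires one SMP.  Only the response that empties
       the wire raises the "no pending transactions" event. */
    while (outstanding > 0) {
        outstanding--;
        if (outstanding == 0 && state == SK_STATE_UCAST_WAIT)
            state = SK_STATE_NEXT_STEP;
    }

    /* With smps_sent == 0 the loop never runs, so a machine parked in
       the wait state stays there: this is the hang. */
    return state == SK_STATE_NEXT_STEP;
}
```

In this model, sk_reaches_next_step(SK_SIGNAL_DONE_PENDING, 0) is the
buggy combination: the manager claims pending work, but no response will
ever arrive to release the wait state.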

EZ

> -----Original Message-----
> From: Hal Rosenstock [mailto:halr at voltaire.com]
> Sent: Friday, December 15, 2006 11:15 PM
> To: Eitan Zahavi
> Cc: OPENIB
> Subject: Re: [openib-general] [PATCH] osm: bug that caused ucast manager
> to 'hang'
> 
> On Fri, 2006-12-15 at 15:03, Eitan Zahavi wrote:
> > Hal Rosenstock wrote:
> > > On Fri, 2006-12-15 at 12:04, Eitan Zahavi wrote:
> > >
> > >> Hal Rosenstock wrote:
> > >>
> > >>> Hi again Yevgeny,
> > >>>
> > >>> On Thu, 2006-12-14 at 14:58, Yevgeny Kliteynik wrote:
> > >>>
> > >>>
> > >>>> Hi Hal
> > >>>>
> > >>>> This patch fixes a bug that caused ucast manager to return
> > >>>> OSM_SIGNAL_DONE_PENDING even if there are no pending transactions.
> > >>>> Added a boolean flag that marks whether there was some change or
> > >>>> not (in which case OSM_SIGNAL_DONE should be returned).
> > >>>>
> > >>>> --
> > >>>> Yevgeny
> > >>>>
> > >>>> Signed-off-by: Yevgeny Kliteynik <kliteyn at dev.mellanox.co.il>
> > >>>>
> > >>>>
> > >>> Good catch!
> > >>>
> > >>> Thanks. Applied.
> > >>>
> > >>> Is this issue (and patch or a similar one) also applicable to
> > >>> OFED 1.1 ?
> > >>>
> > >>>
> > >> I think OFED 1.1 does not have the "incremental" routing patch.
> > >>
> > >
> > > Right; it doesn't.
> > >
> > >
> > >> So it does not have this bug.
> > >>
> > >
> > > Are you sure that the incremental routing caused this to be needed ?
> > > By any chance, are you confusing this with a different patch ? Just
> > > want to be clear on this...
> > >
> > Yes I am sure. Without the new incremental feature every sweep all LFT
> > tables were set.
> 
> That sounds like a different bug to me. Yevgeny's patch was for a hang
> which involved issuing OSM_SIGNAL_DONE_PENDING rather than
> OSM_SIGNAL_DONE. Is this related to incremental routing ?
> 
> -- Hal
> 
> > EZ
> > > -- Hal
> > >
> > >
> > >> EZ
> > >>
> > >>> -- Hal
> > >>>
> > >>>
> > >>> _______________________________________________
> > >>> openib-general mailing list
> > >>> openib-general at openib.org
> > >>> http://openib.org/mailman/listinfo/openib-general
> > >>>
> > >>> To unsubscribe, please visit
> > >>> http://openib.org/mailman/listinfo/openib-general
> > >>>
> > >>>
> > >
> > >
> > >
> >
> 




