[openib-general] OpenSM and Wrong SM_Key

Hal Rosenstock halr at voltaire.com
Thu Nov 17 13:44:27 PST 2005


On Sat, 2005-11-12 at 12:34, Eitan Zahavi wrote:
> Hi Troy,
> 
> Good to get a straightforward message.
> 
> What I hear you saying is:
> 1. There needs to be a parameter to control the SM behavior if it finds
> another SM with a non-matching SM Key:
> -> Either to ignore it or to die. We can do that. No problem!
> 
> 2. The SM log file has too many errors.
> -> Are you aware of the messages the SM sends to syslog? We try to
> make these the most important ones. If you feel some messages are
> missing there - just let us know and we will fix it.

I think that exiting is noncompliant and this should not be made into a
command line option.

-- Hal

> -> The osm.log is intended for OpenSM error reporting, and it includes
> every error. We do our best to keep unneeded events out of it. But as
> I say, it is NOT the log you should use for getting the SM major events.
> You should look at /var/log/messages instead.
> 
> Eitan Zahavi
> Design Technology Director
> Mellanox Technologies LTD
> Tel:+972-4-9097208
> Fax:+972-4-9593245
> P.O. Box 586 Yokneam 20692 ISRAEL
> 
> 
> > -----Original Message-----
> > From: Troy Benjegerdes [mailto:hozer at hozed.org]
> > Sent: Friday, November 11, 2005 8:32 AM
> > To: Eitan Zahavi
> > Cc: Hal Rosenstock; openib-general at openib.org
> > Subject: Re: [openib-general] OpenSM and Wrong SM_Key
> > 
> > On Wed, Nov 09, 2005 at 09:46:06AM +0200, Eitan Zahavi wrote:
> > > Hi Hal,
> > >
> > > I would like to bring this to MgtWG before we change anything.
> > > IMO the situation when this happens is really not "legal", since if
> > > the SMs are not coordinated at least in their SM_Key, it will result
> > > in two masters on the subnet.
> > >
> > > From our experience it is always better to cause a fatal flow and
> > > exit the SM rather than report the event in some log - normally it
> > > will not be seen ...
> > >
> > > I know this is a controversial issue.
> > 
> > Okay, so you're telling me you *WANT* behavior where a rogue node can
> > trivially cause the running subnet manager to exit and take over
> > management of the network?
> > 
> > OpenSM needs to have a well-documented config file, instead of 3 pages
> > of command line options, and different levels of logging. What to do
> > in the above situation is a site-local policy config decision, not
> > something that should be hard-coded in the SM source code.
> > 
> > The logs might actually get looked at if there wasn't junk in the log
> > every time something timed out.
> > 
> > The Linux kernel has 'WARN, NOTICE, and CRITICAL' level log messages.
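
To make the two positions in this thread concrete, here is a minimal sketch of
a site-local policy for a remote SM whose SM_Key does not match, combined with
leveled logging. It is illustrative Python, not OpenSM code: MismatchPolicy,
on_remote_sm and the returned action strings are hypothetical names invented
for this example, and it does not settle whether exiting is compliant.

```python
import logging
from enum import Enum


class MismatchPolicy(Enum):
    """Hypothetical site-local choices for a remote SM with a different SM_Key."""
    IGNORE = "ignore"  # keep running and do not cooperate with the remote SM
    EXIT = "exit"      # stop the local SM (the "fatal flow" option)


def on_remote_sm(local_key: int, remote_key: int,
                 policy: MismatchPolicy, log: logging.Logger) -> str:
    """Decide what to do about a discovered remote SM; return the action taken."""
    if remote_key == local_key:
        # Routine event: low severity, so it does not clutter the error log.
        log.info("remote SM has a matching SM_Key")
        return "trusted"
    if policy is MismatchPolicy.EXIT:
        # Highest severity: the local SM is about to stop.
        log.critical("remote SM_Key 0x%x does not match; exiting", remote_key)
        return "exit"
    # The SM keeps running, so a warning is enough.
    log.warning("remote SM_Key 0x%x does not match; ignoring that SM", remote_key)
    return "ignored"


if __name__ == "__main__":
    logging.basicConfig(level=logging.WARNING)
    logger = logging.getLogger("sm-sketch")
    print(on_remote_sm(0x1, 0x2, MismatchPolicy.IGNORE, logger))
```

With the policy read from a config file rather than hard-coded, a site that
worries about a rogue node would pick IGNORE, while a site that prefers a loud
failure would pick EXIT.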



