[openib-general] OpenSM and Wrong SM_Key

Eitan Zahavi eitan at mellanox.co.il
Sat Nov 12 09:34:44 PST 2005


Hi Troy,

Good to get a straightforward message.

What I hear you saying is:
1. There needs to be a parameter to control the SM behavior if it finds
another SM with a non-matching SM_Key:
-> Either to ignore it or to die. We can do that. No problem!

2. The SM log file has too many errors.
-> Are you aware of the messages the SM sends to syslog? We try to
make these the most important ones. If you feel some messages are
missing there - just let us know and we will fix it.
-> The osm.log is intended for OpenSM error reporting. This includes
any error. We try our best to keep unneeded events out of it. But as
I say, it is NOT the log you should use for getting the SM major events.
You should look at /var/log/messages instead.
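
To illustrate item 1, the policy could look roughly like the sketch
below. This is not existing OpenSM code: the enum values, the "ignore"
and "exit" option strings, and both function names are hypothetical and
only show the idea of making the SM_Key mismatch reaction a
configurable decision.

```c
#include <stdio.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical policy values; these are NOT actual OpenSM option names. */
typedef enum {
    SM_KEY_MISMATCH_IGNORE, /* report the event and keep running */
    SM_KEY_MISMATCH_EXIT    /* treat the event as fatal and stop the SM */
} sm_key_policy_t;

typedef enum { SM_ACTION_CONTINUE, SM_ACTION_EXIT } sm_action_t;

/* Parse a (hypothetical) config value. Returns 0 on success, -1 if unknown. */
int parse_sm_key_policy(const char *value, sm_key_policy_t *out)
{
    if (strcmp(value, "ignore") == 0) {
        *out = SM_KEY_MISMATCH_IGNORE;
        return 0;
    }
    if (strcmp(value, "exit") == 0) {
        *out = SM_KEY_MISMATCH_EXIT;
        return 0;
    }
    return -1;
}

/* Decide what to do when another SM is discovered on the subnet. */
sm_action_t on_remote_sm_found(uint64_t local_key, uint64_t remote_key,
                               sm_key_policy_t policy)
{
    if (local_key == remote_key)
        return SM_ACTION_CONTINUE; /* keys agree: nothing special to do */

    /* Always report the mismatch; only the reaction depends on policy. */
    fprintf(stderr, "remote SM found with non-matching SM_Key\n");

    return policy == SM_KEY_MISMATCH_EXIT ? SM_ACTION_EXIT
                                          : SM_ACTION_CONTINUE;
}
```

With something like this, a site that prefers the fatal behavior picks
"exit", and a site worried about a rogue node picks "ignore".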

Eitan Zahavi
Design Technology Director
Mellanox Technologies LTD
Tel:+972-4-9097208
Fax:+972-4-9593245
P.O. Box 586 Yokneam 20692 ISRAEL


> -----Original Message-----
> From: Troy Benjegerdes [mailto:hozer at hozed.org]
> Sent: Friday, November 11, 2005 8:32 AM
> To: Eitan Zahavi
> Cc: Hal Rosenstock; openib-general at openib.org
> Subject: Re: [openib-general] OpenSM and Wrong SM_Key
> 
> On Wed, Nov 09, 2005 at 09:46:06AM +0200, Eitan Zahavi wrote:
> > Hi Hal,
> >
> > I would like to bring this to MgtWG before we change anything.
> > IMO the situation when this happens is really not "legal" since if the
> > SM's are not coordinated at least in their SM_Key it will cause the two
> > masters on the subnet.
> >
> > From our experience it is always better to cause a fatal flow and exit
> > the SM rather than report the event in some log - normally it will not
> > be seen ...
> >
> > I know this is a controversial issue.
> 
> Okay, so you're telling me you *WANT* behavior where a rogue node can
> trivially cause the running subnet manager to exit and take over
> management of the network?
> 
> Opensm needs to have a well documented config file, instead of 3 pages
> of command line options, and different levels of logging. What to do in
> the above situation is a site-local policy config decision, not something
> that should be hard-coded in the SM source code.
> 
> The logs might actually get looked at if there wasn't junk in the log
> every time something timed out.
> 
> The linux kernel has 'WARN, NOTICE, and CRITICAL' level log messages.


