[openib-general] OpenSM and Wrong SM_Key
Hal Rosenstock
halr at voltaire.com
Wed Nov 9 08:26:04 PST 2005
Hi Eitan,
On Wed, 2005-11-09 at 02:46, Eitan Zahavi wrote:
> Hi Hal,
>
> I would like to bring this to MgtWG before we change anything.
> IMO the situation in which this happens is really not "legal": if the
> SMs are not coordinated, at least in their SM_Key, there will be two
> masters on the subnet.
Correct. That's what the current compliance says.
> From our experience it is always better to cause a fatal flow and exit
> the SM rather than report the event in some log, where it will normally
> not be seen ...
The report would go to upper layer management too, not just to a log.
But this is about more than reporting; it is the exiting that
relinquishes the subnet.
> I know this is a controversial issue.
Feel free to bring this up at the MgtWG.
> BTW: Another feature I would like to bring up is the SM behavior when it
> recognizes a duplicated GUID on the subnet. Currently it will just issue
> an error in the log file.
> I would propose making it abort after sending a log event describing
> the DR paths to these two devices.
>
> What do you say?
If aborting means exiting (terminating) the SM in this case, I think
that is not a good thing and should be avoided. In the case of a
duplicated GUID (which should not occur), a choice needs to be made as
to which one to honor. The other should be ignored. The two ways I can
envision this happening are: (1) duplication of a GUID in multiple nodes
(bad manufacturing process), and (2) an SM bug of some sort.
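The "honor one, ignore the other" policy could be sketched roughly as follows. This is only an illustration: the names (guid_table_t, guid_table_add, MAX_NODES) are hypothetical and are not OpenSM code, and "the first GUID discovered is the one honored" is an assumed tie-break rule; the choice of which one to honor is left open above.

```c
#include <stddef.h>
#include <stdint.h>

#define MAX_NODES 64 /* arbitrary bound chosen for this sketch */

/* Hypothetical table of GUIDs already accepted during a sweep. */
typedef struct {
    uint64_t guids[MAX_NODES];
    size_t count;
} guid_table_t;

typedef enum {
    GUID_HONORED,           /* first time this GUID was seen: keep it */
    GUID_IGNORED_DUPLICATE, /* GUID already present: ignore this node */
    GUID_TABLE_FULL         /* sketch-only limit reached */
} guid_result_t;

/*
 * Assumed policy: the first node reporting a GUID is honored; any later
 * node reporting the same GUID is ignored (the caller would log it)
 * and the SM keeps running instead of exiting.
 */
guid_result_t guid_table_add(guid_table_t *t, uint64_t guid)
{
    size_t i;

    for (i = 0; i < t->count; i++) {
        if (t->guids[i] == guid)
            return GUID_IGNORED_DUPLICATE;
    }
    if (t->count == MAX_NODES)
        return GUID_TABLE_FULL;

    t->guids[t->count++] = guid;
    return GUID_HONORED;
}
```

A caller would log the DR paths of both devices when GUID_IGNORED_DUPLICATE is returned and then continue the sweep.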
-- Hal
> EZ
>
> Eitan Zahavi
> Design Technology Director
> Mellanox Technologies LTD
> Tel:+972-4-9097208
> Fax:+972-4-9593245
> P.O. Box 586 Yokneam 20692 ISRAEL
>
>
> > -----Original Message-----
> > From: Hal Rosenstock [mailto:halr at voltaire.com]
> > Sent: Tuesday, November 08, 2005 11:09 PM
> > To: openib-general at openib.org
> > Subject: [openib-general] OpenSM and Wrong SM_Key
> >
> > Hi,
> >
> > Currently, when OpenSM receives SMInfo with a different SM_Key, it
> > exits as follows:
> >
> >
> > void
> > __osm_sminfo_rcv_process_get_response(
> >   IN const osm_sminfo_rcv_t* const p_rcv,
> >   IN const osm_madw_t* const p_madw )
> > {
> > ...
> >
> >   /*
> >     Check that the sm_key of the found SM is the same as ours,
> >     or is zero. If not - OpenSM cannot continue with configuration!.
> >   */
> >   if ( p_smi->sm_key != 0 &&
> >        p_smi->sm_key != p_rcv->p_subn->opt.sm_key )
> >   {
> >     osm_log( p_rcv->p_log, OSM_LOG_ERROR,
> >              "__osm_sminfo_rcv_process_get_response: ERR 2F18: "
> >              "Got SM with sm_key that doesn't match our "
> >              "local key. Exiting\n" );
> >     osm_log( p_rcv->p_log, OSM_LOG_SYS,
> >              "Found remote SM with non-matching sm_key. Exiting\n" );
> >     osm_exit_flag = TRUE;
> >     goto Exit;
> >   }
> >
> > C14-61.2.1 states that:
> > A master SM which finds a higher priority master SM with the wrong
> > SM_Key should not relinquish the subnet.
> >
> > Exiting OpenSM relinquishes the subnet.
> >
> > So it appears to me that perhaps this behavior of exiting OpenSM should
> > be at least contingent on the SM state and relative priority of the
> > SMInfo received. Make sense ? If so, I will work on a patch for this.
> >
> > -- Hal
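For illustration, the idea of making the exit contingent on SM state and relative priority might look roughly like the sketch below. It is not the patch itself and not OpenSM code: the types and names (sm_state_t, sm_view_t, sm_key_decide) are hypothetical. Only the case named in C14-61.2.1 as quoted above (a master finding a higher priority master with the wrong SM_Key) is treated specially; falling back to the current exit behavior in every other mismatch case is an assumption.

```c
#include <stdint.h>

/* Hypothetical, simplified SM states; not OpenSM's own definitions. */
typedef enum {
    SM_STATE_DISCOVERING,
    SM_STATE_STANDBY,
    SM_STATE_MASTER
} sm_state_t;

/* The few SMInfo fields this sketch needs, for the local or remote SM. */
typedef struct {
    sm_state_t state;
    uint8_t priority;
    uint64_t sm_key;
} sm_view_t;

typedef enum {
    SM_KEY_OK,                   /* keys match, or remote key is zero */
    SM_KEY_MISMATCH_KEEP_SUBNET, /* report the mismatch but keep running */
    SM_KEY_MISMATCH_EXIT         /* current behavior: report and exit */
} sm_key_action_t;

sm_key_action_t sm_key_decide(const sm_view_t *local, const sm_view_t *remote)
{
    /* Same acceptance test as the existing check: zero or equal key. */
    if (remote->sm_key == 0 || remote->sm_key == local->sm_key)
        return SM_KEY_OK;

    /*
     * Case from C14-61.2.1 as quoted above: a master that finds a
     * higher priority master with the wrong SM_Key should not
     * relinquish the subnet, so it must not exit.
     */
    if (local->state == SM_STATE_MASTER &&
        remote->state == SM_STATE_MASTER &&
        remote->priority > local->priority)
        return SM_KEY_MISMATCH_KEEP_SUBNET;

    /* Assumption: every other mismatch keeps today's exit behavior. */
    return SM_KEY_MISMATCH_EXIT;
}
```

In this shape the receive handler would still log the mismatch in every non-OK case, and would set the exit flag only when SM_KEY_MISMATCH_EXIT is returned.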