[openib-general] OpenSM and Wrong SM_Key

Hal Rosenstock halr at voltaire.com
Wed Nov 9 08:26:04 PST 2005


Hi Eitan,

On Wed, 2005-11-09 at 02:46, Eitan Zahavi wrote:
> Hi Hal,
> 
> I would like to bring this to MgtWG before we change anything.
> IMO the situation when this happens is really not "legal", since if the
> SMs are not coordinated at least in their SM_Key, there will be two
> masters on the subnet.

Correct. That's what the current compliance says.

> From our experience it is always better to cause a fatal flow and exit
> the SM rather than report the event in some log - normally it will not
> be seen ...

To upper layer management too (not just in a log). It's more than just
reporting in a log; it's the exiting which relinquishes the subnet.
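
To make the distinction concrete, here is a rough sketch of the kind of
decision I have in mind, where exiting depends on the local SM state.
The type and function names are made up for illustration and are not
OpenSM code. Relative priority is left out on purpose: the sketch
assumes a master never exits on a key mismatch, and that any other
state keeps today's behavior of exiting.

```c
#include <stdint.h>

/* Hypothetical, simplified SM states; these are not the OpenSM definitions. */
typedef enum {
    SM_STATE_DISCOVERING,
    SM_STATE_STANDBY,
    SM_STATE_MASTER
} sm_state_t;

/* Hypothetical outcomes of comparing a remote SM_Key with the local one. */
typedef enum {
    SM_KEY_ACCEPT,        /* keys are compatible, carry on */
    SM_KEY_IGNORE_REMOTE, /* mismatch, but keep the subnet and report it */
    SM_KEY_EXIT           /* mismatch, exit as the current code does */
} sm_key_action_t;

sm_key_action_t
sm_key_mismatch_action(sm_state_t local_state,
                       uint64_t local_key,
                       uint64_t remote_key)
{
    /* Same test as the existing check: a zero or equal key is acceptable. */
    if (remote_key == 0 || remote_key == local_key)
        return SM_KEY_ACCEPT;

    /* Assumption: a master must not relinquish the subnet, so it stays up. */
    if (local_state == SM_STATE_MASTER)
        return SM_KEY_IGNORE_REMOTE;

    /* Assumption: non-master states keep the current exit behavior. */
    return SM_KEY_EXIT;
}
```

A caller would still log the mismatch in every non-accept case; only
the SM_KEY_EXIT result would set the exit flag.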

> I know this is a controversial issue. 

Feel free to bring this up at the MgtWG.

> BTW: Another feature I would like to bring up is the SM behavior when it
> recognizes duplicated GUID on the subnet. Currently it will just issue
> an error in the log file.
> I would propose to make it abort after sending a log event describing
> the DR paths to these two devices.
> 
> What do you say?

If aborting means exiting (terminating) the SM in this case, I think
that is not a good thing and should be avoided. In the case of a
duplicated GUID (which should not occur), a choice needs to be made as
to which one to honor. The other should be ignored. The two ways I can
envision this happening are: (1) duplication of a GUID in multiple nodes
(bad manufacturing process), and (2) an SM bug of some sort.
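
As a rough illustration of "honor one, ignore the other" combined with
logging both DR paths, something along these lines could work. All
names here are hypothetical and not taken from OpenSM. The sketch
assumes the first record seen is the one to honor, and that the caller
has already established that the two records are different devices
rather than the same node reached over two paths.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

#define MAX_HOPS  64
#define MAX_NODES 16

/* Hypothetical discovery record: a GUID plus the DR path it was found on. */
typedef struct {
    uint64_t guid;
    uint8_t  hop_count;
    uint8_t  path[MAX_HOPS];
} node_rec_t;

typedef struct {
    node_rec_t nodes[MAX_NODES];
    size_t     count;
} node_table_t;

/* Print a DR path as comma-separated port numbers, clamped to MAX_HOPS. */
static void print_path(FILE *log, const node_rec_t *n)
{
    size_t hops = n->hop_count < MAX_HOPS ? n->hop_count : MAX_HOPS;
    for (size_t i = 0; i < hops; i++)
        fprintf(log, "%s%u", i ? "," : "", (unsigned)n->path[i]);
}

/*
 * Add a record to the table.
 * Returns 0 if added, 1 if it duplicates a known GUID and was ignored,
 * and -1 if the table is full. The SM keeps running in all cases.
 */
int node_table_add(node_table_t *t, const node_rec_t *n, FILE *log)
{
    for (size_t i = 0; i < t->count; i++) {
        if (t->nodes[i].guid == n->guid) {
            /* Assumption: honor the first record, ignore the newcomer. */
            fprintf(log, "duplicate GUID 0x%016llx: honoring DR path ",
                    (unsigned long long)n->guid);
            print_path(log, &t->nodes[i]);
            fprintf(log, ", ignoring DR path ");
            print_path(log, n);
            fputc('\n', log);
            return 1;
        }
    }
    if (t->count == MAX_NODES)
        return -1;
    t->nodes[t->count++] = *n;
    return 0;
}
```

Which record to honor is a policy choice; first seen is just the
simplest one to show here.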

-- Hal

> EZ
> 
> Eitan Zahavi
> Design Technology Director
> Mellanox Technologies LTD
> Tel:+972-4-9097208
> Fax:+972-4-9593245
> P.O. Box 586 Yokneam 20692 ISRAEL
> 
> 
> > -----Original Message-----
> > From: Hal Rosenstock [mailto:halr at voltaire.com]
> > Sent: Tuesday, November 08, 2005 11:09 PM
> > To: openib-general at openib.org
> > Subject: [openib-general] OpenSM and Wrong SM_Key
> > 
> > Hi,
> > 
> > Currently, when OpenSM receives SMInfo with a different SM_Key, it exits
> > as follows:
> > 
> > 
> > void
> > __osm_sminfo_rcv_process_get_response(
> >   IN const osm_sminfo_rcv_t*  const p_rcv,
> >   IN const osm_madw_t*     const p_madw )
> > {
> > ...
> > 
> > 
> > 
> >   /*
> >      Check that the sm_key of the found SM is the same as ours,
> >      or is zero. If not - OpenSM cannot continue with configuration!.
> >   */
> >   if ( p_smi->sm_key != 0 &&
> >        p_smi->sm_key != p_rcv->p_subn->opt.sm_key )
> >   {
> >     osm_log( p_rcv->p_log, OSM_LOG_ERROR,
> >              "__osm_sminfo_rcv_process_get_response: ERR 2F18: "
> >              "Got SM with sm_key that doesn't match our "
> >              "local key. Exiting\n" );
> >     osm_log( p_rcv->p_log, OSM_LOG_SYS,
> >              "Found remote SM with non-matching sm_key. Exiting\n" );
> >     osm_exit_flag = TRUE;
> >     goto Exit;
> >   }
> > 
> > C14-61.2.1 states that:
> > A master SM which finds a higher priority master SM with the wrong
> > SM_Key should not relinquish the subnet.
> > 
> > Exiting OpenSM relinquishes the subnet.
> > 
> > So it appears to me that perhaps this behavior of exiting OpenSM should
> > be at least contingent on the SM state and relative priority of the
> > SMInfo received. Make sense ? If so, I will work on a patch for this.
> > 
> > -- Hal
> > 
> > 
> > _______________________________________________
> > openib-general mailing list
> > openib-general at openib.org
> > http://openib.org/mailman/listinfo/openib-general
> > 
> > To unsubscribe, please visit
> > http://openib.org/mailman/listinfo/openib-general



