[openib-general] OpenSM and Wrong SM_Key

Eitan Zahavi eitan at mellanox.co.il
Tue Nov 8 23:46:06 PST 2005


Hi Hal,

I would like to bring this to MgtWG before we change anything.
IMO the situation in which this happens is not really "legal": if the
SMs are not coordinated, at least in their SM_Key, the subnet will end
up with two masters.

From our experience it is always better to take a fatal flow and exit
the SM rather than just report the event in some log, where normally it
will not be seen ...

I know this is a controversial issue. 

BTW: Another feature I would like to bring up is the SM behavior when it
recognizes a duplicated GUID on the subnet. Currently it just issues
an error in the log file.
I would propose making it abort after logging an event that describes
the DR paths to the two devices.
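
A minimal sketch of the duplicate-GUID handling I have in mind. All
names here (node_rec_t, format_dr_path, find_duplicate_guid) are
hypothetical and are not existing OpenSM types or functions; the DR
path is assumed to be stored as a simple list of exit ports, one per
hop. The caller would set the exit flag when the function returns 1.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

#define MAX_DR_HOPS 64

/* Hypothetical record of a discovered node; not an OpenSM type. */
typedef struct {
    uint64_t guid;
    uint8_t  hop_count;          /* number of valid entries in path[] */
    uint8_t  path[MAX_DR_HOPS];  /* assumed layout: exit port per hop */
} node_rec_t;

/* Write the DR path as a comma-separated port list, e.g. "1,3". */
static void format_dr_path(const node_rec_t *n, char *buf, size_t len)
{
    size_t off = 0;
    size_t hops = n->hop_count < MAX_DR_HOPS ? n->hop_count : MAX_DR_HOPS;

    if (len == 0)
        return;
    buf[0] = '\0';
    for (size_t i = 0; i < hops; i++) {
        const char *sep = (i == 0) ? "" : ",";
        int w = snprintf(buf + off, len - off, "%s%u", sep,
                         (unsigned)n->path[i]);
        if (w < 0 || (size_t)w >= len - off)
            break;               /* output truncated; stop here */
        off += (size_t)w;
    }
}

/*
 * Return 1 if two records share a GUID, after logging the DR paths to
 * both devices; return 0 otherwise. The O(n^2) scan is for clarity only.
 */
static int find_duplicate_guid(const node_rec_t *nodes, size_t count,
                               FILE *log)
{
    char a[4 * MAX_DR_HOPS], b[4 * MAX_DR_HOPS];

    assert(nodes != NULL && log != NULL);
    for (size_t i = 0; i < count; i++) {
        for (size_t j = i + 1; j < count; j++) {
            if (nodes[i].guid != nodes[j].guid)
                continue;
            format_dr_path(&nodes[i], a, sizeof a);
            format_dr_path(&nodes[j], b, sizeof b);
            fprintf(log,
                    "Duplicated GUID 0x%016llx: DR paths [%s] and [%s]\n",
                    (unsigned long long)nodes[i].guid, a, b);
            return 1;
        }
    }
    return 0;
}
```

Logging both paths before aborting is the point: the administrator can
then locate the two offending devices without rerunning discovery.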

What do you say?

EZ

Eitan Zahavi
Design Technology Director
Mellanox Technologies LTD
Tel:+972-4-9097208
Fax:+972-4-9593245
P.O. Box 586 Yokneam 20692 ISRAEL


> -----Original Message-----
> From: Hal Rosenstock [mailto:halr at voltaire.com]
> Sent: Tuesday, November 08, 2005 11:09 PM
> To: openib-general at openib.org
> Subject: [openib-general] OpenSM and Wrong SM_Key
> 
> Hi,
> 
> Currently, when OpenSM receives SMInfo with a different SM_Key, it
> exits as follows:
> 
> 
> void
> __osm_sminfo_rcv_process_get_response(
>   IN const osm_sminfo_rcv_t*  const p_rcv,
>   IN const osm_madw_t*     const p_madw )
> {
> ...
> 
> 
> 
>   /*
>      Check that the sm_key of the found SM is the same as ours,
>      or is zero. If not - OpenSM cannot continue with configuration!.
>   */
>   if ( p_smi->sm_key != 0 &&
>        p_smi->sm_key != p_rcv->p_subn->opt.sm_key )
>   {
>     osm_log( p_rcv->p_log, OSM_LOG_ERROR,
>              "__osm_sminfo_rcv_process_get_response: ERR 2F18: "
>              "Got SM with sm_key that doesn't match our "
>              "local key. Exiting\n" );
>     osm_log( p_rcv->p_log, OSM_LOG_SYS,
>              "Found remote SM with non-matching sm_key. Exiting\n" );
>     osm_exit_flag = TRUE;
>     goto Exit;
>   }
> 
> C14-61.2.1 states that:
> A master SM which finds a higher priority master SM with the wrong
> SM_Key should not relinquish the subnet.
> 
> Exiting OpenSM relinquishes the subnet.
> 
> So it appears to me that perhaps this behavior of exiting OpenSM
> should be at least contingent on the SM state and relative priority
> of the SMInfo received. Make sense? If so, I will work on a patch for
> this.
> 
> -- Hal
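
The check proposed in the quoted message could be sketched roughly as
below. This is only an illustration under stated assumptions, not the
actual patch: sm_info_t, sm_action_t and on_sminfo_received are
hypothetical names, and "higher priority" is assumed to mean a larger
priority value, with the lower GUID winning a tie. Only the case quoted
from C14-61.2.1 is treated specially; every other mismatch keeps the
current exit behavior.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative SM states; not the OpenSM definitions. */
typedef enum {
    SM_STATE_NOT_ACTIVE,
    SM_STATE_DISCOVERING,
    SM_STATE_STANDBY,
    SM_STATE_MASTER
} sm_state_t;

/* Hypothetical summary of an SMInfo; not an OpenSM type. */
typedef struct {
    uint64_t   sm_key;
    uint64_t   guid;
    uint8_t    priority;
    sm_state_t state;
} sm_info_t;

typedef enum {
    SM_ACTION_CONTINUE,     /* keys are compatible: nothing to do      */
    SM_ACTION_KEEP_SUBNET,  /* mismatch, but do not relinquish subnet  */
    SM_ACTION_EXIT          /* mismatch: current behavior, exit the SM */
} sm_action_t;

/* Same test as the existing code: a zero remote key is accepted. */
static bool sm_key_mismatch(uint64_t local_key, const sm_info_t *remote)
{
    return remote->sm_key != 0 && remote->sm_key != local_key;
}

/* Assumption: larger priority wins; on a tie the lower GUID wins. */
static bool remote_outranks_local(const sm_info_t *local,
                                  const sm_info_t *remote)
{
    if (remote->priority != local->priority)
        return remote->priority > local->priority;
    return remote->guid < local->guid;
}

static sm_action_t on_sminfo_received(const sm_info_t *local,
                                      const sm_info_t *remote)
{
    assert(local != NULL && remote != NULL);

    if (!sm_key_mismatch(local->sm_key, remote))
        return SM_ACTION_CONTINUE;

    /* Master that finds a higher priority master with the wrong key. */
    if (local->state == SM_STATE_MASTER &&
        remote->state == SM_STATE_MASTER &&
        remote_outranks_local(local, remote))
        return SM_ACTION_KEEP_SUBNET;

    return SM_ACTION_EXIT;
}
```

Whether other combinations of state and priority should also avoid the
exit is left open here, since the quoted text only covers this one.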
> 
> 


