[Fwd: [openib-general] OpenSM and Wrong SM_Key]

Yael Kalka yael at mellanox.co.il
Thu Dec 1 04:17:11 PST 2005


Hi Hal, Eitan,
I think the best option is to add an OpenSM option flag, exit_on_fatal.
This flag would decide what OpenSM does in fatal cases:
1. Exit or not when seeing an SM with a different SM_Key.
2. Exit or not when there is a fatal link error (e.g., multiple GUIDs).
etc.
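
A minimal sketch of how such a flag might gate the exit decision. Only the
flag name exit_on_fatal comes from the proposal above; sm_opts_t,
fatal_event_t and should_exit_on_fatal are hypothetical names used for
illustration and are not part of OpenSM.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical fatal conditions, taken from the list above. */
typedef enum {
    FATAL_SM_KEY_MISMATCH,  /* remote SM has a different SM_Key */
    FATAL_LINK_ERROR        /* e.g. multiple GUIDs */
} fatal_event_t;

/* Hypothetical subset of the SM options. */
typedef struct {
    bool exit_on_fatal;     /* the proposed option flag */
} sm_opts_t;

/* Human-readable name of a fatal condition, for logging. */
const char *fatal_event_name(fatal_event_t ev)
{
    switch (ev) {
    case FATAL_SM_KEY_MISMATCH: return "SM_Key mismatch";
    case FATAL_LINK_ERROR:      return "fatal link error";
    }
    return "unknown";
}

/* Returns true if the SM should set its exit flag for this event. */
bool should_exit_on_fatal(const sm_opts_t *opts, fatal_event_t ev)
{
    assert(opts != NULL);
    (void)ev;  /* in this sketch one flag covers every fatal event */
    return opts->exit_on_fatal;
}
```

With this shape, each place that detects a fatal condition would log it
and then set the exit flag only if should_exit_on_fatal() returns true.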

I just tried running two SMs with different SM_Keys, and neither of them
exits, since both receive SM_Key=0 in the SMInfo GetResp.
The reason is that in the SMInfo Get request (as in all other requests)
we do not send anything in the MAD data, meaning all fields are clear.
In the __osm_sminfo_rcv_process_get_request function we check the
state according to the payload data, which is always zero! Thus an SM will
never know that the SMInfo request was sent from an SM that is master.
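
To illustrate the problem, here is a minimal sketch, not the OpenSM code.
The struct loosely follows the SMInfo attribute fields (GUID, SM_Key,
ActCount, priority/state), but sminfo_payload_t and the helper functions
are hypothetical, and byte order is ignored for simplicity.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* SM state values (master = 3). */
enum {
    SM_STATE_NOTACTIVE   = 0,
    SM_STATE_DISCOVERING = 1,
    SM_STATE_STANDBY     = 2,
    SM_STATE_MASTER      = 3
};

/* Hypothetical stand-in for the SMInfo payload (host byte order). */
typedef struct {
    uint64_t guid;
    uint64_t sm_key;
    uint32_t act_count;
    uint8_t  pri_state;  /* priority in the high nibble, state in the low */
} sminfo_payload_t;

/* What the Get request effectively carries today: all fields clear. */
void build_get_request_empty(sminfo_payload_t *p)
{
    assert(p != NULL);
    memset(p, 0, sizeof *p);
}

/* What a fixed requester could send instead: its own identity and state. */
void build_get_request_filled(sminfo_payload_t *p, uint64_t guid,
                              uint64_t sm_key, uint8_t priority,
                              uint8_t state)
{
    assert(p != NULL);
    memset(p, 0, sizeof *p);
    p->guid = guid;
    p->sm_key = sm_key;
    p->pri_state = (uint8_t)(((priority & 0x0F) << 4) | (state & 0x0F));
}

/* Receiver-side check: is the requesting SM a master? */
bool requester_is_master(const sminfo_payload_t *p)
{
    assert(p != NULL);
    return (p->pri_state & 0x0F) == SM_STATE_MASTER;
}
```

With the empty payload the state nibble is 0 (not active), so
requester_is_master() can never return true, whatever the real state of
the sender is.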

I will work on a fix for that.
Yael 

-----Original Message-----
From: Hal Rosenstock [mailto:halr at voltaire.com]
Sent: Wednesday, November 30, 2005 11:57 PM
To: Yael Kalka; Eitan Zahavi
Cc: openib-general at openib.org
Subject: [Fwd: [openib-general] OpenSM and Wrong SM_Key]


Hi Yael & Eitan,

Based on the recent MgtWG discussions, are you still holding your
position in terms of exiting OpenSM when a non-matching SM_Key is
discovered? Just wondering if I can issue a patch for this and clear
this issue so OpenSM can be compliant for this aspect. Thanks.

-- Hal

-----Forwarded Message-----

From: Hal Rosenstock <halr at voltaire.com>
To: openib-general at openib.org
Subject: [openib-general] OpenSM and Wrong SM_Key
Date: 08 Nov 2005 16:08:47 -0500

Hi,

Currently, when OpenSM receives SMInfo with a different SM_Key, it exits
as follows:


void
__osm_sminfo_rcv_process_get_response(
  IN const osm_sminfo_rcv_t*  const p_rcv,
  IN const osm_madw_t*     const p_madw )
{
...

  /*
     Check that the sm_key of the found SM is the same as ours,
     or is zero. If not - OpenSM cannot continue with configuration!
  */
  if ( p_smi->sm_key != 0 &&
       p_smi->sm_key != p_rcv->p_subn->opt.sm_key )
  {
    osm_log( p_rcv->p_log, OSM_LOG_ERROR,
             "__osm_sminfo_rcv_process_get_response: ERR 2F18: "
             "Got SM with sm_key that doesn't match our "
             "local key. Exiting\n" );
    osm_log( p_rcv->p_log, OSM_LOG_SYS,
             "Found remote SM with non-matching sm_key. Exiting\n" );
    osm_exit_flag = TRUE;
    goto Exit;
  }

C14-61.2.1 states that:
A master SM which finds a higher priority master SM with the wrong
SM_Key should not relinquish the subnet.

Exiting OpenSM relinquishes the subnet.

So it appears to me that this behavior of exiting OpenSM should at least
be contingent on the SM state and the relative priority of the SMInfo
received. Make sense? If so, I will work on a patch for this.

-- Hal

