[ofa-general] Running multiple SM

Hal Rosenstock halr at voltaire.com
Thu May 17 03:51:15 PDT 2007


On Thu, 2007-05-17 at 00:18, Ganesh Sadasivan wrote:
> The reason is:
> Jan 01 01:46:17 321555 [58F3E280] -> osm_vendor_set_sm: ERR 5431:
> setting IS_SM capability mask failed; errno 2

Yes, this makes sense now and explains what you are seeing.
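
For reference, both PortInfo dumps in the quoted message report capability_mask 0x2510A68. The sketch below decodes that value, assuming the standard IBA PortInfo:CapabilityMask bit positions, in which IsSM is bit 1 (0x2). The table, macro and function names are hypothetical and use host byte order. They are not OpenSM's own definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical local table: PortInfo:CapabilityMask bits (host byte order),
 * labelled like the OpenSM dump. Only the bits relevant to this thread. */
struct cap_bit {
    uint32_t mask;
    const char *name;
};

static const struct cap_bit cap_bits[] = {
    { 0x00000002u, "IB_PORT_CAP_IS_SM" },
    { 0x00000008u, "IB_PORT_CAP_HAS_TRAP" },
    { 0x00000020u, "IB_PORT_CAP_HAS_AUTO_MIG" },
    { 0x00000040u, "IB_PORT_CAP_HAS_SL_MAP" },
    { 0x00000200u, "IB_PORT_CAP_HAS_LED_INFO" },
    { 0x00000800u, "IB_PORT_CAP_HAS_SYS_IMG_GUID" },
    { 0x00010000u, "IB_PORT_CAP_HAS_COM_MGT" },
    { 0x00100000u, "IB_PORT_CAP_HAS_VEND_CLS" },
    { 0x00400000u, "IB_PORT_CAP_HAS_CAP_NTC" },
    { 0x02000000u, "IB_PORT_CAP_HAS_CLIENT_REREG" },
};

#define IS_SM_BIT 0x00000002u

/* Print the name of every table bit set in cap_mask.
 * Returns any set bits that the table does not know about. */
uint32_t print_caps(uint32_t cap_mask)
{
    uint32_t unknown = cap_mask;
    for (size_t i = 0; i < sizeof cap_bits / sizeof cap_bits[0]; i++) {
        uint32_t m = cap_bits[i].mask;
        assert(m != 0 && (m & (m - 1)) == 0); /* each entry is a single bit */
        if (cap_mask & m) {
            printf("  %s\n", cap_bits[i].name);
            unknown &= ~m;
        }
    }
    return unknown;
}

/* Nonzero if the port currently advertises the IsSM capability. */
int port_is_sm(uint32_t cap_mask)
{
    return (cap_mask & IS_SM_BIT) != 0;
}
```

With 0x2510A68, print_caps() prints the same nine names as the OpenSM dump, in the same order, leaves no unknown bits, and port_is_sm() returns 0. So neither port was advertising IsSM when the dumps were taken, which fits the ERR 5431 failure.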

> From the code, it looks like /dev/infiniband/issm<umad_port> needs to
> be created, and I did that.

This should be done via udev rather than manually. Do you have udev
set up ? If not, please follow the instructions on the wiki.
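
As a sketch of what the issm node is for, assuming the Linux ib_umad semantics described in the kernel's user_mad documentation: opening /dev/infiniband/issm<N> sets the IsSM capability bit on the corresponding port, the bit stays set only while that file descriptor remains open, and closing it clears the bit. errno 2 is ENOENT, which is what open() reports when the device node does not exist. The helpers issm_path() and claim_is_sm() are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: build "/dev/infiniband/issm<umad_port>" into buf.
 * Returns 0 on success, -1 if buf is too small. */
int issm_path(char *buf, size_t len, int umad_port)
{
    assert(buf != NULL && len > 0);
    int n = snprintf(buf, len, "/dev/infiniband/issm%d", umad_port);
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Hypothetical helper: open the issm node so the kernel sets IsSM on the port.
 * The bit stays set only while the returned fd is open; close(fd) clears it.
 * O_NONBLOCK makes open() fail with EAGAIN instead of blocking when another
 * process already holds the node. Returns the fd, or -1 with errno set. */
int claim_is_sm(int umad_port)
{
    char path[64];
    if (issm_path(path, sizeof path, umad_port) != 0) {
        errno = ENAMETOOLONG;
        return -1;
    }
    int fd = open(path, O_RDONLY | O_NONBLOCK);
    if (fd < 0) {
        int saved = errno;
        /* errno 2 (ENOENT) here means the device node does not exist. */
        fprintf(stderr, "open %s: %s (errno %d)\n", path, strerror(saved), saved);
        errno = saved;
    }
    return fd;
}
```

An SM would open the node once at startup and keep the descriptor for its whole lifetime; presumably osm_vendor_set_sm does something equivalent when it logs ERR 5431.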

-- Hal

>  But still the SM with the higher GUID seems to become the master whenever
> it does a sweep. The logs are too detailed, so I am sending snippets.
> 
> Local port (with a high GUID)
> Jan 01 02:49:56 332142 [5873E280] -> osm_pi_rcv_process: Discovered
> port num 0x1 with GUID = 0x2c901097682d1 for parent node GUID =
> 0x2c901097682d0, TID = 0x1236 
> Jan 01 02:49:56 332197 [5873E280] -> PortInfo dump:
>                                 port number.............0x1
>                                 node_guid...............0x0002c901097682d0
>                                 port_guid...............0x0002c901097682d1
>                                 m_key...................0x0000000000000000
>                                 subnet_prefix...........0xfe80000000000000
>                                 base_lid................0x1
>                                 master_sm_base_lid......0x2
>                                 capability_mask.........0x2510A68 
>                                 diag_code...............0x0
>                                 m_key_lease_period......0x0
>                                 local_port_num..........0x1
>                                 link_width_enabled......0x3 
>                                 link_width_supported....0x3
>                                 link_width_active.......0x2
>                                 link_speed_supported....0x1
>                                 port_state..............ACTIVE 
>                                 state_info2.............0x52
>                                 m_key_protect_bits......0x0
>                                 lmc.....................0x0
>                                 link_speed..............0x11 
>                                 mtu_smsl................0x40
>                                 vl_cap_init_type........0x40
>                                 vl_high_limit...........0x0
>                                 vl_arb_high_cap.........0x8 
>                                 vl_arb_low_cap..........0x8
>                                 init_rep_mtu_cap........0x4
>                                 vl_stall_life...........0xFF
>                                 vl_enforce..............0x40 
>                                 m_key_violations........0x0
>                                 p_key_violations........0x0
>                                 q_key_violations........0x0
>                                 guid_cap................0x20 
>                                 client_reregister.......0x0
>                                 subnet_timeout..........0x12
>                                 resp_time_value.........0x10
>                                 error_threshold.........0x88 
> Jan 01 02:49:56 332337 [5873E280] -> Capabilities Mask:
>                                 IB_PORT_CAP_HAS_TRAP
>                                 IB_PORT_CAP_HAS_AUTO_MIG
>                                 IB_PORT_CAP_HAS_SL_MAP 
>                                 IB_PORT_CAP_HAS_LED_INFO
>                                 IB_PORT_CAP_HAS_SYS_IMG_GUID
>                                 IB_PORT_CAP_HAS_COM_MGT
>                                 IB_PORT_CAP_HAS_VEND_CLS 
>                                 IB_PORT_CAP_HAS_CAP_NTC
>                                 IB_PORT_CAP_HAS_CLIENT_REREG
> 
> Remote Port which hosts the SM:
> Jan 01 02:49:56 500638 [5AF3E280] -> osm_pi_rcv_process: Discovered
> port num 0x1 with GUID = 0x2c90109765da1 for parent node GUID =
> 0x2c90109765da0, TID = 0x123b 
> Jan 01 02:49:56 500690 [5AF3E280] -> PortInfo dump:
>                                 port number.............0x1
>                                 node_guid...............0x0002c90109765da0
>                                 port_guid...............0x0002c90109765da1
>                                 m_key...................0x0000000000000000
>                                 subnet_prefix...........0xfe80000000000000
>                                 base_lid................0x2
>                                 master_sm_base_lid......0x2
>                                 capability_mask.........0x2510A68 
>                                 diag_code...............0x0
>                                 m_key_lease_period......0x0
>                                 local_port_num..........0x1
>                                 link_width_enabled......0x3 
>                                 link_width_supported....0x3
>                                 link_width_active.......0x2
>                                 link_speed_supported....0x1
>                                 port_state..............ACTIVE 
>                                 state_info2.............0x52
>                                 m_key_protect_bits......0x0
>                                 lmc.....................0x0
>                                 link_speed..............0x11 
>                                 mtu_smsl................0x40
>                                 vl_cap_init_type........0x40
>                                 vl_high_limit...........0x0
>                                 vl_arb_high_cap.........0x8 
>                                 vl_arb_low_cap..........0x8
>                                 init_rep_mtu_cap........0x4
>                                 vl_stall_life...........0xFF
>                                 vl_enforce..............0x40 
>                                 m_key_violations........0x0
>                                 p_key_violations........0x0
>                                 q_key_violations........0x0
>                                 guid_cap................0x20 
>                                 client_reregister.......0x0
>                                 subnet_timeout..........0x12
>                                 resp_time_value.........0x10
>                                 error_threshold.........0x88 
> Jan 01 02:49:56 500831 [5AF3E280] -> Capabilities Mask:
>                                 IB_PORT_CAP_HAS_TRAP
>                                 IB_PORT_CAP_HAS_AUTO_MIG
>                                 IB_PORT_CAP_HAS_SL_MAP 
>                                 IB_PORT_CAP_HAS_LED_INFO
>                                 IB_PORT_CAP_HAS_SYS_IMG_GUID
>                                 IB_PORT_CAP_HAS_COM_MGT
>                                 IB_PORT_CAP_HAS_VEND_CLS 
>                                 IB_PORT_CAP_HAS_CAP_NTC
>                                 IB_PORT_CAP_HAS_CLIENT_REREG
> 
> Please let me know if I should look at some specific portion.
> 
> Thanks
> Ganesh
> 
> 
> 
> On 16 May 2007 21:57:27 -0400, Hal Rosenstock <halr at voltaire.com>
> wrote:
>         Hi again Ganesh,
>         
>         On Wed, 2007-05-16 at 21:42, Ganesh Sadasivan wrote:
>         > Hi Hal,
>         >
>         >  Please see inline.
>         >
>         > On 16 May 2007 19:22:00 -0400, Hal Rosenstock <halr at voltaire.com>
>         > wrote:
>         >         Hi Ganesh,
>         >
>         >         On Wed, 2007-05-16 at 19:00, Ganesh Sadasivan wrote:
>         >         > Hi,
>         >         >
>         >         >    I have a setup with 2 HCAs connected back to back and am
>         >         > running opensm (ofed1.1, running at the same priority) on
>         >         > both of them. Is there any utility to see who is the master?
>         >
>         > Even with priority differences I am seeing the same behavior. Am I
>         > missing an option? I am setting "opensm -s 30" and "opensm -s 60" on
>         > the respective sides.
>         
>         Why not use the default (10 secs) or at least the same on both
>         sides ?
>         
>         >         sminfo will show the SM state for a LID/GUID.
>         >
>         >
>         > Thanks.
>         >
>         >         >   The smlid in ibv_devinfo seems to be changing whenever an
>         >         > SM does a sweep. Is this expected?
>         >
>         >         Nope. If they are both at the same priority, the lower GUID
>         >         should win the SM election.
>         >
>         >         Not sure what is going wrong in your (back to back HCA)
>         >         subnet. Do your ports stay active ?
>         >
>         >
>         > Yes both ports are active.
>         
>         And they stay active (no LED color changes) ?
>         
>         If not, can you run both OpenSMs in verbose mode (-V) and see if
>         there is anything interesting/relevant in the logs ?
>         
>         -- Hal
>         
>         > Thanks
>         > Ganesh
>         >
>         >         -- Hal
>         >
>         >         > Thanks
>         >         > Ganesh
>         >         >
>         >         > 
>         >         > _______________________________________________
>         >         > general mailing list
>         >         > general at lists.openfabrics.org
>         >         > http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general
>         >         >
>         >         > To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general
>         >
>         >
>         
> 
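
The election rule Hal describes (at equal priority, the lower GUID wins) can be sketched as a comparison, assuming the usual IBA ordering in which a higher SM priority (0 to 15) is checked first. The struct sm_candidate and the function sm_wins() are hypothetical. Note also that OpenSM's -s option sets the sweep interval in seconds, as Hal's mention of the 10-second default suggests, so "-s 30" and "-s 60" leave the priority unchanged; OpenSM takes the priority through -p.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical type: only the two fields that decide the election. */
struct sm_candidate {
    uint64_t port_guid;
    uint8_t  priority;   /* SM priority, 0 (lowest) .. 15 (highest) */
};

/* Return 1 if a should become master over b: the higher priority wins,
 * and on equal priority the lower port GUID wins. */
int sm_wins(const struct sm_candidate *a, const struct sm_candidate *b)
{
    assert(a->priority <= 15 && b->priority <= 15);
    if (a->priority != b->priority)
        return a->priority > b->priority;
    return a->port_guid < b->port_guid;
}
```

Plugging in the port GUIDs from the dumps, 0x0002c90109765da1 should beat 0x0002c901097682d1 at equal priority, and both dumps do show master_sm_base_lid 0x2, the base LID of that lower-GUID port.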



