[ofa-general] Running multiple SM
Hal Rosenstock
halr at voltaire.com
Thu May 17 03:51:15 PDT 2007
On Thu, 2007-05-17 at 00:18, Ganesh Sadasivan wrote:
> The reason is:
> Jan 01 01:46:17 321555 [58F3E280] -> osm_vendor_set_sm: ERR 5431:
> setting IS_SM capability mask failed; errno 2
Yes, this makes sense now and explains what you are seeing.
> From the code it looks like /dev/infiniband/issm<umad_port> needs to
> be created and I did that.
This should be done via udev rather than manually. Do you have udev
set up? If not, please follow the instructions on the wiki.
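
For reference, errno 2 is ENOENT ("No such file or directory"), which is consistent with the issm device node not being present when the IS_SM bit was being set. A minimal Python sketch for checking which issm nodes exist follows. The helper name list_issm_devices and its dev_dir argument are illustrative assumptions; this is not part of OpenSM or libibumad.

```python
import errno
import os


def list_issm_devices(dev_dir="/dev/infiniband"):
    """Return sorted names of issm* entries in dev_dir (empty if dev_dir is missing)."""
    try:
        entries = os.listdir(dev_dir)
    except FileNotFoundError:
        # The directory itself is absent, so there are no device nodes to report.
        return []
    return sorted(name for name in entries if name.startswith("issm"))


if __name__ == "__main__":
    # errno 2 from the ERR 5431 log line maps to ENOENT.
    print(errno.errorcode[2], "-", os.strerror(2))
    found = list_issm_devices()
    print("issm device nodes:", found if found else "none found")
```

If the list comes back empty on a host where the umad module is loaded, that would point at the udev rules rather than at OpenSM itself.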
-- Hal
> But still the SM with higher GUID seems to become the master whenever
> it does a sweep. The logs are too detailed. So I am sending snippets.
>
> Local port (with a high GUID)
> Jan 01 02:49:56 332142 [5873E280] -> osm_pi_rcv_process: Discovered
> port num 0x1 with GUID = 0x2c901097682d1 for parent node GUID =
> 0x2c901097682d0, TID = 0x1236
> Jan 01 02:49:56 332197 [5873E280] -> PortInfo dump:
> port number.............0x1
>
> node_guid...............0x0002c901097682d0
>
> port_guid...............0x0002c901097682d1
>
> m_key...................0x0000000000000000
>
> subnet_prefix...........0xfe80000000000000
> base_lid................0x1
> master_sm_base_lid......0x2
> capability_mask.........0x2510A68
> diag_code...............0x0
> m_key_lease_period......0x0
> local_port_num..........0x1
> link_width_enabled......0x3
> link_width_supported....0x3
> link_width_active.......0x2
> link_speed_supported....0x1
> port_state..............ACTIVE
> state_info2.............0x52
> m_key_protect_bits......0x0
> lmc.....................0x0
> link_speed..............0x11
> mtu_smsl................0x40
> vl_cap_init_type........0x40
> vl_high_limit...........0x0
> vl_arb_high_cap.........0x8
> vl_arb_low_cap..........0x8
> init_rep_mtu_cap........0x4
> vl_stall_life...........0xFF
> vl_enforce..............0x40
> m_key_violations........0x0
> p_key_violations........0x0
> q_key_violations........0x0
> guid_cap................0x20
> client_reregister.......0x0
> subnet_timeout..........0x12
> resp_time_value.........0x10
> error_threshold.........0x88
> Jan 01 02:49:56 332337 [5873E280] -> Capabilities Mask:
> IB_PORT_CAP_HAS_TRAP
> IB_PORT_CAP_HAS_AUTO_MIG
> IB_PORT_CAP_HAS_SL_MAP
> IB_PORT_CAP_HAS_LED_INFO
> IB_PORT_CAP_HAS_SYS_IMG_GUID
> IB_PORT_CAP_HAS_COM_MGT
> IB_PORT_CAP_HAS_VEND_CLS
> IB_PORT_CAP_HAS_CAP_NTC
> IB_PORT_CAP_HAS_CLIENT_REREG
>
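
The capability mask in the dump above can be decoded by hand to see whether the IsSM bit was actually set. The sketch below is an illustration only: the bit table is partial (just the names printed in the log plus IsSM at bit 1, in host byte order), and the function name decode_capability_mask is a made-up helper, not an OpenSM API.

```python
# Partial bit table for the PortInfo CapabilityMask (host byte order).
# Only the flags printed in the log are listed, plus IsSM (bit 1).
CAP_BITS = {
    1: "IB_PORT_CAP_IS_SM",
    3: "IB_PORT_CAP_HAS_TRAP",
    5: "IB_PORT_CAP_HAS_AUTO_MIG",
    6: "IB_PORT_CAP_HAS_SL_MAP",
    9: "IB_PORT_CAP_HAS_LED_INFO",
    11: "IB_PORT_CAP_HAS_SYS_IMG_GUID",
    16: "IB_PORT_CAP_HAS_COM_MGT",
    20: "IB_PORT_CAP_HAS_VEND_CLS",
    22: "IB_PORT_CAP_HAS_CAP_NTC",
    25: "IB_PORT_CAP_HAS_CLIENT_REREG",
}


def decode_capability_mask(mask):
    """Return the known flag names set in mask, lowest bit first."""
    return [name for bit, name in sorted(CAP_BITS.items()) if mask & (1 << bit)]


if __name__ == "__main__":
    mask = 0x2510A68  # capability_mask value from the PortInfo dump
    names = decode_capability_mask(mask)
    for name in names:
        print(name)
    print("IsSM set:", "IB_PORT_CAP_IS_SM" in names)
```

For 0x2510A68 the decoded names match the list printed in the log, and the IsSM bit is not among them.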
> Remote Port which hosts the SM:
> Jan 01 02:49:56 500638 [5AF3E280] -> osm_pi_rcv_process: Discovered
> port num 0x1 with GUID = 0x2c90109765da1 for parent node GUID =
> 0x2c90109765da0, TID = 0x123b
> Jan 01 02:49:56 500690 [5AF3E280] -> PortInfo dump:
> port number.............0x1
>
> node_guid...............0x0002c90109765da0
>
> port_guid...............0x0002c90109765da1
>
> m_key...................0x0000000000000000
>
> subnet_prefix...........0xfe80000000000000
> base_lid................0x2
> master_sm_base_lid......0x2
> capability_mask.........0x2510A68
> diag_code...............0x0
> m_key_lease_period......0x0
> local_port_num..........0x1
> link_width_enabled......0x3
> link_width_supported....0x3
> link_width_active.......0x2
> link_speed_supported....0x1
> port_state..............ACTIVE
> state_info2.............0x52
> m_key_protect_bits......0x0
> lmc.....................0x0
> link_speed..............0x11
> mtu_smsl................0x40
> vl_cap_init_type........0x40
> vl_high_limit...........0x0
> vl_arb_high_cap.........0x8
> vl_arb_low_cap..........0x8
> init_rep_mtu_cap........0x4
> vl_stall_life...........0xFF
> vl_enforce..............0x40
> m_key_violations........0x0
> p_key_violations........0x0
> q_key_violations........0x0
> guid_cap................0x20
> client_reregister.......0x0
> subnet_timeout..........0x12
> resp_time_value.........0x10
> error_threshold.........0x88
> Jan 01 02:49:56 500831 [5AF3E280] -> Capabilities Mask:
> IB_PORT_CAP_HAS_TRAP
> IB_PORT_CAP_HAS_AUTO_MIG
> IB_PORT_CAP_HAS_SL_MAP
> IB_PORT_CAP_HAS_LED_INFO
> IB_PORT_CAP_HAS_SYS_IMG_GUID
> IB_PORT_CAP_HAS_COM_MGT
> IB_PORT_CAP_HAS_VEND_CLS
> IB_PORT_CAP_HAS_CAP_NTC
> IB_PORT_CAP_HAS_CLIENT_REREG
>
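
As a cross-check on the two dumps, the election rule discussed in this thread (higher priority wins, and at equal priority the lower GUID wins) can be sketched in a few lines. This is a simplified illustration, not OpenSM's election code. The function name elect_master is hypothetical, and the priority value 0 is an assumed placeholder for "same priority on both sides".

```python
def elect_master(candidates):
    """Pick the master from (priority, port_guid) pairs.

    Highest priority wins; ties are broken by the numerically lowest GUID.
    """
    return min(candidates, key=lambda c: (-c[0], c[1]))


if __name__ == "__main__":
    local = (0, 0x0002C901097682D1)   # port_guid from the local dump (base_lid 0x1)
    remote = (0, 0x0002C90109765DA1)  # port_guid from the remote dump (base_lid 0x2)
    priority, guid = elect_master([local, remote])
    print("expected master port GUID: 0x%016x" % guid)
```

With equal priorities this selects 0x0002c90109765da1, the port with base_lid 0x2. Both dumps report master_sm_base_lid 0x2, so this particular snapshot is consistent with the lower GUID holding mastership.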
> Please let me know if I should look at some specific portion.
>
> Thanks
> Ganesh
>
>
>
> On 16 May 2007 21:57:27 -0400, Hal Rosenstock <halr at voltaire.com>
> wrote:
> Hi again Ganesh,
>
> On Wed, 2007-05-16 at 21:42, Ganesh Sadasivan wrote:
> > Hi Hal,
> >
> > Please see inline.
> >
> > On 16 May 2007 19:22:00 -0400, Hal Rosenstock <halr at voltaire.com> wrote:
> > Hi Ganesh,
> >
> > On Wed, 2007-05-16 at 19:00, Ganesh Sadasivan wrote:
> > > Hi,
> > >
> > > I have a setup with 2 HCAs connected back to back and am running
> > > opensm (ofed1.1, running at the same priority) on both of them. Is
> > > there any utility to see who is the master?
> >
> > Even with priority differences I am seeing the same behavior. Am I
> > missing any option? I am setting "opensm -s 30" and "opensm -s 60" on
> > the respective sides.
>
> Why not use the default (10 secs) or at least the same on both
> sides ?
>
> > sminfo will show the SM state for a LID/GUID.
> >
> >
> > Thanks.
> >
> > > The smlid in ibv_devinfo seems to be changing whenever an SM does a
> > > sweep. Is this expected?
> >
> > Nope. If they are both at the same priority, the lower GUID should win
> > the SM election.
> >
> > Not sure what is going wrong in your (back to back HCA) subnet. Do your
> > ports stay active ?
> >
> >
> > Yes both ports are active.
>
> And they stay active (no LED color changes) ?
>
> If not, can you run both OpenSMs in verbose mode (-V) and see if there
> is anything interesting/relevant in the logs ?
>
> -- Hal
>
> > Thanks
> > Ganesh
> >
> > -- Hal
> >
> > > Thanks
> > > Ganesh
> > >
> > >
> >
> ______________________________________________________________________
> > > _______________________________________________
> > > general mailing list
> > > general at lists.openfabrics.org
> > >
> >
> http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general
> > >
> > > To unsubscribe, please visit
> > http://openib.org/mailman/listinfo/openib-general
> >
> >
>
>