<br>The reason is:<br>Jan 01 01:46:17 321555 [58F3E280] -> osm_vendor_set_sm: ERR 5431: setting IS_SM capability mask failed; errno 2<br><br>From the code, it looks like /dev/infiniband/issm<umad_port> needs to be created, and I did that. But the SM with the higher GUID still seems to become the master whenever it does a sweep. The logs are too detailed, so I am sending snippets.
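For reference, errno 2 is ENOENT ("No such file or directory"), which would fit the ISSM device node not existing when OpenSM tried to open it. A minimal C sketch for checking this by hand, assuming the node is named /dev/infiniband/issm<N> as described above and that the kernel keeps the port's IsSM bit set only while the node is held open. The helpers issm_path and open_issm are hypothetical, not OpenSM functions.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: build the ISSM path for a umad port number,
 * e.g. umad port 0 -> "/dev/infiniband/issm0". */
static int issm_path(char *buf, size_t len, unsigned umad_port)
{
    return snprintf(buf, len, "/dev/infiniband/issm%u", umad_port);
}

/* Hypothetical helper: open an ISSM node. Assumption: while the returned
 * descriptor stays open, the kernel keeps IsSM set on that port.
 * On failure, prints the errno and returns -1 with errno preserved
 * (errno 2 == ENOENT means the device node does not exist). */
static int open_issm(const char *path)
{
    int fd = open(path, O_RDWR);
    if (fd < 0) {
        int err = errno;
        fprintf(stderr, "open(%s): errno %d (%s)\n", path, err, strerror(err));
        errno = err; /* fprintf may clobber errno; restore it for the caller */
    }
    return fd;
}
```

If open_issm on the path built for the right umad port still fails with ENOENT, the node is missing or named differently than expected; EACCES instead would point at permissions.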
<br><br><span style="font-weight: bold;">Local port (with a high GUID)</span><br>Jan 01 02:49:56 332142 [5873E280] -> osm_pi_rcv_process: Discovered port num 0x1 with GUID = 0x2c901097682d1 for parent node GUID = 0x2c901097682d0, TID = 0x1236
<br>Jan 01 02:49:56 332197 [5873E280] -> PortInfo dump:<br>                                port number.............0x1<br>                                node_guid...............0x0002c901097682d0<br>                                port_guid...............0x0002c901097682d1
<br>                                m_key...................0x0000000000000000<br>                                subnet_prefix...........0xfe80000000000000<br>                                <span style="font-weight: bold;">
base_lid................0x1</span><br style="font-weight: bold;"><span style="font-weight: bold;">                                master_sm_base_lid......0x2</span><br>                                capability_mask.........0x2510A68
<br>                                diag_code...............0x0<br>                                m_key_lease_period......0x0<br>                                local_port_num..........0x1<br>                                link_width_enabled......0x3
<br>                                link_width_supported....0x3<br>                                link_width_active.......0x2<br>                                link_speed_supported....0x1<br>                                port_state..............ACTIVE
<br>                                state_info2.............0x52<br>                                m_key_protect_bits......0x0<br>                                lmc.....................0x0<br>                                link_speed..............0x11
<br>                                mtu_smsl................0x40<br>                                vl_cap_init_type........0x40<br>                                vl_high_limit...........0x0<br>                                vl_arb_high_cap.........0x8
<br>                                vl_arb_low_cap..........0x8<br>                                init_rep_mtu_cap........0x4<br>                                vl_stall_life...........0xFF<br>                                vl_enforce..............0x40
<br>                                m_key_violations........0x0<br>                                p_key_violations........0x0<br>                                q_key_violations........0x0<br>                                guid_cap................0x20
<br>                                client_reregister.......0x0<br>                                subnet_timeout..........0x12<br>                                resp_time_value.........0x10<br>                                error_threshold.........0x88
<br>Jan 01 02:49:56 332337 [5873E280] -> Capabilities Mask:<br>                                IB_PORT_CAP_HAS_TRAP<br>                                IB_PORT_CAP_HAS_AUTO_MIG<br>                                IB_PORT_CAP_HAS_SL_MAP
<br>                                IB_PORT_CAP_HAS_LED_INFO<br>                                IB_PORT_CAP_HAS_SYS_IMG_GUID<br>                                IB_PORT_CAP_HAS_COM_MGT<br>                                IB_PORT_CAP_HAS_VEND_CLS
<br>                                IB_PORT_CAP_HAS_CAP_NTC<br>                                IB_PORT_CAP_HAS_CLIENT_REREG<br><br><span style="font-weight: bold;">Remote port (which hosts the SM)</span><br>Jan 01 02:49:56 500638 [5AF3E280] -> osm_pi_rcv_process: Discovered port num 0x1 with GUID = 0x2c90109765da1 for parent node GUID = 0x2c90109765da0, TID = 0x123b
<br>Jan 01 02:49:56 500690 [5AF3E280] -> PortInfo dump:<br>                                port number.............0x1<br>                                node_guid...............0x0002c90109765da0<br>                                port_guid...............0x0002c90109765da1
<br>                                m_key...................0x0000000000000000<br>                                subnet_prefix...........0xfe80000000000000<br>                                <span style="font-weight: bold;">
base_lid................0x2</span><br style="font-weight: bold;"><span style="font-weight: bold;">                                master_sm_base_lid......0x2</span><br>                                capability_mask.........0x2510A68
<br>                                diag_code...............0x0<br>                                m_key_lease_period......0x0<br>                                local_port_num..........0x1<br>                                link_width_enabled......0x3
<br>                                link_width_supported....0x3<br>                                link_width_active.......0x2<br>                                link_speed_supported....0x1<br>                                port_state..............ACTIVE
<br>                                state_info2.............0x52<br>                                m_key_protect_bits......0x0<br>                                lmc.....................0x0<br>                                link_speed..............0x11
<br>                                mtu_smsl................0x40<br>                                vl_cap_init_type........0x40<br>                                vl_high_limit...........0x0<br>                                vl_arb_high_cap.........0x8
<br>                                vl_arb_low_cap..........0x8<br>                                init_rep_mtu_cap........0x4<br>                                vl_stall_life...........0xFF<br>                                vl_enforce..............0x40
<br>                                m_key_violations........0x0<br>                                p_key_violations........0x0<br>                                q_key_violations........0x0<br>                                guid_cap................0x20
<br>                                client_reregister.......0x0<br>                                subnet_timeout..........0x12<br>                                resp_time_value.........0x10<br>                                error_threshold.........0x88
<br>Jan 01 02:49:56 500831 [5AF3E280] -> Capabilities Mask:<br>                                IB_PORT_CAP_HAS_TRAP<br>                                IB_PORT_CAP_HAS_AUTO_MIG<br>                                IB_PORT_CAP_HAS_SL_MAP
<br>                                IB_PORT_CAP_HAS_LED_INFO<br>                                IB_PORT_CAP_HAS_SYS_IMG_GUID<br>                                IB_PORT_CAP_HAS_COM_MGT<br>                                IB_PORT_CAP_HAS_VEND_CLS
<br>                                IB_PORT_CAP_HAS_CAP_NTC<br>                                IB_PORT_CAP_HAS_CLIENT_REREG<br><br>Please let me know if I should look at any specific portion.<br><br>Thanks<br>Ganesh<br><br><br>
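The capability_mask value 0x2510A68 in both dumps can be decoded by hand. A minimal C sketch, assuming the PortInfo CapabilityMask bit positions from the InfiniBand specification (names borrowed from OpenSM's IB_PORT_CAP_* macros, values shown in host byte order). The CAP_* macros and decode_cap_mask are hypothetical helpers, not OpenSM's API.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* PortInfo:CapabilityMask bits in host byte order (assumed from the IBA spec). */
#define CAP_IS_SM            0x00000002u
#define CAP_HAS_TRAP         0x00000008u
#define CAP_HAS_AUTO_MIG     0x00000020u
#define CAP_HAS_SL_MAP       0x00000040u
#define CAP_HAS_LED_INFO     0x00000200u
#define CAP_HAS_SYS_IMG_GUID 0x00000800u
#define CAP_HAS_COM_MGT      0x00010000u
#define CAP_HAS_VEND_CLS     0x00100000u
#define CAP_HAS_CAP_NTC      0x00400000u
#define CAP_HAS_CLIENT_REREG 0x02000000u

struct cap_name {
    uint32_t bit;
    const char *name;
};

/* Ordered by bit value, matching the order in the log snippets. */
static const struct cap_name cap_names[] = {
    { CAP_IS_SM,            "IB_PORT_CAP_IS_SM" },
    { CAP_HAS_TRAP,         "IB_PORT_CAP_HAS_TRAP" },
    { CAP_HAS_AUTO_MIG,     "IB_PORT_CAP_HAS_AUTO_MIG" },
    { CAP_HAS_SL_MAP,       "IB_PORT_CAP_HAS_SL_MAP" },
    { CAP_HAS_LED_INFO,     "IB_PORT_CAP_HAS_LED_INFO" },
    { CAP_HAS_SYS_IMG_GUID, "IB_PORT_CAP_HAS_SYS_IMG_GUID" },
    { CAP_HAS_COM_MGT,      "IB_PORT_CAP_HAS_COM_MGT" },
    { CAP_HAS_VEND_CLS,     "IB_PORT_CAP_HAS_VEND_CLS" },
    { CAP_HAS_CAP_NTC,      "IB_PORT_CAP_HAS_CAP_NTC" },
    { CAP_HAS_CLIENT_REREG, "IB_PORT_CAP_HAS_CLIENT_REREG" },
};

/* Print the name of every known bit set in mask; return 1 if IsSM is set. */
static int decode_cap_mask(uint32_t mask)
{
    for (size_t i = 0; i < sizeof cap_names / sizeof cap_names[0]; i++)
        if (mask & cap_names[i].bit)
            printf("%s\n", cap_names[i].name);
    return (mask & CAP_IS_SM) != 0;
}
```

Calling decode_cap_mask(0x2510A68) prints the same nine names as the logs and returns 0: bit 0x2 (IsSM) is clear on both ports, which would be consistent with the ERR 5431 failure when setting IS_SM.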
<br><div><span class="gmail_quote">On 16 May 2007 21:57:27 -0400, <b class="gmail_sendername">Hal Rosenstock</b> <<a href="mailto:halr@voltaire.com">halr@voltaire.com</a>> wrote:</span><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">
Hi again Ganesh,<br><br>On Wed, 2007-05-16 at 21:42, Ganesh Sadasivan wrote:<br>> Hi Hal,<br>><br>>  Please see inline.<br>><br>> On 16 May 2007 19:22:00 -0400, Hal Rosenstock <<a href="mailto:halr@voltaire.com">
halr@voltaire.com</a>><br>> wrote:<br>>         Hi Ganesh,<br>><br>>         On Wed, 2007-05-16 at 19:00, Ganesh Sadasivan wrote:<br>>         > Hi,<br>>         ><br>>         >    I have a setup with 2 HCAs connected back to back and am
<br>>         running<br>>         > opensm (ofed1.1, running at the same priority) on both of<br>>         them. Is<br>>         > there any utility to see who is the master?<br>><br>> Even with priority differences I am seeing the same 
behavior. Am I<br>> missing any option? I am setting "opensm -s 30" and "opensm -s 60" on<br>> the respective sides.<br><br>Why not use the default (10 secs) or at least the same on both sides ?<br>
<br>>         sminfo will show the SM state for a LID/GUID.<br>><br>><br>> Thanks.<br>><br>>         >   The smlid in ibv_devinfo, seems to be changing whenever an<br>>         SM does a<br>>         > sweep. Is this expected?
<br>><br>>         Nope. If they are both at the same priority, the lower GUID<br>>         should win<br>>         the SM election.<br>><br>>         Not sure what is going wrong in your (back to back HCA)
<br>>         subnet. Do your<br>>         ports stay active ?<br>><br>><br>> Yes both ports are active.<br><br>And they stay active (no LED color changes) ?<br><br>If not, can you run both OpenSMs in verbose mode (-V) and see if there
<br>is anything interesting/relevant in the logs ?<br><br>-- Hal<br><br>> Thanks<br>> Ganesh<br>><br>>         -- Hal<br>><br>>         > Thanks<br>>         > Ganesh<br>>         ><br>>         >
<br></blockquote></div>
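The election rule Hal describes can be written as a single comparison. A minimal C sketch, assuming the usual InfiniBand ordering (the higher SMInfo priority wins; at equal priority the lower port GUID wins). The struct sm_candidate and the function sm_beats are hypothetical, not OpenSM code.

```c
#include <stdint.h>

/* Hypothetical view of one SM candidate: 4-bit SMInfo priority and port GUID. */
struct sm_candidate {
    uint8_t priority;   /* 0..15, higher is preferred */
    uint64_t port_guid;
};

/* Return nonzero if candidate a should win the election against b. */
static int sm_beats(const struct sm_candidate *a, const struct sm_candidate *b)
{
    if (a->priority != b->priority)
        return a->priority > b->priority;   /* higher priority wins */
    return a->port_guid < b->port_guid;     /* tie: lower GUID wins */
}
```

With equal (hypothetical) priorities and the two port GUIDs from the logs, sm_beats(&remote, &local) is nonzero because 0x2c90109765da1 < 0x2c901097682d1. The rule therefore predicts the remote SM at base LID 0x2 as master, which agrees with master_sm_base_lid 0x2 in both PortInfo dumps.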
<br>