[openib-general] Some More Operational Issues with OpenSM 1.1.0

Hal Rosenstock halr at voltaire.com
Tue Sep 13 07:16:56 PDT 2005


Hi,

Here are some additional operational issues with OpenSM 1.1.0:

1. The following warning now appears when OpenSM starts:
opensm: /usr/local/lib/libopensm.so.1: no version information available (required by opensm)

2. I'm not sure what the LID manager objects to in the old settings
(from OpenSM 1.1.0). It rejects each of these saved entries as an illegal
LID range and deletes it:

Sep 13 09:34:59 330140 [B7F144A0] -> __osm_lid_mgr_validate_db: [
Sep 13 09:34:59 330260 [B7F144A0] -> __osm_lid_mgr_validate_db: ERR 0312: Ilegal LID range [0x4:0x0] for guid:0x0008f10403961355.
Sep 13 09:34:59 330289 [B7F144A0] -> osm_db_delete: [
Sep 13 09:34:59 330313 [B7F144A0] -> osm_db_delete: ]
Sep 13 09:34:59 330337 [B7F144A0] -> __osm_lid_mgr_validate_db: ERR 0312: Ilegal LID range [0x3:0x0] for guid:0x0008f10403960559.
Sep 13 09:34:59 330360 [B7F144A0] -> osm_db_delete: [
Sep 13 09:34:59 330379 [B7F144A0] -> osm_db_delete: ]
Sep 13 09:34:59 330402 [B7F144A0] -> __osm_lid_mgr_validate_db: ERR 0312: Ilegal LID range [0x5:0x0] for guid:0x005442ba00003080.
Sep 13 09:34:59 330424 [B7F144A0] -> osm_db_delete: [
Sep 13 09:34:59 330443 [B7F144A0] -> osm_db_delete: ]
Sep 13 09:34:59 330466 [B7F144A0] -> __osm_lid_mgr_validate_db: ERR 0312: Ilegal LID range [0x7:0x0] for guid:0x0008f1040396055a.
Sep 13 09:34:59 330535 [B7F144A0] -> osm_db_delete: [
Sep 13 09:34:59 330556 [B7F144A0] -> osm_db_delete: ]
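For reference, here is a minimal, purely illustrative sketch of the kind of range check that would produce these errors. It is not OpenSM's code. The function name and the exact rules (neither bound may be 0, min <= max, max within the unicast LID space ending at 0xBFFF) are assumptions. Every saved entry above has a max of 0x0, below its min, so any check of this shape rejects all of them.

```python
# Hypothetical sketch of a saved LID-range check; not OpenSM's implementation.
MAX_UCAST_LID = 0xBFFF  # highest unicast LID; 0xC000 and up are multicast


def lid_range_is_valid(min_lid: int, max_lid: int) -> bool:
    """Return True if [min_lid:max_lid] looks like a usable unicast LID range."""
    if min_lid == 0 or max_lid == 0:
        return False  # LID 0 is reserved, so neither bound may be 0
    if max_lid < min_lid:
        return False  # an inverted range cannot be assigned
    return max_lid <= MAX_UCAST_LID


# The four (guid, min, max) entries rejected in the log above.
saved = [
    (0x0008F10403961355, 0x4, 0x0),
    (0x0008F10403960559, 0x3, 0x0),
    (0x005442BA00003080, 0x5, 0x0),
    (0x0008F1040396055A, 0x7, 0x0),
]

for guid, lo, hi in saved:
    if not lid_range_is_valid(lo, hi):
        print(f"Illegal LID range [0x{lo:x}:0x{hi:x}] for guid:0x{guid:016x}")
```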


3. LinearFDBTop is being flagged as corrupted (0xC000) and forced to 0x0. This is bad.
Sep 13 09:34:59 732496 [B7713C40] -> osm_si_rcv_process: [
Sep 13 09:34:59 732514 [B7713C40] -> osm_si_rcv_process: Switch GUID = 0x0008f10400410015, TID = 0x1273.
Sep 13 09:34:59 732535 [B7713C40] -> osm_si_rcv_process: ERR 3610:
                                Bad LinearFDBTop value = 0xC000 on switch 0x8f10400410015.
                                Forcing correction to 0x0.
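0xC000 is the first multicast LID, one past the end of the unicast range (0x0001-0xBFFF), so it cannot be a sensible top for a linear (unicast) forwarding table. A minimal sketch of a check that matches the message above, assuming the rule is simply "must lie in the unicast LID space" (the real OpenSM check may differ, for example by also comparing against LinearFDBCap; the function name is hypothetical):

```python
# Hypothetical validity rule for SwitchInfo.LinearFDBTop (an assumption, not
# OpenSM's exact code): the top of the linear (unicast) forwarding table must
# lie inside the unicast LID space.
UCAST_LID_END = 0xBFFF  # last unicast LID; 0xC000 is the first multicast LID


def corrected_linear_fdb_top(lin_top: int) -> int:
    """Return lin_top unchanged if plausible, else 0 (as the log reports)."""
    if lin_top > UCAST_LID_END:
        return 0
    return lin_top


print(hex(corrected_linear_fdb_top(0xC000)))  # value from the log; prints 0x0
```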

4. The SM's Set(PortInfo) is being rejected with status 7 (logged as
0x1C00). I'm not sure why that would be. Also, in this case (and probably
in similar ones), OpenSM continues as if the Set had succeeded. Is that right?
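The "status 7" is the invalid-field code in bits 2-4 of the MAD status field. A small decoding sketch, assuming the logged 0x1C00 is the network-order field printed as a host integer on a little-endian machine, with the directed-route D bit already masked off (the dump shows D separately). Under that assumption the wire value is 0x001C, whose bits 2-4 are 0b111 = 7, "invalid value in attribute or attribute modifier". The hypothetical `set_succeeded` helper shows the conservative policy of treating any non-zero status in the response as a failed Set.

```python
import struct

# IBA MAD status field (16 bits, big-endian on the wire):
#   bit 0      busy
#   bit 1      redirect required
#   bits 2-4   invalid-field code
#   bits 8-14  class specific
#   bit 15     D (direction) bit, directed-route SMPs only
INVALID_FIELD_TEXT = {
    0: "no invalid fields",
    1: "bad version",
    2: "method not supported",
    3: "method/attribute combination not supported",
    7: "invalid value in attribute or attribute modifier",
}


def decode_logged_status(logged: int) -> dict:
    # Assumption: the log shows the raw network-order field read as a host
    # integer on a little-endian machine, so swap the two bytes back.
    wire = struct.unpack(">H", struct.pack("<H", logged))[0]
    wire &= 0x7FFF  # drop the D bit; it is reported separately in the dump
    code = (wire >> 2) & 0x7
    return {
        "wire": wire,
        "busy": bool(wire & 0x1),
        "redirect": bool(wire & 0x2),
        "invalid_field_code": code,
        "meaning": INVALID_FIELD_TEXT.get(code, "reserved"),
    }


def set_succeeded(logged: int) -> bool:
    # Hypothetical policy: any non-zero status means the Set was not applied.
    return decode_logged_status(logged)["wire"] == 0


status = decode_logged_status(0x1C00)
print(status["invalid_field_code"], status["meaning"])
# prints: 7 invalid value in attribute or attribute modifier
```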

Sep 13 09:35:00 326832 [B6F13BC0] -> SMP dump:
                                base_ver................0x1
                                mgmt_class..............0x81
                                class_ver...............0x1
                                method..................0x2 (SubnSet)
                                D bit...................0x0
                                status..................0x0
                                hop_ptr.................0x0
                                hop_count...............0x1
                                trans_id................0x12c9
                                attr_id.................0x15 (PortInfo)
                                resv....................0x0
                                attr_mod................0xA
                                m_key...................0x0000000000000000
                                dr_slid.................0xFFFF
                                dr_dlid.................0xFFFF

                                Initial path: [0][1]
                                Return path:  [0][0]
                                Reserved:     [0][0][0][0][0][0][0]

                                00 00 00 00 00 00 00 00   00 00 00 00 00 00 00 00

                                00 00 00 00 00 00 00 00   00 00 00 00 0C 03 03 02

                                14 02 00 11 40 40 00 08   08 04 F2 40 00 00 00 00

                                00 00 00 00 00 88 00 00   00 00 00 00 00 00 00 00

Sep 13 09:35:00 326970 [B6F13BC0] -> osm_vendor_send: [
Sep 13 09:35:00 327426 [B6F13BC0] -> osm_vendor_send: Completed Sending Request p_madw = 0x80a44a8.
Sep 13 09:35:00 327453 [B6F13BC0] -> osm_vendor_send: ]
Sep 13 09:35:00 327473 [B6F13BC0] -> __osm_vl15_poller: 1 on wire, 6 outstanding, 0 unicasts sent, 150 sent total.
Sep 13 09:35:00 327634 [B5F13AC0] -> osm_mad_pool_get: [
Sep 13 09:35:00 327755 [B5F13AC0] -> osm_vendor_get: [
Sep 13 09:35:00 327775 [B5F13AC0] -> osm_vendor_get: Acquiring UMAD for p_madw = 0x80a46c4, size = 256.
Sep 13 09:35:00 327893 [B5F13AC0] -> osm_vendor_get: Acquired UMAD 0x80dbb18, size = 256.
Sep 13 09:35:00 327914 [B5F13AC0] -> osm_vendor_get: ]
Sep 13 09:35:00 327933 [B5F13AC0] -> osm_mad_pool_get: Acquired p_madw = 0x80a46b8, p_mad = 0x80dbb50, size = 256.
Sep 13 09:35:00 328050 [B5F13AC0] -> osm_mad_pool_get: ]
Sep 13 09:35:00 328070 [B5F13AC0] -> __osm_sm_mad_ctrl_rcv_callback: [
Sep 13 09:35:00 328183 [B5F13AC0] -> __osm_sm_mad_ctrl_rcv_callback: 150 QP0 MADs received.
Sep 13 09:35:00 328362 [B5F13AC0] -> SMP dump:
                                base_ver................0x1
                                mgmt_class..............0x81
                                class_ver...............0x1
                                method..................0x81 (SubnGetResp)
                                D bit...................0x1
                                status..................0x1C00
                                hop_ptr.................0x0
                                hop_count...............0x1
                                trans_id................0x12c9
                                attr_id.................0x15 (PortInfo)
                                resv....................0x0
                                attr_mod................0xA
                                m_key...................0x0000000000000000
                                dr_slid.................0xFFFF
                                dr_dlid.................0xFFFF

                                Initial path: [0][1]
                                Return path:  [0][C]
                                Reserved:     [0][0][0][0][0][0][0]

                                00 00 00 00 00 00 00 00   00 00 00 00 00 00 00 00

                                00 00 00 00 00 00 00 00   00 00 00 00 0C 03 03 02

                                14 52 00 11 40 40 00 08   08 04 F2 40 00 00 00 00

                                00 00 00 00 00 88 00 00   00 00 00 00 00 00 00 00

Sep 13 09:35:00 328481 [B5F13AC0] -> __osm_sm_mad_ctrl_rcv_callback: ERR 3111: Error status = 0x1C00.
Sep 13 09:35:00 328655 [B5F13AC0] -> SMP dump:
                                base_ver................0x1
                                mgmt_class..............0x81
                                class_ver...............0x1
                                method..................0x81 (SubnGetResp)
                                D bit...................0x1
                                status..................0x1C00
                                hop_ptr.................0x0
                                hop_count...............0x1
                                trans_id................0x12c9
                                attr_id.................0x15 (PortInfo)
                                resv....................0x0
                                attr_mod................0xA
                                m_key...................0x0000000000000000
                                dr_slid.................0xFFFF
                                dr_dlid.................0xFFFF

                                Initial path: [0][1]
                                Return path:  [0][C]
                                Reserved:     [0][0][0][0][0][0][0]

                                00 00 00 00 00 00 00 00   00 00 00 00 00 00 00 00

                                00 00 00 00 00 00 00 00   00 00 00 00 0C 03 03 02

                                14 52 00 11 40 40 00 08   08 04 F2 40 00 00 00 00

                                00 00 00 00 00 88 00 00   00 00 00 00 00 00 00 00



Sep 13 09:35:00 336766 [B7713C40] -> osm_pi_rcv_process: [
Sep 13 09:35:00 336786 [B7713C40] -> PortInfo dump:
                                port number.............0xA
                                node_guid...............0x005442ba00003080
                                port_guid...............0x005442ba00003080
                                m_key...................0x0000000000000000
                                subnet_prefix...........0x0000000000000000
                                base_lid................0x0
                                master_sm_base_lid......0x0
                                capability_mask.........0x0
                                diag_code...............0x0
                                m_key_lease_period......0x0
                                local_port_num..........0xC
                                link_width_enabled......0x3
                                link_width_supported....0x3
                                link_width_active.......0x2
                                link_speed_supported....0x1
                                port_state..............ACTIVE
                                state_info2.............0x52
                                m_key_protect_bits......0x0
                                lmc.....................0x0
                                link_speed..............0x11
                                mtu_smsl................0x40
                                vl_cap..................0x40
                                vl_high_limit...........0x0
                                vl_arb_high_cap.........0x8
                                vl_arb_low_cap..........0x8
                                mtu_cap.................0x4
                                vl_stall_life...........0xF2
                                vl_enforce..............0x40
                                m_key_violations........0x0
                                p_key_violations........0x0
                                q_key_violations........0x0
                                guid_cap................0x0
                                subnet_timeout..........0x0
                                resp_time_value.........0x0
                                error_threshold.........0x88
Sep 13 09:35:00 336954 [B7713C40] -> Capabilities Mask:
Sep 13 09:35:00 336999 [B7713C40] -> osm_pi_rcv_process_set: [
Sep 13 09:35:00 337018 [B7713C40] -> osm_pi_rcv_process_set: ERR 0F10: Received Error Status for SetResp()
Sep 13 09:35:00 337133 [B7713C40] -> PortInfo dump:
                                port number.............0xA
                                node_guid...............0x005442ba00003080
                                port_guid...............0x005442ba00003080
                                m_key...................0x0000000000000000
                                subnet_prefix...........0x0000000000000000
                                base_lid................0x0
                                master_sm_base_lid......0x0
                                capability_mask.........0x0
                                diag_code...............0x0
                                m_key_lease_period......0x0
                                local_port_num..........0xC
                                link_width_enabled......0x3
                                link_width_supported....0x3
                                link_width_active.......0x2
                                link_speed_supported....0x1
                                port_state..............ACTIVE
                                state_info2.............0x52
                                m_key_protect_bits......0x0
                                lmc.....................0x0
                                link_speed..............0x11
                                mtu_smsl................0x40
                                vl_cap..................0x40
                                vl_high_limit...........0x0
                                vl_arb_high_cap.........0x8
                                vl_arb_low_cap..........0x8
                                mtu_cap.................0x4
                                vl_stall_life...........0xF2
                                vl_enforce..............0x40
                                m_key_violations........0x0
                                p_key_violations........0x0
                                q_key_violations........0x0
                                guid_cap................0x0
                                subnet_timeout..........0x0
                                resp_time_value.........0x0
                                error_threshold.........0x88
Sep 13 09:35:00 337176 [B7713C40] -> Capabilities Mask:
Sep 13 09:35:00 337216 [B7713C40] -> osm_pi_rcv_process_set: Received logical SetResp() for GUID = 0x5442ba00003080, port num = 10
                                for parent node GUID = 0x5442ba00003080 TID = 0x12c9.
Sep 13 09:35:00 337238 [B7713C40] -> osm_pi_rcv_process_set: ]
Sep 13 09:35:00 337257 [B7713C40] -> osm_pi_rcv_process: ]

The same happens on some other ports (e.g. port 0xC).
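Comparing the SubnSet payload with the SubnGetResp payload in the dumps above may help narrow this down. In the 64 bytes shown, the only byte that differs is byte 33 (0x02 in the Set, 0x52 in the response). The sketch below decodes bytes 32-43 using the PortInfo bit layout from the InfiniBand Architecture Specification (offsets should be re-checked against the spec before relying on them; the function name is hypothetical). The one difference is PortPhysicalState: 0 ("no state change") in the Set and 5 (LinkUp) in the response, so by itself it does not point at the field the port considered invalid.

```python
# Bytes 32-43 of the PortInfo attribute, copied from the third hex row of the
# SubnSet dump and of the SubnGetResp dump above.
SET_ROW = bytes.fromhex("14 02 00 11 40 40 00 08 08 04 F2 40")
RESP_ROW = bytes.fromhex("14 52 00 11 40 40 00 08 08 04 F2 40")


def decode_portinfo_32_43(row: bytes) -> dict:
    """Split bytes 32-43 of PortInfo into fields (layout per the IBA spec)."""
    return {
        "LinkSpeedSupported": row[0] >> 4,
        "PortState": row[0] & 0xF,            # 4 = Active
        "PortPhysicalState": row[1] >> 4,     # 0 = no change (Set), 5 = LinkUp
        "LinkDownDefaultState": row[1] & 0xF,
        "M_KeyProtectBits": row[2] >> 6,
        "LMC": row[2] & 0x7,
        "LinkSpeedActive": row[3] >> 4,
        "LinkSpeedEnabled": row[3] & 0xF,
        "NeighborMTU": row[4] >> 4,
        "MasterSMSL": row[4] & 0xF,
        "VLCap": row[5] >> 4,
        "InitType": row[5] & 0xF,
        "VLHighLimit": row[6],
        "VLArbitrationHighCap": row[7],
        "VLArbitrationLowCap": row[8],
        "InitTypeReply": row[9] >> 4,
        "MTUCap": row[9] & 0xF,
        "VLStallCount": row[10] >> 5,
        "HOQLife": row[10] & 0x1F,
        "OperationalVLs": row[11] >> 4,
    }


sent = decode_portinfo_32_43(SET_ROW)
got = decode_portinfo_32_43(RESP_ROW)
diff = {name: (sent[name], got[name]) for name in sent if sent[name] != got[name]}
print(diff)  # {'PortPhysicalState': (0, 5)}
```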

Thanks.

-- Hal



