[openib-general] opensm trouble

Roland Fehrenbacher rf at q-leap.de
Fri Mar 18 15:51:08 PST 2005


Hi,

I am having problems getting opensm to run using the latest gen1
version from https://openib.org/svn/gen1/trunk. I compiled the gen1
modules successfully against vanilla 2.6.11 (some minor fixes were
necessary, patches appended), and can load them ok on Mellanox
HCAs. Architecture is x86_64.

    # cat /proc/infiniband/core/ca1/info
    name:          InfiniHost0
    provider:      tavor
    node GUID:     0002:c902:0040:12a0
    ports:         2
    vendor ID:     0x2c9
    device ID:     0x5a44
    HW revision:   0xa1
    FW revision:   0x300020000
    
When starting opensm, I get:

    # opensm
    -------------------------------------------------
    OpenSM Rev:1.8.0
    Command Line Arguments:
     Log File: /tmp/osm.log
    -------------------------------------------------
    OpenSM Rev:1.8.0
    
    
    Choose a local port number with which to bind:
    
            1: GUID = 0x       0, lid = 0x03C9, state = INIT
            2: GUID = 0x       0, lid = 0x03CA, state = DOWN
    
    Enter choice (1-2): 1
    SM port is down.
    SM port is down.

Obviously, opensm does not read the port GUID, and the subsequent
connection fails. Excerpt from /tmp/osm.log:

---------------------------------------------
Mar 19 00:36:25 [4000] -> osm_opensm_init: Forcing single threaded dispatcher.
Mar 19 00:36:25 [4000] -> osm_report_notice: Reporting Generic Notice type:3 num:66 from LID:0x0000 GUID:0x0000000000000000,0x0000000000000000
Mar 19 00:36:25 [4000] -> osm_report_notice: Reporting Generic Notice type:3 num:66 from LID:0x0000 GUID:0x0000000000000000,0x0000000020f2ffff
Mar 19 00:36:32 [4002] -> __osmv_txn_timeout_cb: ERR 6702: The transaction request (tid=0x2F4B13AB) timed out (after 4 retries). Invoking the error callback.
Mar 19 00:36:32 [4002] -> __osm_sm_mad_ctrl_send_err_cb: ERR 3113: MAD completed in error (IB_TIMEOUT).
Mar 19 00:36:32 [4002] -> SMP dump:
                                base_ver................0x1
                                mgmt_class..............0x81
                                class_ver...............0x1
                                method..................0x1 (SubnGet)
                                status..................0x0
                                hop_ptr.................0x0
                                hop_count...............0x0
                                trans_id................0x0
                                attr_id.................0x11 (NodeInfo)
                                resv....................0x0
                                attr_mod................0x0
                                m_key...................0x0000000000000000
                                dr_slid.................0xFFFF
                                dr_dlid.................0xFFFF

                                Initial path: [0]
                                Return path:  [0]
                                Reserved:     [0][0][0][0][0][0][0]

                                00 00 00 00 00 00 00 00   00 00 00 00 00 00 00 00

                                00 00 00 00 00 00 00 00   00 00 00 00 00 00 00 00

                                00 00 00 00 00 00 00 00   00 00 00 00 00 00 00 00

                                00 00 00 00 00 00 00 00   00 00 00 00 00 00 00 00

Mar 19 00:36:32 [18007] -> __osm_state_mgr_is_sm_port_down: ERR 3308: SM port GUID unknown.
??? 92 98311:00:1111188992 [645F7472] -> SM port is down.
---------------------------------
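
For anyone who wants to pull just the error entries out of a longer
osm.log, here is a minimal sketch. It assumes that every error line has
the layout seen in the excerpt above ("[thread id] -> function: ERR code:
message"). The function name extract_errors and the SAMPLE text are only
illustrative; they are not part of opensm.

```python
import re

# Assumed layout, taken from the excerpt above:
#   "<timestamp> [<thread id>] -> <function>: ERR <code>: <message>"
ERR_LINE = re.compile(
    r"\[(?P<tid>[0-9A-Fa-f]+)\] -> (?P<func>\w+): ERR (?P<code>\d+): (?P<msg>.*)"
)


def extract_errors(log_text):
    """Return a list of (function, code, message) for each ERR line."""
    errors = []
    for line in log_text.splitlines():
        match = ERR_LINE.search(line)
        if match:
            errors.append(
                (match.group("func"), int(match.group("code")), match.group("msg").strip())
            )
    return errors


# Lines copied from the log excerpt above (one non-error line included).
SAMPLE = """\
Mar 19 00:36:25 [4000] -> osm_opensm_init: Forcing single threaded dispatcher.
Mar 19 00:36:32 [4002] -> __osmv_txn_timeout_cb: ERR 6702: The transaction request (tid=0x2F4B13AB) timed out (after 4 retries). Invoking the error callback.
Mar 19 00:36:32 [4002] -> __osm_sm_mad_ctrl_send_err_cb: ERR 3113: MAD completed in error (IB_TIMEOUT).
Mar 19 00:36:32 [18007] -> __osm_state_mgr_is_sm_port_down: ERR 3308: SM port GUID unknown.
"""

if __name__ == "__main__":
    for func, code, msg in extract_errors(SAMPLE):
        print(code, func, msg)
```

On the excerpt this yields the three entries 6702, 3113 and 3308, in the
order they were logged.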


What could be the reason for this? I see the same behaviour when using
IBGD 1.6.1, 1.7.0-rc32, and also using vanilla 2.6.10.

Cheers,

Roland

-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: patch-ql
URL: <http://lists.openfabrics.org/pipermail/general/attachments/20050319/8a2f8a92/attachment.ksh>

