[ofa-general] opensm hang and osmtest report ERR 0130

Yevgeny Kliteynik kliteyn at dev.mellanox.co.il
Wed Aug 6 03:54:54 PDT 2008


Hi,

Wen Hao Wang wrote:
> Hi all:
> 
> My subnet includes RHEL 5.2 servers and one switch with a subnet manager
> running. On the RHEL 5.2 servers, I can run most IB diagnostic
> commands/scripts, such as ibnetdiscover and ibstat, without errors. But
> the opensm command hangs with the following output:
> 
> [root at gaia-07 OFED-1.3.1]# opensm
> -------------------------------------------------
> OpenSM 3.1.11
> Command Line Arguments:
> Log File: /var/log/opensm.log
> -------------------------------------------------
> OpenSM 3.1.11
> 
> Using default GUID 0x2c90300013371
> Entering STANDBY state
> 
> (it never ends ...)

This part is OK: opensm enters the STANDBY state and
waits in this state indefinitely. This happens because
opensm has detected another SM in the subnet (most likely
the one running on your switch).
If you stop that other SM, the standby opensm will
take over and enter the MASTER state after a short period.
You can see which SM is the master in your subnet by
running the 'sminfo' tool.

> [root at gaia-07 ~]# ps -ef|grep opensm
> root 30018 10888 0 10:50 pts/2 00:00:00 opensm
> root 30081 30035 0 10:52 pts/1 00:00:00 grep opensm
> [root at gaia-07 ~]# ps -ef|grep 30018
> root 30018 10888 0 10:50 pts/2 00:00:00 opensm
> root 30100 30035 0 11:00 pts/1 00:00:00 grep 30018
> [root at gaia-07 ~]# tail /var/log/opensm.log
> Aug 06 10:50:12 631425 [B07D0EB0] 0x03 -> OpenSM 3.1.11
> Aug 06 10:50:12 631472 [B07D0EB0] 0x80 -> OpenSM 3.1.11
> Aug 06 10:50:12 640853 [B07D0EB0] 0x02 -> osm_vendor_bind: Binding to port 0x2c90300013371
> Aug 06 10:50:12 662682 [B07D0EB0] 0x02 -> osm_vendor_bind: Binding to port 0x2c90300013371
> Aug 06 10:50:12 667338 [486D6940] 0x80 -> Entering STANDBY state
> 
> 
> It seems opensm does not spawn any other threads, while osmtest gives errors:

Having another SM in the subnet shouldn't make osmtest
fail; the failure below has a different cause.

> [root at gaia-07 ~]# osmtest
> 
> Command Line Arguments
> Done with args
> Flow = All Validations
> Aug 06 11:02:25 264234 [EDCFC880] 0x7f -> Setting log level to: 0x03
> Aug 06 11:02:25 282259 [EDCFC880] 0x02 -> osm_vendor_bind: Binding to port 0x2c90300013371
> Aug 06 11:02:25 304475 [EDCFC880] 0x02 -> osmtest_validate_sa_class_port_info:
> -----------------------------
> SA Class Port Info:
> base_ver:1
> class_ver:2
> cap_mask:0x2601
> cap_mask2:0x0
> resp_time_val:0x14
> -----------------------------
> Aug 06 11:02:25 304526 [EDCFC880] 0x01 -> osmtest_create_db: ERR 0130: Unable to open inventory file (osmtest.dat)
> Aug 06 11:02:25 304555 [EDCFC880] 0x01 -> osmtest_run: ERR 0145: Database creation failed (IB_ERROR)
> OSMTEST: TEST "All Validations" FAIL

By default, osmtest runs all validation tests, which corresponds
to 'osmtest -f a' (note "Flow = All Validations" in your output).
This flow expects an existing inventory file (osmtest.dat by default,
as the ERR 0130 message shows) to validate the fabric against.
You should first run 'osmtest -f c' to create that file, and then
run 'osmtest' or 'osmtest -f a' to perform the tests.
See 'man osmtest' for more details.
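A minimal sketch of that two-step flow as a shell function, assuming
osmtest is on the PATH, an SM is already MASTER in the subnet, and
osmtest uses its default inventory name osmtest.dat (taken from the
ERR 0130 message), which as a relative path resolves against the
current directory. The name run_osmtest_flow is hypothetical, used
only for illustration.

```sh
#!/bin/sh
# Sketch only: the osmtest "create inventory, then validate" flow.
# Assumptions: osmtest is on PATH, a master SM is active in the subnet,
# and the default inventory file osmtest.dat lives in the current directory.
# run_osmtest_flow is a hypothetical helper name, not part of OFED.
run_osmtest_flow() {
    # Step 1: create the inventory file only if it is missing
    # (a missing file is what triggers ERR 0130).
    if [ ! -f osmtest.dat ]; then
        osmtest -f c || return 1
    fi
    # Step 2: run all validations against the inventory
    # (same flow as a plain 'osmtest').
    osmtest -f a
}
```

Call run_osmtest_flow from the directory where you want osmtest.dat to
be kept; on later runs it skips the creation step, so the validations
compare against the fabric as it was when the file was created.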

-- Yevgeny

> Is there any advice on how to debug the opensm/osmtest issue? Thanks in
> advance!
> 
> Wen Hao Wang
> Email: wangwhao at cn.ibm.com
> 
> 
> ------------------------------------------------------------------------
> 
> _______________________________________________
> general mailing list
> general at lists.openfabrics.org
> http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general
> 
> To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general
