[ofa-general] opensm failure

Yevgeny Kliteynik kliteyn at dev.mellanox.co.il
Wed Sep 10 01:25:12 PDT 2008


Hi Todd,

Todd Bowman wrote:
> OpenSM Rev:openib-3.0.13

Can you upgrade to OFED 1.3.1?
We had a bug that caused opensm to drop the wrong transactions,
and the errors in your log could be caused by that. The bug was fixed
in OFED 1.3.
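
To check whether the errors go away after the upgrade, it may help to tally
the ERR codes in the opensm log before and after. Below is a minimal sketch,
not part of opensm: the function name count_errors is hypothetical, and the
line format is an assumption based on the log quoted below (one unwrapped
line per entry, "-> function: ERR XXXX: message").

```python
import re
from collections import Counter

# Assumed entry shape, inferred from the quoted log:
#   HH:MM:SS usec [thread] -> function_name: ERR XXXX: message
ERR_RE = re.compile(r"->\s+(\w+):\s+ERR\s+([0-9A-Fa-f]{4}):")


def count_errors(lines):
    """Return a Counter keyed by (function name, error code)."""
    counts = Counter()
    for line in lines:
        match = ERR_RE.search(line)
        if match:
            # Normalize the hex code so "3a04" and "3A04" are counted together.
            counts[(match.group(1), match.group(2).upper())] += 1
    return counts
```

Passing an open log file to count_errors and printing the result of
most_common() on the returned Counter shows which errors dominate, so the
two runs can be compared side by side.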

-- Yevgeny

> opensm segfaulted during an initialization that seems to have been
> the result of a link state trap (type 1, num 128)
> 
> 
> 09:49:51 914967 [41001960] -> __osm_trap_rcv_process_request: Received 
> Generic Notice type:0x01 num:128 Producer:2 from 
> LID:0x011A TID:0x00000000000016cc
> 09:49:51 948014 [41001960] -> osm_report_notice: Reporting Generic 
> Notice type:1 num:128 from LID:0x011A 
> GID:0xfe80000000000000,0x0008f104003f0ab5
> 09:49:51 948477 [41802960] -> osm_report_notice: Reporting Generic 
> Notice type:3 num:67 from LID:0x00FD 
> GID:0xfe80000000000000,0x0002c902002064ad
> 09:49:51 948497 [41802960] -> osm_report_notice: Reporting Generic 
> Notice type:3 num:65 from LID:0x00FD 
> GID:0xfe80000000000000,0x0002c902002064ad
> 09:49:51 948502 [41802960] -> __osm_drop_mgr_remove_port: Removed port 
> with GUID:0x0002c90200207801 LID range [0x89,0x89] of node:n1008
> 09:49:51 948519 [41802960] -> osm_report_notice: Reporting Generic 
> Notice type:3 num:67 from LID:0x00FD 
> GID:0xfe80000000000000,0x0002c902002064ad
> 09:49:51 948529 [41802960] -> osm_report_notice: Reporting Generic 
> Notice type:3 num:65 from LID:0x00FD 
> GID:0xfe80000000000000,0x0002c902002064ad
> ...
> ...
> ...
> 
> 09:49:51 962126 [41802960] -> __osm_drop_mgr_remove_port: Removed port 
> with GUID:0x0002c902002064ad LID range [0xFD,0xFD] of node:hn HCA-1
> 09:49:52 044097 [41802960] -> __osm_lid_mgr_process_our_sm_node: ERR 
> 0308: Can't acquire SM's port object, GUID 0x0002c902002064ad
> 09:49:52 098558 [41001960] -> __osm_state_mgr_signal_error: ERR 3303: 
> Invalid signal OSM_SIGNAL_CHANGE_DETECTED(2) in state 
> OSM_SM_STATE_SET_SUBNET_UCAST_LIDS_WAIT
> 09:49:52 098917 [41001960] -> __osm_state_mgr_check_tbl_consistency: ERR 
> 3322: lid 0x6E is wrongly assigned to port 0x0008f104003f2cdb in 
> port_lid_tbl
> 09:49:52 098936 [41001960] -> osm_report_notice: Reporting Generic 
> Notice type:3 num:64 from LID:0x00FD 
> GID:0xfe80000000000000,0x0002c902002064ad
> 09:49:52 098944 [41001960] -> __osm_state_mgr_report_new_ports: 
> Discovered new port with GUID:0x0008f104003f2cdb LID range [0x0,0x0] of 
> node:ISR9288/ISR9096 Voltaire sLB-24
> 09:49:52 098957 [41001960] -> osm_ucast_mgr_process: null (min-hop) 
> tables configured on all switches
> 09:49:52 098992 [41001960] -> __osm_ucast_mgr_process_port: ERR 3A04: 
> Port 0x8f104003f2cdb has LID 0. An initialization error occurred. 
> Ignoring port
> 09:49:52 103405 [41802960] -> __osm_state_mgr_signal_error: ERR 3303: 
> Invalid signal OSM_SIGNAL_CHANGE_DETECTED(2) in state 
> OSM_SM_STATE_SET_LINK_PORTS_WAIT
> 09:49:52 103626 [41001960] -> __osm_state_mgr_signal_error: ERR 3303: 
> Invalid signal OSM_SIGNAL_CHANGE_DETECTED(2) in state 
> OSM_SM_STATE_SET_LINK_PORTS_WAIT
> 09:49:52 103856 [41001960] -> __osm_state_mgr_signal_error: ERR 3303: 
> Invalid signal OSM_SIGNAL_CHANGE_DETECTED(2) in state 
> OSM_SM_STATE_SET_LINK_PORTS_WAIT
> 09:49:52 104077 [41802960] -> __osm_state_mgr_signal_error: ERR 3303: 
> Invalid signal OSM_SIGNAL_CHANGE_DETECTED(2) in state 
> OSM_SM_STATE_SET_LINK_PORTS_WAIT
> ...
> ...
> ...
> 
> 
> 1) Why does the link down trap start the long chain of 
> __osm_drop_mgr_remove_port calls?
> 
> 2) Which of the errors may have caused the segfault?
> 
> 
> 
> Thanks,
> Todd
> 
> 
