[ofa-general] opensm failure

Todd Bowman twbowman at gmail.com
Tue Sep 9 14:53:56 PDT 2008


OpenSM Rev:openib-3.0.13


OpenSM segfaulted during an initialization that appears to have been
triggered by a link state trap (type 1, num 128).


09:49:51 914967 [41001960] -> __osm_trap_rcv_process_request: Received
Generic Notice type:0x01 num:128 Producer:2 from LID:0x011A
TID:0x00000000000016cc
09:49:51 948014 [41001960] -> osm_report_notice: Reporting Generic Notice
type:1 num:128 from LID:0x011A GID:0xfe80000000000000,0x0008f104003f0ab5
09:49:51 948477 [41802960] -> osm_report_notice: Reporting Generic Notice
type:3 num:67 from LID:0x00FD GID:0xfe80000000000000,0x0002c902002064ad
09:49:51 948497 [41802960] -> osm_report_notice: Reporting Generic Notice
type:3 num:65 from LID:0x00FD GID:0xfe80000000000000,0x0002c902002064ad
09:49:51 948502 [41802960] -> __osm_drop_mgr_remove_port: Removed port with
GUID:0x0002c90200207801 LID range [0x89,0x89] of node:n1008
09:49:51 948519 [41802960] -> osm_report_notice: Reporting Generic Notice
type:3 num:67 from LID:0x00FD GID:0xfe80000000000000,0x0002c902002064ad
09:49:51 948529 [41802960] -> osm_report_notice: Reporting Generic Notice
type:3 num:65 from LID:0x00FD GID:0xfe80000000000000,0x0002c902002064ad
...
...
...

09:49:51 962126 [41802960] -> __osm_drop_mgr_remove_port: Removed port with
GUID:0x0002c902002064ad LID range [0xFD,0xFD] of node:hn HCA-1
09:49:52 044097 [41802960] -> __osm_lid_mgr_process_our_sm_node: ERR 0308:
Can't acquire SM's port object, GUID 0x0002c902002064ad
09:49:52 098558 [41001960] -> __osm_state_mgr_signal_error: ERR 3303:
Invalid signal OSM_SIGNAL_CHANGE_DETECTED(2) in state
OSM_SM_STATE_SET_SUBNET_UCAST_LIDS_WAIT
09:49:52 098917 [41001960] -> __osm_state_mgr_check_tbl_consistency: ERR
3322: lid 0x6E is wrongly assigned to port 0x0008f104003f2cdb in
port_lid_tbl
09:49:52 098936 [41001960] -> osm_report_notice: Reporting Generic Notice
type:3 num:64 from LID:0x00FD GID:0xfe80000000000000,0x0002c902002064ad
09:49:52 098944 [41001960] -> __osm_state_mgr_report_new_ports: Discovered
new port with GUID:0x0008f104003f2cdb LID range [0x0,0x0] of
node:ISR9288/ISR9096 Voltaire sLB-24
09:49:52 098957 [41001960] -> osm_ucast_mgr_process: null (min-hop) tables
configured on all switches
09:49:52 098992 [41001960] -> __osm_ucast_mgr_process_port: ERR 3A04: Port
0x8f104003f2cdb has LID 0. An initialization error occurred. Ignoring port
09:49:52 103405 [41802960] -> __osm_state_mgr_signal_error: ERR 3303:
Invalid signal OSM_SIGNAL_CHANGE_DETECTED(2) in state
OSM_SM_STATE_SET_LINK_PORTS_WAIT
09:49:52 103626 [41001960] -> __osm_state_mgr_signal_error: ERR 3303:
Invalid signal OSM_SIGNAL_CHANGE_DETECTED(2) in state
OSM_SM_STATE_SET_LINK_PORTS_WAIT
09:49:52 103856 [41001960] -> __osm_state_mgr_signal_error: ERR 3303:
Invalid signal OSM_SIGNAL_CHANGE_DETECTED(2) in state
OSM_SM_STATE_SET_LINK_PORTS_WAIT
09:49:52 104077 [41802960] -> __osm_state_mgr_signal_error: ERR 3303:
Invalid signal OSM_SIGNAL_CHANGE_DETECTED(2) in state
OSM_SM_STATE_SET_LINK_PORTS_WAIT
...
...
...


1) Why does the link down trap start the long chain of
__osm_drop_mgr_remove_port calls?

2) Which of the errors may have caused the segfault?
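
For question 2, it may help to see which ERR codes fire, from which function, and on which thread, since the excerpt shows both [41001960] and [41802960] reporting state manager errors within the same second. Below is a minimal Python sketch for tallying that from a log file. It assumes the entry layout seen in the excerpt above (timestamp, microseconds, thread id in brackets, "->", function name) and that wrapped lines belong to the preceding entry. The names parse_entries and count_errors are hypothetical helpers, not part of OpenSM.

```python
import re
from collections import Counter

# Assumed entry layout, taken from the excerpt:
#   HH:MM:SS usec [thread] -> function: message
ENTRY_START = re.compile(
    r"^(\d{2}:\d{2}:\d{2}) (\d{6}) \[([0-9A-Fa-f]+)\] -> (\w+): ?(.*)$"
)
ERR_CODE = re.compile(r"\bERR ([0-9A-Fa-f]{4}):")


def parse_entries(text):
    """Rejoin wrapped lines; return (time, thread, function, message) tuples."""
    entries = []
    for line in text.splitlines():
        line = line.strip()
        m = ENTRY_START.match(line)
        if m:
            time, usec, thread, func, msg = m.groups()
            entries.append([f"{time}.{usec}", thread, func, msg])
        elif entries and line and line != "...":
            # Continuation of the previous entry (wrapped by the mail client).
            entries[-1][3] = (entries[-1][3] + " " + line).strip()
    return [tuple(e) for e in entries]


def count_errors(entries):
    """Count occurrences of each (ERR code, function, thread) combination."""
    counts = Counter()
    for _time, thread, func, msg in entries:
        m = ERR_CODE.search(msg)
        if m:
            counts[(m.group(1), func, thread)] += 1
    return counts


# A few entries copied from the log above, wrapped as they were posted.
SAMPLE = """\
09:49:52 044097 [41802960] -> __osm_lid_mgr_process_our_sm_node: ERR 0308:
Can't acquire SM's port object, GUID 0x0002c902002064ad
09:49:52 098558 [41001960] -> __osm_state_mgr_signal_error: ERR 3303:
Invalid signal OSM_SIGNAL_CHANGE_DETECTED(2) in state
OSM_SM_STATE_SET_SUBNET_UCAST_LIDS_WAIT
09:49:52 103405 [41802960] -> __osm_state_mgr_signal_error: ERR 3303:
Invalid signal OSM_SIGNAL_CHANGE_DETECTED(2) in state
OSM_SM_STATE_SET_LINK_PORTS_WAIT
"""

if __name__ == "__main__":
    tally = count_errors(parse_entries(SAMPLE))
    for (code, func, thread), n in sorted(tally.items()):
        print(f"ERR {code} in {func} on thread {thread}: {n}")
```

Running the same two functions over the full log file instead of SAMPLE would show whether the errors cluster on one thread or are spread across both. The tally only summarizes what was logged; it does not by itself identify the cause of the segfault.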



Thanks,
Todd