[openib-general] OpenSM (again)

Hal Rosenstock halr at voltaire.com
Tue Apr 12 02:56:29 PDT 2005


On Mon, 2005-04-11 at 12:28, Roland Fehrenbacher wrote: 
> Problem: When I reboot all 40 nodes (apart from the one opensm
> is running on), the network is non-functional (no pings go through, even
> though ports show status "Active") for quite a while (more than 10
> minutes) after all the nodes have come up. It then recovers without
> intervention. Is this normal? Single-node reboots don't affect
> network operation. The osm log file is appended.
> 
> ______________________________________________________________________
> Apr 10 15:05:55 [4000] -> OpenSM Rev:openib-1.0.0
> Apr 10 15:05:55 [4000] -> osm_opensm_init: Forcing single threaded dispatcher.
> Apr 10 15:05:55 [4000] -> osm_report_notice: Reporting Generic Notice type:3 num:66 from LID:0x0000 GID:0xfe80000000000000,0x0000000000000000
> Apr 10 15:05:55 [4000] -> osm_report_notice: Reporting Generic Notice type:3 num:66 from LID:0x0000 GID:0xfe80000000000000,0x0000000000000000
> Apr 10 15:05:55 [4000] -> osm_vendor_get_all_port_attr: assign CA mthca0 port 1 guid (0x2c902004013c1) as the default port.
> Apr 10 15:05:55 [4000] -> osm_vendor_bind: Binding to port 0x2c902004013c1.
> Apr 10 15:05:55 [4000] -> osm_vendor_bind: Unable to register class 129 version 1.
> Apr 10 15:05:55 [4000] -> osm_sm_mad_ctrl_bind: ERR 3118: Vendor specific bind() failed.
> Apr 10 15:05:55 [4000] -> osm_sm_bind: ERR 2E10: SM MAD Controller bind() failed (IB_ERROR).
> Apr 10 15:06:58 [4000] -> OpenSM Rev:openib-1.0.0
> Apr 10 15:06:58 [4000] -> osm_opensm_init: Forcing single threaded dispatcher.
> Apr 10 15:06:58 [4000] -> osm_report_notice: Reporting Generic Notice type:3 num:66 from LID:0x0000 GID:0xfe80000000000000,0x0000000000000000
> Apr 10 15:06:58 [4000] -> osm_report_notice: Reporting Generic Notice type:3 num:66 from LID:0x0000 GID:0xfe80000000000000,0x0000000000000000
> Apr 10 15:06:58 [4000] -> osm_vendor_get_all_port_attr: assign CA mthca0 port 1 guid (0x2c902004013c1) as the default port.
> Apr 10 15:06:58 [4000] -> osm_vendor_bind: Binding to port 0x2c902004013c1.
> Apr 10 15:06:58 [4000] -> osm_vendor_bind: Unable to register class 129 version 1.
> Apr 10 15:06:58 [4000] -> osm_sm_mad_ctrl_bind: ERR 3118: Vendor specific bind() failed.
> Apr 10 15:06:58 [4000] -> osm_sm_bind: ERR 2E10: SM MAD Controller bind() failed (IB_ERROR).
> Apr 10 15:07:44 [4000] -> OpenSM Rev:openib-1.0.0
> Apr 10 15:07:44 [4000] -> osm_opensm_init: Forcing single threaded dispatcher.
> Apr 10 15:07:44 [4000] -> osm_report_notice: Reporting Generic Notice type:3 num:66 from LID:0x0000 GID:0xfe80000000000000,0x0000000000000000
> Apr 10 15:07:44 [4000] -> osm_report_notice: Reporting Generic Notice type:3 num:66 from LID:0x0000 GID:0xfe80000000000000,0x0000000000000000
> Apr 10 15:07:44 [4000] -> osm_vendor_get_all_port_attr: assign CA mthca0 port 1 guid (0x2c902004013c1) as the default port.
> Apr 10 15:07:44 [4000] -> osm_vendor_bind: Binding to port 0x2c902004013c1.
> Apr 10 15:07:44 [4000] -> osm_vendor_bind: Binding to port 0x2c902004013c1.
> Apr 10 15:07:44 [18007] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x04 num:144 Producer:1 from LID:0x0011 TID:0x000000000000000a
> Apr 10 15:07:44 [2400A] -> umad_receiver: send completed with error(method=1 attr=11) -- dropping.

This is a SubnGet (method 1) of NodeInfo (attribute 0x11) which is timing out.

> Apr 10 15:07:45 [2400A] -> umad_receiver: send completed with error(method=1 attr=11) -- dropping.
> Apr 10 15:07:45 [2400A] -> umad_receiver: send completed with error(method=1 attr=11) -- dropping.
> Apr 10 15:07:45 [2400A] -> umad_receiver: send completed with error(method=1 attr=11) -- dropping.
> Apr 10 15:07:45 [2400A] -> umad_receiver: send completed with error(method=1 attr=11) -- dropping.
> Apr 10 15:07:45 [2400A] -> umad_receiver: send completed with error(method=1 attr=16) -- dropping.
> 
This is a SubnGet (method 1) of P_KeyTable (attribute 0x16) which is timing out.

> Apr 10 15:07:45 [2400A] -> umad_receiver: send completed with error(method=1 attr=16) -- dropping.

> Apr 10 15:07:46 [2400A] -> __osm_sa_mad_ctrl_rcv_callback: Received an SA mad while SM in first sweep. Mad ignored.
> Apr 10 15:07:46 [2400A] -> __osm_sa_mad_ctrl_rcv_callback: Received an SA mad while SM in first sweep. Mad ignored.
> Apr 10 15:07:46 [2400A] -> __osm_sa_mad_ctrl_rcv_callback: Received an SA mad while SM in first sweep. Mad ignored.
> Apr 10 15:07:46 [2400A] -> __osm_sa_mad_ctrl_rcv_callback: Received an SA mad while SM in first sweep. Mad ignored.
> Apr 10 15:07:46 [2400A] -> __osm_sa_mad_ctrl_rcv_callback: Received an SA mad while SM in first sweep. Mad ignored.
> Apr 10 15:07:46 [2400A] -> __osm_sa_mad_ctrl_rcv_callback: Received an SA mad while SM in first sweep. Mad ignored.
> Apr 10 15:07:46 [2400A] -> __osm_sa_mad_ctrl_rcv_callback: Received an SA mad while SM in first sweep. Mad ignored.
> Apr 10 15:07:46 [2400A] -> __osm_sa_mad_ctrl_rcv_callback: Received an SA mad while SM in first sweep. Mad ignored.
> Apr 10 15:07:46 [2400A] -> __osm_sa_mad_ctrl_rcv_callback: Received an SA mad while SM in first sweep. Mad ignored.
> Apr 10 15:07:46 [2400A] -> __osm_sa_mad_ctrl_rcv_callback: Received an SA mad while SM in first sweep. Mad ignored.
> Apr 10 15:07:46 [2400A] -> __osm_sa_mad_ctrl_rcv_callback: Received an SA mad while SM in first sweep. Mad ignored.
> Apr 10 15:07:46 [2400A] -> __osm_sa_mad_ctrl_rcv_callback: Received an SA mad while SM in first sweep. Mad ignored.
> Apr 10 15:07:46 [2400A] -> __osm_sa_mad_ctrl_rcv_callback: Received an SA mad while SM in first sweep. Mad ignored.
> Apr 10 15:07:46 [2400A] -> __osm_sa_mad_ctrl_rcv_callback: Received an SA mad while SM in first sweep. Mad ignored.
> Apr 10 15:07:46 [2400A] -> __osm_sa_mad_ctrl_rcv_callback: Received an SA mad while SM in first sweep. Mad ignored.
> Apr 10 15:07:46 [2400A] -> __osm_sa_mad_ctrl_rcv_callback: Received an SA mad while SM in first sweep. Mad ignored.
> Apr 10 15:07:46 [2400A] -> __osm_sa_mad_ctrl_rcv_callback: Received an SA mad while SM in first sweep. Mad ignored.
> Apr 10 15:07:46 [2400A] -> __osm_sa_mad_ctrl_rcv_callback: Received an SA mad while SM in first sweep. Mad ignored.
> Apr 10 15:07:46 [2400A] -> __osm_sa_mad_ctrl_rcv_callback: Received an SA mad while SM in first sweep. Mad ignored.
> Apr 10 15:07:46 [2400A] -> __osm_sa_mad_ctrl_rcv_callback: Received an SA mad while SM in first sweep. Mad ignored.
> Apr 10 15:07:46 [2400A] -> __osm_sa_mad_ctrl_rcv_callback: Received an SA mad while SM in first sweep. Mad ignored.
> Apr 10 15:07:46 [2400A] -> __osm_sa_mad_ctrl_rcv_callback: Received an SA mad while SM in first sweep. Mad ignored.
> Apr 10 15:07:46 [2400A] -> __osm_sa_mad_ctrl_rcv_callback: Received an SA mad while SM in first sweep. Mad ignored.
> Apr 10 15:07:46 [2400A] -> __osm_sa_mad_ctrl_rcv_callback: Received an SA mad while SM in first sweep. Mad ignored.
> Apr 10 15:07:46 [2400A] -> __osm_sa_mad_ctrl_rcv_callback: Received an SA mad while SM in first sweep. Mad ignored.
> Apr 10 15:07:46 [2400A] -> __osm_sa_mad_ctrl_rcv_callback: Received an SA mad while SM in first sweep. Mad ignored.
> Apr 10 15:07:46 [2400A] -> __osm_sa_mad_ctrl_rcv_callback: Received an SA mad while SM in first sweep. Mad ignored.
> Apr 10 15:07:46 [2400A] -> __osm_sa_mad_ctrl_rcv_callback: Received an SA mad while SM in first sweep. Mad ignored.
> Apr 10 15:07:46 [2400A] -> __osm_sa_mad_ctrl_rcv_callback: Received an SA mad while SM in first sweep. Mad ignored.
> Apr 10 15:07:46 [2400A] -> __osm_sa_mad_ctrl_rcv_callback: Received an SA mad while SM in first sweep. Mad ignored.
> Apr 10 15:07:46 [2400A] -> __osm_sa_mad_ctrl_rcv_callback: Received an SA mad while SM in first sweep. Mad ignored.
> Apr 10 15:07:46 [2400A] -> __osm_sa_mad_ctrl_rcv_callback: Received an SA mad while SM in first sweep. Mad ignored.
> Apr 10 15:07:46 [2400A] -> __osm_sa_mad_ctrl_rcv_callback: Received an SA mad while SM in first sweep. Mad ignored.
> Apr 10 15:07:46 [2400A] -> __osm_sa_mad_ctrl_rcv_callback: Received an SA mad while SM in first sweep. Mad ignored.
> Apr 10 15:07:46 [2400A] -> __osm_sa_mad_ctrl_rcv_callback: Received an SA mad while SM in first sweep. Mad ignored.
> Apr 10 15:07:46 [2400A] -> __osm_sa_mad_ctrl_rcv_callback: Received an SA mad while SM in first sweep. Mad ignored.
> Apr 10 15:07:46 [2400A] -> __osm_sa_mad_ctrl_rcv_callback: Received an SA mad while SM in first sweep. Mad ignored.
> Apr 10 15:07:46 [2400A] -> __osm_sa_mad_ctrl_rcv_callback: Received an SA mad while SM in first sweep. Mad ignored.
> Apr 10 15:07:46 [2400A] -> __osm_sa_mad_ctrl_rcv_callback: Received an SA mad while SM in first sweep. Mad ignored.
> Apr 10 15:07:46 [2400A] -> __osm_sa_mad_ctrl_rcv_callback: Received an SA mad while SM in first sweep. Mad ignored.
> Apr 10 15:07:46 [2400A] -> __osm_sa_mad_ctrl_rcv_callback: Received an SA mad while SM in first sweep. Mad ignored.
> Apr 10 15:07:46 [2400A] -> __osm_sa_mad_ctrl_rcv_callback: Received an SA mad while SM in first sweep. Mad ignored.

These are SA MADs being received before the SM is ready to handle
them. They could be SA Sets of MCMemberRecord (from IPoIB). SA clients
in the end nodes should retry them (assuming they have not exhausted
their timeout/retry strategy).

For debug purposes, it might be nice to display the method and attribute
of the SA MAD.

> Apr 10 15:07:46 [18007] -> __osm_state_mgr_signal_error: ERR 3303: Invalid signal OSM_SIGNAL_IDLE_TIME_PROCESS_REQUEST(9) in state OSM_SM_STATE_SWEEP_HEAVY_SELF.
> Apr 10 15:07:46 [18007] -> __osm_state_mgr_signal_error: ERR 3303: Invalid signal OSM_SIGNAL_IDLE_TIME_PROCESS_REQUEST(9) in state OSM_SM_STATE_SWEEP_HEAVY_SUBNET.
> Apr 10 15:07:46 [18007] -> __osm_state_mgr_signal_error: ERR 3303: Invalid signal OSM_SIGNAL_IDLE_TIME_PROCESS_REQUEST(9) in state OSM_SM_STATE_SWEEP_HEAVY_SUBNET.
> Apr 10 15:07:46 [18007] -> __osm_state_mgr_signal_error: ERR 3303: Invalid signal OSM_SIGNAL_IDLE_TIME_PROCESS_REQUEST(9) in state OSM_SM_STATE_SWEEP_HEAVY_SUBNET.
> Apr 10 15:07:46 [18007] -> __osm_state_mgr_signal_error: ERR 3303: Invalid signal OSM_SIGNAL_IDLE_TIME_PROCESS_REQUEST(9) in state OSM_SM_STATE_SWEEP_HEAVY_SUBNET.
> Apr 10 15:07:46 [18007] -> __osm_state_mgr_signal_error: ERR 3303: Invalid signal OSM_SIGNAL_IDLE_TIME_PROCESS_REQUEST(9) in state OSM_SM_STATE_SWEEP_HEAVY_SUBNET.
> Apr 10 15:07:46 [2400A] -> umad_receiver: send completed with error(method=1 attr=11) -- dropping.
> Apr 10 15:07:46 [2400A] -> umad_receiver: send completed with error(method=1 attr=11) -- dropping.
> Apr 10 15:07:46 [2400A] -> umad_receiver: send completed with error(method=1 attr=11) -- dropping.
> Apr 10 15:07:46 [2400A] -> umad_receiver: send completed with error(method=1 attr=16) -- dropping.
> Apr 10 15:07:46 [2400A] -> umad_receiver: send completed with error(method=1 attr=16) -- dropping.
> Apr 10 15:07:47 [2400A] -> umad_receiver: send completed with error(method=1 attr=16) -- dropping.
> Apr 10 15:07:47 [18007] -> __osm_state_mgr_signal_error: ERR 3303: Invalid signal OSM_SIGNAL_IDLE_TIME_PROCESS_REQUEST(9) in state OSM_SM_STATE_SWEEP_HEAVY_SUBNET.
> Apr 10 15:07:47 [18007] -> __osm_state_mgr_signal_error: ERR 3303: Invalid signal OSM_SIGNAL_IDLE_TIME_PROCESS_REQUEST(9) in state OSM_SM_STATE_SWEEP_HEAVY_SUBNET.
> Apr 10 15:08:16 [18007] -> __osm_state_mgr_signal_error: ERR 3303: Invalid signal OSM_SIGNAL_IDLE_TIME_PROCESS_REQUEST(9) in state OSM_SM_STATE_PROCESS_REQUEST_WAIT.
> Apr 10 15:08:16 [18007] -> __osm_state_mgr_signal_error: ERR 3303: Invalid signal OSM_SIGNAL_IDLE_TIME_PROCESS_REQUEST(9) in state OSM_SM_STATE_PROCESS_REQUEST_WAIT.
> Apr 10 15:08:16 [18007] -> __osm_state_mgr_signal_error: ERR 3303: Invalid signal OSM_SIGNAL_IDLE_TIME_PROCESS_REQUEST(9) in state OSM_SM_STATE_PROCESS_REQUEST_WAIT.
> Apr 10 15:24:26 [18007] -> __osm_state_mgr_signal_error: ERR 3303: Invalid signal OSM_SIGNAL_IDLE_TIME_PROCESS_REQUEST(9) in state OSM_SM_STATE_PROCESS_REQUEST_WAIT.
> Apr 10 15:24:26 [18007] -> __osm_state_mgr_signal_error: ERR 3303: Invalid signal OSM_SIGNAL_IDLE_TIME_PROCESS_REQUEST(9) in state OSM_SM_STATE_PROCESS_REQUEST_WAIT.
> Apr 10 15:24:26 [18007] -> __osm_state_mgr_signal_error: ERR 3303: Invalid signal OSM_SIGNAL_IDLE_TIME_PROCESS_REQUEST(9) in state OSM_SM_STATE_PROCESS_REQUEST_WAIT.
> Apr 10 15:24:26 [18007] -> __osm_state_mgr_signal_error: ERR 3303: Invalid signal OSM_SIGNAL_IDLE_TIME_PROCESS_REQUEST(9) in state OSM_SM_STATE_PROCESS_REQUEST_WAIT.
> Apr 10 15:24:26 [18007] -> __osm_state_mgr_signal_error: ERR 3303: Invalid signal OSM_SIGNAL_IDLE_TIME_PROCESS_REQUEST(9) in state OSM_SM_STATE_PROCESS_REQUEST_WAIT.
> Apr 10 15:24:26 [18007] -> __osm_state_mgr_signal_error: ERR 3303: Invalid signal OSM_SIGNAL_IDLE_TIME_PROCESS_REQUEST(9) in state OSM_SM_STATE_PROCESS_REQUEST_WAIT.
> Apr 10 15:24:26 [18007] -> __osm_state_mgr_signal_error: ERR 3303: Invalid signal OSM_SIGNAL_IDLE_TIME_PROCESS_REQUEST(9) in state OSM_SM_STATE_PROCESS_REQUEST_WAIT.
> Apr 10 15:24:26 [18007] -> __osm_state_mgr_signal_error: ERR 3303: Invalid signal OSM_SIGNAL_IDLE_TIME_PROCESS_REQUEST(9) in state OSM_SM_STATE_PROCESS_REQUEST_WAIT.
> Apr 10 15:24:26 [18007] -> __osm_state_mgr_signal_error: ERR 3303: Invalid signal OSM_SIGNAL_IDLE_TIME_PROCESS_REQUEST(9) in state OSM_SM_STATE_PROCESS_REQUEST_WAIT.
> Apr 10 15:24:26 [18007] -> __osm_state_mgr_signal_error: ERR 3303: Invalid signal OSM_SIGNAL_IDLE_TIME_PROCESS_REQUEST(9) in state OSM_SM_STATE_PROCESS_REQUEST_WAIT.
> Apr 10 15:24:26 [18007] -> __osm_state_mgr_signal_error: ERR 3303: Invalid signal OSM_SIGNAL_IDLE_TIME_PROCESS_REQUEST(9) in state OSM_SM_STATE_PROCESS_REQUEST_WAIT.
> Apr 10 15:24:26 [18007] -> __osm_state_mgr_signal_error: ERR 3303: Invalid signal OSM_SIGNAL_IDLE_TIME_PROCESS_REQUEST(9) in state OSM_SM_STATE_PROCESS_REQUEST_WAIT.
> Apr 10 15:24:26 [18007] -> __osm_state_mgr_signal_error: ERR 3303: Invalid signal OSM_SIGNAL_IDLE_TIME_PROCESS_REQUEST(9) in state OSM_SM_STATE_PROCESS_REQUEST_WAIT.
> Apr 10 15:24:26 [18007] -> __osm_state_mgr_signal_error: ERR 3303: Invalid signal OSM_SIGNAL_IDLE_TIME_PROCESS_REQUEST(9) in state OSM_SM_STATE_PROCESS_REQUEST_WAIT.
> Apr 10 15:24:26 [18007] -> __osm_state_mgr_signal_error: ERR 3303: Invalid signal OSM_SIGNAL_IDLE_TIME_PROCESS_REQUEST(9) in state OSM_SM_STATE_PROCESS_REQUEST_WAIT.
> Apr 10 15:24:26 [18007] -> __osm_state_mgr_signal_error: ERR 3303: Invalid signal OSM_SIGNAL_IDLE_TIME_PROCESS_REQUEST(9) in state OSM_SM_STATE_PROCESS_REQUEST_WAIT.
> Apr 10 15:24:26 [18007] -> __osm_state_mgr_signal_error: ERR 3303: Invalid signal OSM_SIGNAL_IDLE_TIME_PROCESS_REQUEST(9) in state OSM_SM_STATE_PROCESS_REQUEST_WAIT.
> Apr 10 15:24:26 [18007] -> __osm_state_mgr_signal_error: ERR 3303: Invalid signal OSM_SIGNAL_IDLE_TIME_PROCESS_REQUEST(9) in state OSM_SM_STATE_PROCESS_REQUEST_WAIT.
> Apr 10 15:24:26 [18007] -> __osm_state_mgr_signal_error: ERR 3303: Invalid signal OSM_SIGNAL_IDLE_TIME_PROCESS_REQUEST(9) in state OSM_SM_STATE_PROCESS_REQUEST_WAIT.
> Apr 10 15:24:26 [18007] -> __osm_state_mgr_signal_error: ERR 3303: Invalid signal OSM_SIGNAL_IDLE_TIME_PROCESS_REQUEST(9) in state OSM_SM_STATE_PROCESS_REQUEST_WAIT.
> Apr 10 15:24:26 [18007] -> __osm_state_mgr_signal_error: ERR 3303: Invalid signal OSM_SIGNAL_IDLE_TIME_PROCESS_REQUEST(9) in state OSM_SM_STATE_PROCESS_REQUEST_WAIT.
> Apr 10 15:24:26 [18007] -> __osm_state_mgr_signal_error: ERR 3303: Invalid signal OSM_SIGNAL_IDLE_TIME_PROCESS_REQUEST(9) in state OSM_SM_STATE_PROCESS_REQUEST_WAIT.
> Apr 10 15:24:26 [18007] -> __osm_state_mgr_signal_error: ERR 3303: Invalid signal OSM_SIGNAL_IDLE_TIME_PROCESS_REQUEST(9) in state OSM_SM_STATE_PROCESS_REQUEST_WAIT.
> Apr 10 15:24:26 [18007] -> __osm_state_mgr_signal_error: ERR 3303: Invalid signal OSM_SIGNAL_IDLE_TIME_PROCESS_REQUEST(9) in state OSM_SM_STATE_PROCESS_REQUEST_WAIT.
> Apr 10 15:24:26 [18007] -> __osm_state_mgr_signal_error: ERR 3303: Invalid signal OSM_SIGNAL_IDLE_TIME_PROCESS_REQUEST(9) in state OSM_SM_STATE_PROCESS_REQUEST_WAIT.
> Apr 10 15:24:26 [18007] -> __osm_state_mgr_signal_error: ERR 3303: Invalid signal OSM_SIGNAL_IDLE_TIME_PROCESS_REQUEST(9) in state OSM_SM_STATE_PROCESS_REQUEST_WAIT.
> Apr 10 15:24:26 [18007] -> __osm_state_mgr_signal_error: ERR 3303: Invalid signal OSM_SIGNAL_IDLE_TIME_PROCESS_REQUEST(9) in state OSM_SM_STATE_PROCESS_REQUEST_WAIT.
> Apr 10 15:24:26 [18007] -> __osm_state_mgr_signal_error: ERR 3303: Invalid signal OSM_SIGNAL_IDLE_TIME_PROCESS_REQUEST(9) in state OSM_SM_STATE_PROCESS_REQUEST_WAIT.
> Apr 10 15:24:26 [18007] -> __osm_state_mgr_signal_error: ERR 3303: Invalid signal OSM_SIGNAL_IDLE_TIME_PROCESS_REQUEST(9) in state OSM_SM_STATE_PROCESS_REQUEST_WAIT.
> Apr 10 15:24:26 [18007] -> __osm_state_mgr_signal_error: ERR 3303: Invalid signal OSM_SIGNAL_IDLE_TIME_PROCESS_REQUEST(9) in state OSM_SM_STATE_PROCESS_REQUEST_WAIT.
> Apr 10 15:24:26 [18007] -> __osm_state_mgr_signal_error: ERR 3303: Invalid signal OSM_SIGNAL_IDLE_TIME_PROCESS_REQUEST(9) in state OSM_SM_STATE_PROCESS_REQUEST_WAIT.
> Apr 10 15:24:26 [18007] -> __osm_state_mgr_signal_error: ERR 3303: Invalid signal OSM_SIGNAL_IDLE_TIME_PROCESS_REQUEST(9) in state OSM_SM_STATE_PROCESS_REQUEST_WAIT.
> Apr 10 15:24:26 [18007] -> __osm_state_mgr_signal_error: ERR 3303: Invalid signal OSM_SIGNAL_IDLE_TIME_PROCESS_REQUEST(9) in state OSM_SM_STATE_PROCESS_REQUEST_WAIT.
> Apr 10 15:24:26 [18007] -> __osm_state_mgr_signal_error: ERR 3303: Invalid signal OSM_SIGNAL_IDLE_TIME_PROCESS_REQUEST(9) in state OSM_SM_STATE_PROCESS_REQUEST_WAIT.
> Apr 10 15:24:26 [18007] -> __osm_state_mgr_signal_error: ERR 3303: Invalid signal OSM_SIGNAL_IDLE_TIME_PROCESS_REQUEST(9) in state OSM_SM_STATE_PROCESS_REQUEST_WAIT.
> Apr 10 15:24:26 [18007] -> __osm_state_mgr_signal_error: ERR 3303: Invalid signal OSM_SIGNAL_IDLE_TIME_PROCESS_REQUEST(9) in state OSM_SM_STATE_PROCESS_REQUEST_WAIT.
> Apr 10 15:24:26 [18007] -> __osm_state_mgr_signal_error: ERR 3303: Invalid signal OSM_SIGNAL_IDLE_TIME_PROCESS_REQUEST(9) in state OSM_SM_STATE_PROCESS_REQUEST_WAIT.
> Apr 10 15:24:26 [18007] -> __osm_state_mgr_signal_error: ERR 3303: Invalid signal OSM_SIGNAL_IDLE_TIME_PROCESS_REQUEST(9) in state OSM_SM_STATE_PROCESS_REQUEST_WAIT.
> Apr 10 15:24:26 [18007] -> __osm_state_mgr_signal_error: ERR 3303: Invalid signal OSM_SIGNAL_IDLE_TIME_PROCESS_REQUEST(9) in state OSM_SM_STATE_PROCESS_REQUEST_WAIT.
> Apr 10 15:24:26 [18007] -> __osm_state_mgr_signal_error: ERR 3303: Invalid signal OSM_SIGNAL_IDLE_TIME_PROCESS_REQUEST(9) in state OSM_SM_STATE_PROCESS_REQUEST_WAIT.
> Apr 10 15:24:26 [18007] -> __osm_state_mgr_signal_error: ERR 3303: Invalid signal OSM_SIGNAL_IDLE_TIME_PROCESS_REQUEST(9) in state OSM_SM_STATE_PROCESS_REQUEST_WAIT.
> Apr 10 15:24:26 [18007] -> __osm_state_mgr_signal_error: ERR 3303: Invalid signal OSM_SIGNAL_IDLE_TIME_PROCESS_REQUEST(9) in state OSM_SM_STATE_PROCESS_REQUEST_WAIT.
> Apr 10 15:24:26 [18007] -> __osm_state_mgr_signal_error: ERR 3303: Invalid signal OSM_SIGNAL_IDLE_TIME_PROCESS_REQUEST(9) in state OSM_SM_STATE_PROCESS_REQUEST_WAIT.
> Apr 10 15:24:26 [18007] -> __osm_state_mgr_signal_error: ERR 3303: Invalid signal OSM_SIGNAL_IDLE_TIME_PROCESS_REQUEST(9) in state OSM_SM_STATE_PROCESS_REQUEST_WAIT.
> Apr 10 15:24:26 [18007] -> __osm_state_mgr_signal_error: ERR 3303: Invalid signal OSM_SIGNAL_IDLE_TIME_PROCESS_REQUEST(9) in state OSM_SM_STATE_PROCESS_REQUEST_WAIT.
> Apr 10 15:24:26 [18007] -> __osm_state_mgr_signal_error: ERR 3303: Invalid signal OSM_SIGNAL_IDLE_TIME_PROCESS_REQUEST(9) in state OSM_SM_STATE_PROCESS_REQUEST_WAIT.
> Apr 10 15:24:26 [18007] -> __osm_state_mgr_signal_error: ERR 3303: Invalid signal OSM_SIGNAL_IDLE_TIME_PROCESS_REQUEST(9) in state OSM_SM_STATE_PROCESS_REQUEST_WAIT.
> Apr 10 15:24:26 [18007] -> __osm_state_mgr_signal_error: ERR 3303: Invalid signal OSM_SIGNAL_IDLE_TIME_PROCESS_REQUEST(9) in state OSM_SM_STATE_PROCESS_REQUEST_WAIT.
> Apr 10 15:24:26 [18007] -> __osm_state_mgr_signal_error: ERR 3303: Invalid signal OSM_SIGNAL_IDLE_TIME_PROCESS_REQUEST(9) in state OSM_SM_STATE_PROCESS_REQUEST_WAIT.
> Apr 10 15:24:26 [18007] -> __osm_state_mgr_signal_error: ERR 3303: Invalid signal OSM_SIGNAL_IDLE_TIME_PROCESS_REQUEST(9) in state OSM_SM_STATE_PROCESS_REQUEST_WAIT.
> Apr 10 15:24:26 [18007] -> __osm_state_mgr_signal_error: ERR 3303: Invalid signal OSM_SIGNAL_IDLE_TIME_PROCESS_REQUEST(9) in state OSM_SM_STATE_PROCESS_REQUEST_WAIT.
> Apr 10 15:24:26 [18007] -> __osm_state_mgr_signal_error: ERR 3303: Invalid signal OSM_SIGNAL_IDLE_TIME_PROCESS_REQUEST(9) in state OSM_SM_STATE_PROCESS_REQUEST_WAIT.
> Apr 10 15:24:26 [18007] -> __osm_state_mgr_signal_error: ERR 3303: Invalid signal OSM_SIGNAL_IDLE_TIME_PROCESS_REQUEST(9) in state OSM_SM_STATE_PROCESS_REQUEST_WAIT.
> Apr 10 15:24:26 [18007] -> __osm_state_mgr_signal_error: ERR 3303: Invalid signal OSM_SIGNAL_IDLE_TIME_PROCESS_REQUEST(9) in state OSM_SM_STATE_PROCESS_REQUEST_WAIT.
> Apr 10 15:24:26 [18007] -> __osm_state_mgr_signal_error: ERR 3303: Invalid signal OSM_SIGNAL_IDLE_TIME_PROCESS_REQUEST(9) in state OSM_SM_STATE_PROCESS_REQUEST_WAIT.
> Apr 10 15:24:26 [18007] -> __osm_state_mgr_signal_error: ERR 3303: Invalid signal OSM_SIGNAL_IDLE_TIME_PROCESS_REQUEST(9) in state OSM_SM_STATE_PROCESS_REQUEST_WAIT.
> Apr 10 15:24:26 [18007] -> __osm_state_mgr_signal_error: ERR 3303: Invalid signal OSM_SIGNAL_IDLE_TIME_PROCESS_REQUEST(9) in state OSM_SM_STATE_PROCESS_REQUEST_WAIT.
> Apr 10 15:24:26 [18007] -> __osm_state_mgr_signal_error: ERR 3303: Invalid signal OSM_SIGNAL_IDLE_TIME_PROCESS_REQUEST(9) in state OSM_SM_STATE_PROCESS_REQUEST_WAIT.
> Apr 10 15:24:26 [18007] -> __osm_state_mgr_signal_error: ERR 3303: Invalid signal OSM_SIGNAL_IDLE_TIME_PROCESS_REQUEST(9) in state OSM_SM_STATE_PROCESS_REQUEST_WAIT.
> Apr 10 15:24:26 [18007] -> __osm_state_mgr_signal_error: ERR 3303: Invalid signal OSM_SIGNAL_IDLE_TIME_PROCESS_REQUEST(9) in state OSM_SM_STATE_PROCESS_REQUEST_WAIT.
> Apr 10 15:24:26 [18007] -> __osm_state_mgr_signal_error: ERR 3303: Invalid signal OSM_SIGNAL_IDLE_TIME_PROCESS_REQUEST(9) in state OSM_SM_STATE_PROCESS_REQUEST_WAIT.
> Apr 10 15:24:26 [18007] -> __osm_state_mgr_signal_error: ERR 3303: Invalid signal OSM_SIGNAL_IDLE_TIME_PROCESS_REQUEST(9) in state OSM_SM_STATE_PROCESS_REQUEST_WAIT.
> Apr 10 15:24:26 [18007] -> __osm_state_mgr_signal_error: ERR 3303: Invalid signal OSM_SIGNAL_IDLE_TIME_PROCESS_REQUEST(9) in state OSM_SM_STATE_PROCESS_REQUEST_WAIT.
> Apr 10 15:24:26 [18007] -> __osm_state_mgr_signal_error: ERR 3303: Invalid signal OSM_SIGNAL_IDLE_TIME_PROCESS_REQUEST(9) in state OSM_SM_STATE_PROCESS_REQUEST_WAIT.
> Apr 10 15:24:26 [18007] -> __osm_state_mgr_signal_error: ERR 3303: Invalid signal OSM_SIGNAL_IDLE_TIME_PROCESS_REQUEST(9) in state OSM_SM_STATE_PROCESS_REQUEST_WAIT.
> Apr 10 15:24:26 [18007] -> __osm_state_mgr_signal_error: ERR 3303: Invalid signal OSM_SIGNAL_IDLE_TIME_PROCESS_REQUEST(9) in state OSM_SM_STATE_PROCESS_REQUEST_WAIT.
> Apr 10 15:24:26 [18007] -> __osm_state_mgr_signal_error: ERR 3303: Invalid signal OSM_SIGNAL_IDLE_TIME_PROCESS_REQUEST(9) in state OSM_SM_STATE_PROCESS_REQUEST_WAIT.
> Apr 10 15:24:26 [18007] -> __osm_state_mgr_signal_error: ERR 3303: Invalid signal OSM_SIGNAL_IDLE_TIME_PROCESS_REQUEST(9) in state OSM_SM_STATE_PROCESS_REQUEST_WAIT.
> Apr 10 15:24:26 [18007] -> __osm_state_mgr_signal_error: ERR 3303: Invalid signal OSM_SIGNAL_IDLE_TIME_PROCESS_REQUEST(9) in state OSM_SM_STATE_PROCESS_REQUEST_WAIT.
> Apr 10 15:24:26 [18007] -> __osm_state_mgr_signal_error: ERR 3303: Invalid signal OSM_SIGNAL_IDLE_TIME_PROCESS_REQUEST(9) in state OSM_SM_STATE_PROCESS_REQUEST_WAIT.
> Apr 10 15:24:26 [18007] -> __osm_state_mgr_signal_error: ERR 3303: Invalid signal OSM_SIGNAL_IDLE_TIME_PROCESS_REQUEST(9) in state OSM_SM_STATE_PROCESS_REQUEST_WAIT.

In the most recent OpenSM (gen1), this has been downgraded from an error to a warning. (That doesn't explain the delay in connectivity, though.)

> Apr 11 08:32:17 [18007] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0028 TID:0x000000000000004c
> Apr 11 08:32:17 [18007] -> osm_report_notice: Reporting Generic Notice type:1 num:128 from LID:0x0028 GID:0xfe80000000000000,0x0002c9010befe900
> Apr 11 08:32:17 [18007] -> osm_report_notice: Reporting Generic Notice type:3 num:64 from LID:0x0011 GID:0xfe80000000000000,0x0002c902004013c1
> Apr 11 08:32:17 [18007] -> Discovered new port with GUID:0x0002c902004012e9 LID range [0x3D,0x3D] of node:MT23108 InfiniHost Mellanox Technologies 
> Apr 11 08:32:17 [18007] -> __osm_state_mgr_signal_error: ERR 3303: Invalid signal OSM_SIGNAL_IDLE_TIME_PROCESS_REQUEST(9) in state OSM_SM_STATE_PROCESS_REQUEST_WAIT.
> Apr 11 08:35:27 [18007] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0028 TID:0x000000000000004d
> Apr 11 08:35:27 [18007] -> osm_report_notice: Reporting Generic Notice type:1 num:128 from LID:0x0028 GID:0xfe80000000000000,0x0002c9010befe900
> Apr 11 08:35:27 [18007] -> osm_report_notice: Reporting Generic Notice type:3 num:65 from LID:0x0011 GID:0xfe80000000000000,0x0002c902004013c1
> Apr 11 08:35:27 [18007] -> Removed port with GUID:0x0002c902004012e9 LID range [0x3D,0x3D] of node:MT23108 InfiniHost Mellanox Technologies 

At what point did it start working again? Was it at 15:24? (That
would be a 16-17 minute delay in connectivity.)

-- Hal



