[openib-general] osm_state_mgr_process and osm_sm_state_mgr_process

Tue Nov 7 08:34:47 PST 2006

On 09:02 Tue 07 Nov     , Hal Rosenstock wrote:
> On Tue, 2006-11-07 at 07:36, Eitan Zahavi wrote:
> > Michael Arndt wrote:
> > > Hi,
> > >
> > > is there an idea or concept why the state managment is seperate in these two 
> > > functions.
> > > I would really like if the answer isn't just yes or no.:)
> > >   
> 
> This predates my involvement with OpenSM but I have the following
> comments:
> 
> > The osm_state_mgr_process file is describing the OpenSM state machine.
> 
> This state machine handles the lower level aspects of the SM state
> machine as detailed in IBA 1.2 vol 1 section 14.4. This contains
> behavior "beyond the spec" in that much of SM behavior is an exercise
> left to the reader.

I think we also would call this the "sweep" state machine because it
follows/tracks OpenSM sweep (fabric discovery, configuration, setups)
stages.

> > The osm_sm_state_mgr_process latter defines the SM state machine 
> > following the IBTA SM states and their transitions.
> 
> This machine handles the high level aspects of the SM state machine in
> 14.4.1 and can be driven by the SMInfo attribute.
> 
> > The first one is far more complex and defines the stages required to 
> > discover and configure the fabric.
> > The second one is used to track the SM state in terms that are IBTA 
> > compliant and standard. They are also map directly to SM states 
> > available in the SMInfo attribute.
> 
> As to why they are separate state machines (and the implied question of
> whether they could/should be combined), I'm not sure.

IMO they should be combined, or more correctly we don't need the "sweep"
state processor there (known as complex and known as buggy - in the log
we still see the messages like "invalid signal X in state Y").

The subnet sweep process has pretty linear logic (just look at repeated
sections of states like OSM_STATE_PHASEX, OSM_STATE_PHASEX_WAIT,
OSM_STATE_PHASEX_DONE even in existing code), so I think the
implementation should follow this logic. At high level this could be
something like "main loop" flow:

	while(1) {
		do_idle_works(timeout); /* breaks upon events or
					   due to timeout */
		...
		do_sweep();
		...
	}

This should be much simpler and more robust than existing implementation.
And as yet another positive "side effect" this helps to remove the "big
state lock" (which can block all dispatchers) and to improve
scalability.

Sasha