[ofa-general] Re: OpenSM --run-once question

Hal Rosenstock halr at voltaire.com
Tue Apr 10 06:50:32 PDT 2007


Hi Yevgeny,

On Tue, 2007-04-10 at 09:50, Yevgeny Kliteynik wrote:
> Hi Hal.
> 
> I have a question regarding the --run-once OpenSM option.
> 
> I have two HCAs connected through a single InfiniScale III switch.
> I restart the driver on an HCA, which causes the port to go down and
> come back up, which in turn causes the switch to start a training
> sequence to decide whether the link should run at SDR or DDR. This
> training sequence takes about 10-15 seconds.
> 
> Now, if I run OpenSM during this period, it finishes initialization
> with errors (printing the "Errors during initialization" error message),
> and immediately starts a new sweep, repeating this until the switch
> training sequence is over and the SM manages to bring the subnet up.
>   
> Now, when I run OpenSM with --run-once, OpenSM finishes the first
> sweep with these "errors during initialization" and exits with status=0.
> 
> Is this behavior intentional?

I don't know for sure. --run-once predates my involvement and I don't
generally use it, although I do know about some use cases for it.

> Should OSM loop until the subnet is really up?

I think one could argue this one way or the other. Since the subnet may
never come up, I'm not sure it should loop.
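
One possible middle ground, sketched here purely as an illustration and
not taken from the OpenSM source, would be to retry the sweep a bounded
number of times. The names sweep_fn, sweep_until_up and fake_sweep are
hypothetical, and the callback simply stands in for "do one sweep and
report whether it finished without initialization errors":

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical callback: performs one sweep and returns true if the
 * subnet came up without initialization errors. */
typedef bool (*sweep_fn)(void *ctx);

/* Retry the sweep at most max_sweeps times.
 * Returns the number of sweeps used (1..max_sweeps) on success,
 * or -1 if the subnet never came up within the limit. */
int sweep_until_up(sweep_fn do_sweep, void *ctx, int max_sweeps)
{
    assert(do_sweep != NULL);
    for (int i = 1; i <= max_sweeps; i++) {
        if (do_sweep(ctx))
            return i;
    }
    return -1;
}

/* Stand-in sweep for experimentation: ctx points to an int holding the
 * number of sweeps that should still fail (e.g. while the link trains). */
bool fake_sweep(void *ctx)
{
    int *failures_left = ctx;
    if (*failures_left > 0) {
        (*failures_left)--;
        return false;
    }
    return true;
}
```

With a limit in place the process still terminates when the subnet
never comes up, and the -1 result can be turned into an error status.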

> Or perhaps exit with some status other than 0?

That seems reasonable and the minimum that should be done so there is
some warning that the subnet may not be initialized properly.
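
As a rough sketch of what that could look like (assuming a simplified,
hypothetical stand-in for the subnet object, not the real OpenSM
structures), the flag checked in the snippet quoted below could be
mapped to the process exit status when running once:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical, simplified stand-in for the subnet object; only the
 * flag quoted from osm_state_mgr.c is modelled here. */
struct subn_sketch {
    bool subnet_initialization_error;
};

/* Map the outcome of the single sweep to a process exit status.
 * Returns EXIT_SUCCESS only when the sweep finished without errors. */
int run_once_exit_status(const struct subn_sketch *p_subn)
{
    assert(p_subn != NULL);
    return p_subn->subnet_initialization_error ? EXIT_FAILURE
                                               : EXIT_SUCCESS;
}
```

A calling script could then tell a clean bring-up apart from a sweep
that ended with "Errors during initialization".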

-- Hal

> Here's the relevant code snippet from osm_state_mgr.c:
> 
>             /* If there were errors - then the subnet is not really up */
>             if( p_mgr->p_subn->subnet_initialization_error == TRUE )
>             {
>                __osm_state_mgr_init_errors_msg( p_mgr );
>             }
>             else
>             {
>                /* The subnet is up correctly - set the first_time_master_sweep flag 
>                 * (if it is on) to FALSE. */
>                ..... bla bla
>             }
>             p_mgr->state = OSM_SM_STATE_PROCESS_REQUEST;
>             signal = OSM_SIGNAL_IDLE_TIME_PROCESS;
> 
>             /*
>              * Finally signal the subnet up event
>              */
>             status = cl_event_signal( p_mgr->p_subnet_up_event );
> 
> -- Yevgeny



