[ofa-general] OpenSM --run-once question

Yevgeny Kliteynik kliteyn at dev.mellanox.co.il
Tue Apr 10 06:50:34 PDT 2007


Hi Hal.

I have a question regarding the --run-once OpenSM option.

I have two HCAs connected through a single InfiniScale III switch.
When I restart the driver on one HCA, its port goes down and comes
back up, which in turn causes the switch to start a training sequence
to decide whether the link should run at SDR or DDR. This training
sequence takes about 10-15 seconds.

Now, if I run OpenSM during this period, it finishes initialization
with errors (printing the "Errors during initialization" message) and
immediately starts a new sweep, repeating this again and again until
the switch training sequence is over and the SM manages to bring the
subnet up.
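
To make the retry behavior above concrete, here is a minimal C model of
it. It is only a sketch under stated assumptions: sweep_has_init_error()
and sweep_until_up() are hypothetical names, not OpenSM functions, and
the switch training period is modeled as a fixed number of failing
sweeps rather than the 10-15 seconds actually observed.

```c
#include <stdbool.h>

/* Hypothetical stand-in for one OpenSM sweep (NOT the OpenSM API):
 * it reports an initialization error while the switch is still in its
 * SDR/DDR training sequence, modeled here as the first
 * `training_sweeps` sweeps. */
bool sweep_has_init_error(int sweep_no, int training_sweeps)
{
    return sweep_no < training_sweeps;
}

/* Model of the default (non --run-once) behavior: keep starting new
 * sweeps until one completes without initialization errors.
 * Returns the total number of sweeps performed. */
int sweep_until_up(int training_sweeps)
{
    int sweeps = 0;
    bool error;

    do {
        error = sweep_has_init_error(sweeps, training_sweeps);
        sweeps++;
    } while (error);

    return sweeps;
}
```

For example, sweep_until_up(3) performs three failing sweeps followed by
one clean sweep, so it returns 4; --run-once, by contrast, stops after
the first of those sweeps.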
  
However, when I run OpenSM with --run-once, it finishes the first
sweep with these "Errors during initialization" and exits with status 0.

Is this behavior intentional?
Should OpenSM keep looping until the subnet is really up?
Or should it perhaps exit with a status other than 0?

Here's the relevant code snippet from osm_state_mgr.c:

            /* If there were errors - then the subnet is not really up */
            if( p_mgr->p_subn->subnet_initialization_error == TRUE )
            {
               __osm_state_mgr_init_errors_msg( p_mgr );
            }
            else
            {
               /* The subnet is up correctly - set the first_time_master_sweep flag
                * (if it is on) to FALSE. */
               /* ... (rest of the success path omitted) ... */
            }
            p_mgr->state = OSM_SM_STATE_PROCESS_REQUEST;
            signal = OSM_SIGNAL_IDLE_TIME_PROCESS;

            /*
             * Finally signal the subnet up event
             */
            status = cl_event_signal( p_mgr->p_subnet_up_event );
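
In the snippet above, cl_event_signal() on the subnet-up event runs
whether or not subnet_initialization_error is set, which presumably is
why --run-once treats an erroneous first sweep as success. Below is a
minimal sketch of the two alternatives raised in the question. It
assumes a hypothetical struct subnet_state carrying only the error flag
and a hypothetical do_sweep callback; neither is the OpenSM API.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical subset of the subnet state; in OpenSM the real flag is
 * p_mgr->p_subn->subnet_initialization_error (a boolean_t). */
struct subnet_state {
    bool subnet_initialization_error;
};

/* Option "exit with some status other than 0": after the --run-once
 * sweep, map the initialization-error flag to the process exit status
 * instead of always exiting with 0. */
int run_once_exit_status(const struct subnet_state *subn)
{
    return subn->subnet_initialization_error ? EXIT_FAILURE : EXIT_SUCCESS;
}

/* Option "loop until the subnet is really up", bounded so that a
 * permanently broken fabric cannot keep --run-once from ever exiting.
 * `do_sweep` (hypothetical) performs one sweep and updates `subn`. */
int run_once_with_retries(struct subnet_state *subn,
                          void (*do_sweep)(struct subnet_state *),
                          int max_sweeps)
{
    for (int i = 0; i < max_sweeps; i++) {
        do_sweep(subn);
        if (!subn->subnet_initialization_error)
            break;
    }
    return run_once_exit_status(subn);
}
```

With max_sweeps set to 1, this reduces to the current single sweep, just
with a meaningful exit status; a larger bound would cover the 10-15
second training window.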

-- Yevgeny


