[openib-general] Unreliable OpenSM failover

Hal Rosenstock halr at voltaire.com
Fri Dec 8 16:48:13 PST 2006


On Fri, 2006-12-08 at 19:30, Venkatesh Babu wrote:
> Hal Rosenstock wrote:
> 
> >And the two switches are not connected to each other, right ?
> >  
> >
>   Yes, the switches are not connected.
> 
> >Do you set a different subnet prefix (other than the default) on one of them ?
> >Not sure if this matters yet in OpenIB but it might.
> >  
> >
>  I don't know how to set the subnet prefix.

In the opensm.opts file:

# Subnet prefix used on this subnet
subnet_prefix 0xfe80000000000000

(that's the default one)

>  So it may be the default one.
> 
> >That's the main thread. It's in the following loop:
> >
> >    while( !osm_exit_flag ) {
> >      if (opt.console)
> >        osm_console(&osm);
> >      else
> >        cl_thread_suspend( 10000 );
> >
> >      if (osm_hup_flag) {
> >        osm_hup_flag = 0;
> >        /* a HUP signal should only start a new heavy sweep */
> >        osm.subn.force_immediate_heavy_sweep = TRUE;
> >        osm_opensm_sweep( &osm );
> >      }
> >
> >What about the other threads ? What are they doing ?
> >  
> >
>   Yes. I got this. It was in this loop. I didn't realize there were 
> other OpenSM threads running. I need to find that out.

OK.
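
For reference, the HUP handling in the loop quoted above follows the usual
pattern of a signal handler setting a flag that the main thread polls. Here is
a minimal standalone sketch of that pattern. It is not OpenSM code: hup_flag,
exit_flag, heavy_sweep_requests, on_signal, poll_once and install_handlers are
all made-up names, and the counter only stands in for requesting a heavy sweep.

```c
#include <signal.h>

/* Hypothetical stand-ins for osm_hup_flag / osm_exit_flag; not OpenSM code. */
static volatile sig_atomic_t hup_flag = 0;
static volatile sig_atomic_t exit_flag = 0;

/* Counts how many heavy sweeps the loop has asked for (illustration only). */
static int heavy_sweep_requests = 0;

/* The handler only records the event; the work is done by the polling loop. */
static void on_signal(int sig)
{
    if (sig == SIGHUP)
        hup_flag = 1;
    else
        exit_flag = 1;
}

/* One pass of the loop body: consume a pending HUP and request a sweep.
 * Returns 1 if a sweep was requested on this pass, 0 otherwise. */
static int poll_once(void)
{
    if (hup_flag) {
        hup_flag = 0;
        heavy_sweep_requests++;   /* stands in for forcing a heavy sweep */
        return 1;
    }
    return 0;
}

/* Route HUP to the sweep flag and INT/TERM to the exit flag. */
static void install_handlers(void)
{
    signal(SIGHUP, on_signal);
    signal(SIGINT, on_signal);
    signal(SIGTERM, on_signal);
}
```

A caller would run install_handlers() once and then call poll_once() between
sleeps until exit_flag is set. The point is that a main thread sitting in such
a loop tells you little by itself; the sweep work is done elsewhere.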

> >I wouldn't expect that given the problem you're hitting. The SUBNET UP
> >only occurs once the heavy sweep is completed. That's not happening.
> >
> >-- Hal
> >  
> >
>    Is the heavy sweep supposed to happen after the failover ?

The standby, after determining that the master is non-responsive, will go
back to discovering, but in your configuration it will find no other SM and
will go to master. I think it got that far.

Once it transitions to master, it does a heavy sweep to configure the
subnet. Something is stopping that from completing. I'm not sure what is
going wrong.

>    Is there any documentation on the OpenSM architecture and design ?

Just the code AFAIK. You can read the SM and SA sections of IBA volume 1
for what an SM is supposed to do.

-- Hal

>  VBabu




