[openib-general] Unreliable OpenSM failover

Venkatesh Babu venkatesh.babu at 3leafnetworks.com
Fri Dec 8 16:30:01 PST 2006


Hal Rosenstock wrote:

>And the two switches are not connected to each other, right ?
>  
>
  Yes, the switches are not connected.

>Do you set a different subnet prefix (other than the default on one) ?
>Not sure if this matters yet in OpenIB but it might.
>  
>
 I don't know how to set the subnet prefix, so it is probably the default one.

>That's the main thread. It's in the following loop:
>
>    while( !osm_exit_flag ) {
>      if (opt.console)
>        osm_console(&osm);
>      else
>        cl_thread_suspend( 10000 );
>
>      if (osm_hup_flag) {
>        osm_hup_flag = 0;
>        /* a HUP signal should only start a new heavy sweep */
>        osm.subn.force_immediate_heavy_sweep = TRUE;
>        osm_opensm_sweep( &osm );
>      }
>
>What about the other threads ? What are they doing ?
>  
>
  Yes, I got this. It was in this loop. I didn't realize there were
other OpenSM threads running. I need to find that out.
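
For illustration only, the pattern described above (a main thread that mostly idles and checks flags while other threads do the real work) can be sketched in C with pthreads. Every name below is hypothetical. This is not OpenSM code, and how OpenSM actually organizes its threads is not described in this thread.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical state shared between the main thread and one worker. */
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  wake = PTHREAD_COND_INITIALIZER;
static int pending_sweeps = 0;  /* requests posted but not yet handled */
static int sweeps_done    = 0;  /* requests the worker has handled */
static int stop_worker    = 0;  /* set once by the main thread at shutdown */

/* Worker thread: sleeps on the condition variable until asked to sweep. */
static void *sweeper_thread(void *arg)
{
    (void)arg;
    pthread_mutex_lock(&lock);
    for (;;) {
        while (pending_sweeps == 0 && !stop_worker)
            pthread_cond_wait(&wake, &lock);
        if (pending_sweeps == 0)
            break;              /* stop requested and nothing left to do */
        pending_sweeps--;
        sweeps_done++;          /* the real sweep work would happen here */
    }
    pthread_mutex_unlock(&lock);
    return NULL;
}

/* Called from the main thread, e.g. when a HUP-style flag is seen. */
static void request_sweep(void)
{
    pthread_mutex_lock(&lock);
    pending_sweeps++;
    pthread_cond_signal(&wake);
    pthread_mutex_unlock(&lock);
}

/* Posts `requests` sweep requests, shuts the worker down, and returns
 * how many sweeps the worker completed. A real main loop would sleep
 * between flag checks instead of posting requests back to back. */
int run_demo(int requests)
{
    pthread_t tid;
    int i, rc;

    pending_sweeps = sweeps_done = stop_worker = 0;
    rc = pthread_create(&tid, NULL, sweeper_thread, NULL);
    assert(rc == 0);

    for (i = 0; i < requests; i++)
        request_sweep();

    pthread_mutex_lock(&lock);
    stop_worker = 1;
    pthread_cond_signal(&wake);
    pthread_mutex_unlock(&lock);

    pthread_join(tid, NULL);
    return sweeps_done;
}
```

In a layout like this, a backtrace of the main thread only ever shows the idle loop, so the state of the worker threads has to be inspected separately to see where the work is stuck.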

>I wouldn't expect that given the problem you're hitting. The SUBNET UP
>only occurs once the heavy sweep is completed. That's not happening.
>
>-- Hal
>  
>
   Is the heavy sweep supposed to happen after the failover ?

   Is there any documentation on the OpenSM architecture and design ?

 VBabu



