[openib-general] Unreliable OpenSM failover
Venkatesh Babu
venkatesh.babu at 3leafnetworks.com
Fri Dec 8 16:30:01 PST 2006
Hal Rosenstock wrote:
>And the two switches are not connected to each other, right ?
>
>
Yes, the switches are not connected.
>Do you set a different subnet prefix (other than the default on one) ?
>Not sure if this matters yet in OpenIB but it might.
>
>
I don't know how to set the subnet prefix, so it is probably the default one.
>That's the main thread. It's in the following loop:
>
> while( !osm_exit_flag ) {
>     if (opt.console)
>         osm_console(&osm);
>     else
>         cl_thread_suspend( 10000 );
>
>     if (osm_hup_flag) {
>         osm_hup_flag = 0;
>         /* a HUP signal should only start a new heavy sweep */
>         osm.subn.force_immediate_heavy_sweep = TRUE;
>         osm_opensm_sweep( &osm );
>     }
>
>What about the other threads ? What are they doing ?
>
>
Yes, I got this. It was in this loop. I didn't realize there were
other OpenSM threads running. I need to find out what they are doing.
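To illustrate why seeing the main thread in that loop says little by itself: the quoted loop only sleeps and reacts to flags, so any real work has to happen in other threads. Below is a minimal sketch of that pattern, assuming plain pthreads and C11 atomics instead of OpenSM's own complib. Every name in it (worker_thread, controller_thread, sweep_requests, run_demo) is hypothetical and is not taken from the OpenSM sources.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <time.h>

/* Hypothetical stand-ins for osm_exit_flag and osm_hup_flag. */
static atomic_int exit_flag;
static atomic_int hup_flag;

/* Counters used only to observe what each thread did. */
static atomic_int sweep_requests;  /* HUP requests handled by the main loop */
static atomic_int worker_ticks;    /* iterations completed by the worker */

static void short_sleep(void)
{
    struct timespec ts = { 0, 1000000L };  /* 1 ms */
    nanosleep(&ts, NULL);
}

/* Worker: stands in for the threads that do the actual work. */
static void *worker_thread(void *arg)
{
    (void)arg;
    while (!atomic_load(&exit_flag)) {
        atomic_fetch_add(&worker_ticks, 1);
        short_sleep();
    }
    return NULL;
}

/* Controller: plays the role of the outside world. It raises the HUP
 * flag once, waits until the main loop and the worker have both made
 * progress, then asks everything to exit. */
static void *controller_thread(void *arg)
{
    (void)arg;
    atomic_store(&hup_flag, 1);
    while (atomic_load(&sweep_requests) == 0 ||
           atomic_load(&worker_ticks) == 0)
        short_sleep();
    atomic_store(&exit_flag, 1);
    return NULL;
}

/* Main loop: same shape as the quoted one. It sleeps, then checks
 * whether a HUP was flagged and, if so, records a sweep request. */
static void main_loop(void)
{
    while (!atomic_load(&exit_flag)) {
        short_sleep();  /* stands in for cl_thread_suspend() */
        if (atomic_exchange(&hup_flag, 0))
            atomic_fetch_add(&sweep_requests, 1);
    }
}

/* Starts the helper threads, runs the main loop in the calling thread,
 * and returns the number of sweep requests handled (-1 on error). */
static int run_demo(void)
{
    pthread_t worker, controller;

    if (pthread_create(&worker, NULL, worker_thread, NULL) != 0)
        return -1;
    if (pthread_create(&controller, NULL, controller_thread, NULL) != 0) {
        atomic_store(&exit_flag, 1);
        pthread_join(worker, NULL);
        return -1;
    }

    main_loop();
    assert(atomic_load(&exit_flag));  /* the loop only ends on exit */

    pthread_join(controller, NULL);
    pthread_join(worker, NULL);
    return atomic_load(&sweep_requests);
}
```

Calling run_demo() once from a program's main() is enough to exercise it. A backtrace of the calling thread taken while it runs would show only main_loop() sleeping, even though the worker is active, which is why the other threads need to be inspected as well.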
>I wouldn't expect that given the problem you're hitting. The SUBNET UP
>only occurs once the heavy sweep is completed. That's not happening.
>
>-- Hal
>
>
Is the heavy sweep supposed to happen after the failover ?
Is there any documentation on the OpenSM architecture and design ?
VBabu