[openib-general] Unreliable OpemSM failover

Hal Rosenstock halr at voltaire.com
Fri Dec 8 15:57:17 PST 2006


On Fri, 2006-12-08 at 18:44, Venkatesh Babu wrote:
>  I have 3 nodes and 2 IB switches. Port 1 of all nodes connected to 
> switch 1 and Port2 of all nodes connected to switch 2. So each switch 
> creates its own subnet and hence I have two instances of OpenSM for each 
> port. 

And the two switches are not connected to each other, right ?

> I have two OpenSMs running with priority 1 on node1 and two 
> OpenSM's running with priority 13 on node 2.

Do you set a different subnet prefix (other than the default on one) ?
Not sure if this matters yet in OpenIB but it might.

> Node 3 doesn't have any 
> OpenSM's but just a OFED kernel modules. I reboot the node 2 every 
> 10minutes. Since it has the highest priority, every time it boots up it 
> grabs the mastership from the node 1. It works most of the time, except 
> when this problem occurs.

Now I understand the scenario.

> When this problem occurs, node 3 shows the old/stale SMLID information. 
> But if you reload the ofed drivers or reboot the node to get the new LID 
> assignment it shows SMLID as 0.

That's consistent with the SM not really taking over. Just wanted to be
sure.

> Even though Node 1's SMLID and port LID 
> are same, it was not completely asserted the mastership.

OK.

> See the log messages below -
> 
> [root ~]# ibv_devinfo
> ...
>                 port:   1
>                         state:                  PORT_INIT (2)
>                         max_mtu:                2048 (4)
>                         active_mtu:             512 (2)
>                         sm_lid:                 0
>                         port_lid:               0
>                         port_lmc:               0x00
> 
>                 port:   2
>                         state:                  PORT_INIT (2)
>                         max_mtu:                2048 (4)
>                         active_mtu:             512 (2)
>                         sm_lid:                 0
>                         port_lid:               0
>                         port_lmc:               0x00
> 
> 
> 
>  The strace output is shown below -
> [root~]# strace -p 7518
> Process 7518 attached - interrupt to quit
> restart_syscall(0x7fbffff630, 0, 0, 0x7fbffff501, 0x335d) = 0
> nanosleep({10, 0}, NULL)                = 0
> nanosleep({10, 0}, NULL)                = 0
> nanosleep({10, 0}, NULL)                = 0
> nanosleep({10, 0}, NULL)                = 0
> nanosleep({10, 0}, NULL)                = 0
> nanosleep({10, 0}, NULL)                = 0
> nanosleep({10, 0}, NULL)                = 0
> nanosleep({10, 0}, NULL)                = 0
> nanosleep({10, 0}, NULL)                = 0
> 
>  The GDB output is shown below -
> [root ~]# gdb /usr/bin/opensm 7518
> GNU gdb Red Hat Linux (6.3.0.0-1.63rh)
> Copyright 2004 Free Software Foundation, Inc.
> GDB is free software, covered by the GNU General Public License, and you are
> welcome to change it and/or distribute copies of it under certain 
> conditions.
> Type "show copying" to see the conditions.
> There is absolutely no warranty for GDB.  Type "show warranty" for details.
> This GDB was configured as "x86_64-redhat-linux-gnu"...
> (no debugging symbols found)
> Using host libthread_db library "/lib64/tls/libthread_db.so.1".
> 
> Attaching to program: /usr/bin/opensm, process 7518
> Reading symbols from /usr/lib/libibumad.so.1...done.
> Loaded symbols for /usr/lib/libibumad.so.1
> Reading symbols from /usr/lib/libopensm.so.1...done.
> Loaded symbols for /usr/lib/libopensm.so.1
> Reading symbols from /usr/lib/libosmcomp.so.1...done.
> Loaded symbols for /usr/lib/libosmcomp.so.1
> Reading symbols from /lib64/tls/libpthread.so.0...done.
> [Thread debugging using libthread_db enabled]
> [New Thread 182896213152 (LWP 7518)]
> [New Thread 1136679264 (LWP 7544)]
> [New Thread 1126189408 (LWP 7543)]
> [New Thread 1115699552 (LWP 7542)]
> [New Thread 1105209696 (LWP 7541)]
> [New Thread 1094719840 (LWP 7540)]
> [New Thread 1084229984 (LWP 7534)]
> Loaded symbols for /lib64/tls/libpthread.so.0
> Reading symbols from /usr/lib/libosmvendor.so.1...done.
> Loaded symbols for /usr/lib/libosmvendor.so.1
> Reading symbols from /usr/lib/libibcommon.so.1...done.
> Loaded symbols for /usr/lib/libibcommon.so.1
> Reading symbols from /lib64/tls/libc.so.6...done.
> Loaded symbols for /lib64/tls/libc.so.6
> Reading symbols from /lib64/ld-linux-x86-64.so.2...done.
> Loaded symbols for /lib64/ld-linux-x86-64.so.2
> 0x000000316038ed65 in __nanosleep_nocancel () from /lib64/tls/libc.so.6
> (gdb) bt
> #0  0x000000316038ed65 in __nanosleep_nocancel () from /lib64/tls/libc.so.6
> #1  0x00000031603bf368 in usleep () from /lib64/tls/libc.so.6
> #2  0x000000316080df32 in cl_thread_suspend (pause_ms=10000) at 
> cl_thread.c:125
> #3  0x000000000040584e in main ()
> (gdb) print osm_hup_flag
> $1 = 0
> (gdb)

That's the main thread. It's in the following loop:

    while( !osm_exit_flag ) {
      if (opt.console)
        osm_console(&osm);
      else
        cl_thread_suspend( 10000 );

      if (osm_hup_flag) {
        osm_hup_flag = 0;
        /* a HUP signal should only start a new heavy sweep */
        osm.subn.force_immediate_heavy_sweep = TRUE;
        osm_opensm_sweep( &osm );
      }

What about the other threads ? What are they doing ?

> Following is the log output. It is entring to MASTER state. But it 
> doesn't show"SUBNET UP" event. It gets stuck.

I wouldn't expect that given the problem your hitting. The SUBNET UP
only occurs once the heavy sweep is completed. That's not happening.

-- Hal

> [root ~]#  tail /var/log/opensm1.log
> Dec 04 15:59:35 573040 [0000] -> OpenSM Rev:openib-1.2.1 OpenIB svn 3726M
> 
> Dec 04 15:59:35 783462 [9576BCA0] -> osm_report_notice: Reporting 
> Generic Notice type:3 num:66 from LID:0x0000 
> GID:0xfe80000000000000,0x0000000000000000
> Dec 04 15:59:35 783541 [9576BCA0] -> osm_report_notice: Reporting 
> Generic Notice type:3 num:66 from LID:0x0000 
> GID:0xfe80000000000000,0x0000000000000000
> Dec 04 15:59:35 783589 [9576BCA0] -> osm_vendor_bind: Binding to port 
> 0x5045014b1a0001
> Dec 04 15:59:35 787924 [9576BCA0] -> osm_vendor_bind: Binding to port 
> 0x5045014b1a0001
> Dec 04 15:59:35 800404 [0000] -> Entering STANDBY state
> 
> Dec 04 15:59:36 053784 [0000] -> Entering MASTER state
> 
> 
> 
> Hal Rosenstock wrote:
> 
> >On Fri, 2006-12-08 at 17:12, Venkatesh Babu wrote:
> >  
> >
> >>I have got the same problem with OFED 1.1 stack also, but the frequency 
> >>is less. I had to try 120 fail overs (by rebooting the highest priority 
> >>OpenSM server) before getting into this problem.
> >>    
> >>
> >
> >If I understand you correctly, you reboot the master SM and the standby
> >does not takeover (become master). Is that correct ?
> >
> >Is this with 2 SMs or more ?
> >
> >  
> >
> >>At this state OpenSM doesn't update anything to the log files; 
> >>doesn't assign the LIDs to the other nodes; doesn't respond 
> >>to the multi cast join operations. Even another OpenSM is 
> >>started on another node with higher priority it can 
> >>not become the master. The only way to recover from this is by killing 
> >>the stuck OpenSM.
> >>    
> >>
> >
> >What SMLID do the nodes in the subnet point to ?
> >
> >Can you determine where is it stuck ? Sounds like it could be in some
> >tight loop. Can you build with gdb and attach when this occurs to see ?
> >
> >-- Hal
> >
> >  
> >
> >> VBabu
> >>
> >>Hal Rosenstock wrote:
> >>
> >>    
> >>
> >>>I don't see any explicit changes to the SM state machine which would
> >>>affect this but as I have mentioned before there are many bug fixes in
> >>>OFED 1.1. I can't conclusively state whether this would fix the issue
> >>>you see but would be in a much better position to try to figure this
> >>>out.
> >>>
> >>>-- Hal
> >>>
> >>> 
> >>>
> >>>      
> >>>
> >>>>Hi
> >>>>
> >>>>  I have topology of two switches and a bunch of nodes, with each 
> >>>>node having 2port HCAs. Port1 of every node connects to switch1 and 
> >>>>Port2 of every node connects to switch2. So Port1 and Port2 are in 
> >>>>different subnets. So I am running one OpenSM (from OFED 1.0) for 
> >>>>each port on one node designated as a server. To guard against that 
> >>>>server going down I have another server node to run the OpenSM in 
> >>>>"standby" mode for each port. I will adjust the priorities such that 
> >>>>first server always has "master" OpenSM and second server has 
> >>>>"standby" OpenSM.
> >>>>
> >>>>   When the first server rebooted, "standby" OpenSM should takeover 
> >>>>the mastership. It usually works fine but sometimes it is failing to 
> >>>>takeover. In the following example OpenSM for Port1 failed to 
> >>>>takeover, but OpenSM for Port2 took over and became "master". The 
> >>>>OpenSM for Port1 seems be stuck in some weired state (strace shows 
> >>>>that it is sleeping). It is no longer assigning LIDs to the rest of 
> >>>>the nodes in the subnet and not responding to the broadcast joins. 
> >>>>The log file shows nothing from past 4 days. I have the complete log 
> >>>>files if needed.
> >>>>
> >>>>   Is this a known problem and fixed in OFED 1.1 ?
> >>>>
> >>>>[root at vortex3l-72 158]# ibv_devinfo
> >>>>hca_id: mthca0
> >>>>       fw_ver:                         5.1.400
> >>>>       node_guid:                      0050:4501:4b1a:0000
> >>>>       sys_image_guid:                 0050:4501:4b1a:0003
> >>>>       vendor_id:                      0x02c9
> >>>>       vendor_part_id:                 25218
> >>>>       hw_ver:                         0xA0
> >>>>       board_id:                       ARM0020000001
> >>>>       phys_port_cnt:                  2
> >>>>               port:   1
> >>>>                       state:                  PORT_ACTIVE (4)
> >>>>                       max_mtu:                2048 (4)
> >>>>                       active_mtu:             2048 (4)
> >>>>                       sm_lid:                 7
> >>>>                       port_lid:               1
> >>>>                       port_lmc:               0x00
> >>>>
> >>>>               port:   2
> >>>>                       state:                  PORT_ACTIVE (4)
> >>>>                       max_mtu:                2048 (4)
> >>>>                       active_mtu:             2048 (4)
> >>>>                       sm_lid:                 1
> >>>>                       port_lid:               1
> >>>>                       port_lmc:               0x00
> >>>>
> >>>>[root at vortex3l-72 158]# ps -aux | grep open
> >>>>Warning: bad syntax, perhaps a bogus '-'? See 
> >>>>/usr/share/doc/procps-3.2.3/FAQ
> >>>>root      7988  0.0  0.0 92784 1672 ?        Sl   Nov22   0:06 
> >>>>/usr/bin/opensm -g 0x005045014b1a0002 -p 13 -s 10 -u -f 
> >>>>/var/log/opensm2.log
> >>>>root      7975  0.0  0.0 92784 1572 ?        Sl   Nov22   0:06 
> >>>>/usr/bin/opensm -g 0x005045014b1a0001 -p 13 -s 10 -u -f 
> >>>>/var/log/opensm1.log
> >>>>root      7803  0.0  0.0 51096  668 pts/0    S+   12:11   0:00 grep open
> >>>>[root at vortex3l-72 158]# strace -p7975
> >>>>Process 7975 attached - interrupt to quit
> >>>>restart_syscall(0x7fbffff630, 0, 0, 0x7fbffff501, 0x130) = 0
> >>>>nanosleep({10, 0}, NULL)                = 0
> >>>>nanosleep({10, 0}, NULL)                = 0
> >>>>nanosleep({10, 0}, NULL)                = 0
> >>>>nanosleep({10, 0}, NULL)                = 0
> >>>>nanosleep({10, 0}, NULL)                = 0
> >>>>nanosleep({10, 0}, NULL)                = 0
> >>>>nanosleep({10, 0}, NULL)                = 0
> >>>>nanosleep({10, 0}, NULL)                = 0
> >>>>nanosleep({10, 0},  <unfinished ...>
> >>>>Process 7975 detached
> >>>>[root at vortex3l-72 158]# uptime
> >>>>12:13:02 up 4 days, 17:05,  5 users,  load average: 0.00, 0.00, 0.00
> >>>>[root at vortex3l-72 158]# date
> >>>>Mon Nov 27 12:13:05 PST 2006
> >>>>[root at vortex3l-72 158]#  tail /var/log/opensm1.log
> >>>>Nov 22 19:09:27 894295 [0000] -> OpenSM Rev:openib-1.2.1 OpenIB svn 
> >>>>3673M
> >>>>
> >>>>Nov 22 19:09:28 164482 [9576BCA0] -> osm_report_notice: Reporting 
> >>>>Generic Notice type:3 num:66 from LID:0x0000 
> >>>>GID:0xfe80000000000000,0x0000000000000000
> >>>>Nov 22 19:09:28 164560 [9576BCA0] -> osm_report_notice: Reporting 
> >>>>Generic Notice type:3 num:66 from LID:0x0000 
> >>>>GID:0xfe80000000000000,0x0000000000000000
> >>>>Nov 22 19:09:28 164608 [9576BCA0] -> osm_vendor_bind: Binding to port 
> >>>>0x5045014b1a0001
> >>>>Nov 22 19:09:28 167508 [9576BCA0] -> osm_vendor_bind: Binding to port 
> >>>>0x5045014b1a0001
> >>>>Nov 22 19:09:28 177285 [0000] -> Entering STANDBY state
> >>>>
> >>>>Nov 22 19:09:28 442435 [0000] -> Entering MASTER state
> >>>>
> >>>>[root at vortex3l-72 158]#  tail /var/log/opensm2.log
> >>>>                               00 00 00 00 00 00 00 00   00 00 00 00 
> >>>>00 00 00 00
> >>>>
> >>>>Nov 27 12:10:32 146325 [41401960] -> osm_report_notice: Reporting 
> >>>>Generic Notice type:3 num:65 from LID:0x0001 
> >>>>GID:0xfe80000000000000,0x005045014b1a0002
> >>>>Nov 27 12:10:32 146343 [41401960] -> __match_notice_to_inf_rec: 
> >>>>Cannot find destination port with LID:0x0002
> >>>>Nov 27 12:10:32 146358 [41401960] -> __match_notice_to_inf_rec: 
> >>>>Cannot find destination port with LID:0x0003
> >>>>Nov 27 12:10:32 146373 [41401960] -> __match_notice_to_inf_rec: 
> >>>>Cannot find destination port with LID:0x0004
> >>>>Nov 27 12:10:32 146382 [41401960] -> Removed port with 
> >>>>GUID:0x0002c9020020f5ae LID range [0x6,0x6] of node:sqaathlon03 HCA-1
> >>>>Nov 27 12:10:32 146400 [41401960] -> osm_drop_mgr_process: ERR 0108: 
> >>>>Unknown remote side for node 0x0002c9010d26bae0 port 11. Adding to 
> >>>>light sweep sampling list
> >>>>Nov 27 12:10:32 146420 [41401960] -> Directed Path Dump of 1 hop path:
> >>>>                               Path = [0][2]
> >>>>
> >>>>        
> >>>>
> >
> >  
> >





More information about the general mailing list