[openib-general] Unreliable OpenSM failover
Hal Rosenstock
halr at voltaire.com
Fri Dec 8 15:57:17 PST 2006
On Fri, 2006-12-08 at 18:44, Venkatesh Babu wrote:
> I have 3 nodes and 2 IB switches. Port 1 of all nodes is connected to
> switch 1 and port 2 of all nodes is connected to switch 2. So each switch
> forms its own subnet, and hence I have two instances of OpenSM, one for
> each port.
And the two switches are not connected to each other, right ?
> I have two OpenSMs running with priority 1 on node 1 and two
> OpenSMs running with priority 13 on node 2.
Do you set a different subnet prefix (other than the default) on one of them ?
Not sure if this matters yet in OpenIB but it might.
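For reference, current OpenSM releases expose this as a configuration-file option; whether the 2006-era OpenIB build here supports it would need checking against its own docs. A hypothetical fragment (values illustrative):

```
# opensm.conf fragment: give the second fabric its own prefix so the two
# subnets are distinguishable (default is 0xfe80000000000000)
subnet_prefix 0xfe80000000000001
```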
> Node 3 doesn't run any
> OpenSM, just the OFED kernel modules. I reboot node 2 every
> 10 minutes. Since it has the highest priority, every time it boots up it
> grabs mastership from node 1. It works most of the time, except
> when this problem occurs.
Now I understand the scenario.
> When this problem occurs, node 3 shows the old/stale SMLID information.
> But if you reload the OFED drivers or reboot the node to get a new LID
> assignment, it shows the SMLID as 0.
That's consistent with the SM not really taking over. Just wanted to be
sure.
> Even though node 1's SMLID and port LID
> are the same, it has not completely asserted mastership.
OK.
> See the log messages below -
>
> [root ~]# ibv_devinfo
> ...
> port: 1
> state: PORT_INIT (2)
> max_mtu: 2048 (4)
> active_mtu: 512 (2)
> sm_lid: 0
> port_lid: 0
> port_lmc: 0x00
>
> port: 2
> state: PORT_INIT (2)
> max_mtu: 2048 (4)
> active_mtu: 512 (2)
> sm_lid: 0
> port_lid: 0
> port_lmc: 0x00
>
>
>
> The strace output is shown below -
> [root~]# strace -p 7518
> Process 7518 attached - interrupt to quit
> restart_syscall(0x7fbffff630, 0, 0, 0x7fbffff501, 0x335d) = 0
> nanosleep({10, 0}, NULL) = 0
> nanosleep({10, 0}, NULL) = 0
> nanosleep({10, 0}, NULL) = 0
> nanosleep({10, 0}, NULL) = 0
> nanosleep({10, 0}, NULL) = 0
> nanosleep({10, 0}, NULL) = 0
> nanosleep({10, 0}, NULL) = 0
> nanosleep({10, 0}, NULL) = 0
> nanosleep({10, 0}, NULL) = 0
>
> The GDB output is shown below -
> [root ~]# gdb /usr/bin/opensm 7518
> GNU gdb Red Hat Linux (6.3.0.0-1.63rh)
> Copyright 2004 Free Software Foundation, Inc.
> GDB is free software, covered by the GNU General Public License, and you are
> welcome to change it and/or distribute copies of it under certain
> conditions.
> Type "show copying" to see the conditions.
> There is absolutely no warranty for GDB. Type "show warranty" for details.
> This GDB was configured as "x86_64-redhat-linux-gnu"...
> (no debugging symbols found)
> Using host libthread_db library "/lib64/tls/libthread_db.so.1".
>
> Attaching to program: /usr/bin/opensm, process 7518
> Reading symbols from /usr/lib/libibumad.so.1...done.
> Loaded symbols for /usr/lib/libibumad.so.1
> Reading symbols from /usr/lib/libopensm.so.1...done.
> Loaded symbols for /usr/lib/libopensm.so.1
> Reading symbols from /usr/lib/libosmcomp.so.1...done.
> Loaded symbols for /usr/lib/libosmcomp.so.1
> Reading symbols from /lib64/tls/libpthread.so.0...done.
> [Thread debugging using libthread_db enabled]
> [New Thread 182896213152 (LWP 7518)]
> [New Thread 1136679264 (LWP 7544)]
> [New Thread 1126189408 (LWP 7543)]
> [New Thread 1115699552 (LWP 7542)]
> [New Thread 1105209696 (LWP 7541)]
> [New Thread 1094719840 (LWP 7540)]
> [New Thread 1084229984 (LWP 7534)]
> Loaded symbols for /lib64/tls/libpthread.so.0
> Reading symbols from /usr/lib/libosmvendor.so.1...done.
> Loaded symbols for /usr/lib/libosmvendor.so.1
> Reading symbols from /usr/lib/libibcommon.so.1...done.
> Loaded symbols for /usr/lib/libibcommon.so.1
> Reading symbols from /lib64/tls/libc.so.6...done.
> Loaded symbols for /lib64/tls/libc.so.6
> Reading symbols from /lib64/ld-linux-x86-64.so.2...done.
> Loaded symbols for /lib64/ld-linux-x86-64.so.2
> 0x000000316038ed65 in __nanosleep_nocancel () from /lib64/tls/libc.so.6
> (gdb) bt
> #0 0x000000316038ed65 in __nanosleep_nocancel () from /lib64/tls/libc.so.6
> #1 0x00000031603bf368 in usleep () from /lib64/tls/libc.so.6
> #2 0x000000316080df32 in cl_thread_suspend (pause_ms=10000) at
> cl_thread.c:125
> #3 0x000000000040584e in main ()
> (gdb) print osm_hup_flag
> $1 = 0
> (gdb)
That's the main thread. It's in the following loop:
    while( !osm_exit_flag ) {
        if (opt.console)
            osm_console(&osm);
        else
            cl_thread_suspend( 10000 );
        if (osm_hup_flag) {
            osm_hup_flag = 0;
            /* a HUP signal should only start a new heavy sweep */
            osm.subn.force_immediate_heavy_sweep = TRUE;
            osm_opensm_sweep( &osm );
        }
    }
What about the other threads ? What are they doing ?
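One way to answer that without an interactive session is to dump every thread's backtrace in one shot; a sketch, assuming gdb is available and the stuck PID is 7518 as in the session above:

```shell
# Attach non-interactively, print a backtrace for every thread, then detach.
# 7518 is the stuck opensm PID from the strace/gdb output above.
gdb -p 7518 -batch -ex 'thread apply all bt' -ex detach
```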
> Following is the log output. It is entering the MASTER state, but it
> doesn't show the "SUBNET UP" event. It gets stuck.
I wouldn't expect that given the problem you're hitting. The SUBNET UP
event only occurs once the heavy sweep has completed, and that's not
happening.
-- Hal
> [root ~]# tail /var/log/opensm1.log
> Dec 04 15:59:35 573040 [0000] -> OpenSM Rev:openib-1.2.1 OpenIB svn 3726M
>
> Dec 04 15:59:35 783462 [9576BCA0] -> osm_report_notice: Reporting
> Generic Notice type:3 num:66 from LID:0x0000
> GID:0xfe80000000000000,0x0000000000000000
> Dec 04 15:59:35 783541 [9576BCA0] -> osm_report_notice: Reporting
> Generic Notice type:3 num:66 from LID:0x0000
> GID:0xfe80000000000000,0x0000000000000000
> Dec 04 15:59:35 783589 [9576BCA0] -> osm_vendor_bind: Binding to port
> 0x5045014b1a0001
> Dec 04 15:59:35 787924 [9576BCA0] -> osm_vendor_bind: Binding to port
> 0x5045014b1a0001
> Dec 04 15:59:35 800404 [0000] -> Entering STANDBY state
>
> Dec 04 15:59:36 053784 [0000] -> Entering MASTER state
>
>
>
> Hal Rosenstock wrote:
>
> >On Fri, 2006-12-08 at 17:12, Venkatesh Babu wrote:
> >
> >
> >>I have got the same problem with the OFED 1.1 stack also, but the
> >>frequency is lower. I had to run 120 failovers (by rebooting the highest
> >>priority OpenSM server) before hitting this problem.
> >>
> >>
> >
> >If I understand you correctly, you reboot the master SM and the standby
> >does not take over (become master). Is that correct ?
> >
> >Is this with 2 SMs or more ?
> >
> >
> >
> >>In this state OpenSM doesn't write anything to the log files;
> >>doesn't assign LIDs to the other nodes; doesn't respond
> >>to multicast join operations. Even if another OpenSM is
> >>started on another node with higher priority, it cannot
> >>become the master. The only way to recover from this is by killing
> >>the stuck OpenSM.
> >>
> >>
> >
> >What SMLID do the nodes in the subnet point to ?
> >
> >Can you determine where it is stuck ? Sounds like it could be in some
> >tight loop. Can you build with debug symbols and attach gdb when this
> >occurs to see ?
> >
> >-- Hal
> >
> >
> >
> >> VBabu
> >>
> >>Hal Rosenstock wrote:
> >>
> >>
> >>
> >>>I don't see any explicit changes to the SM state machine which would
> >>>affect this but as I have mentioned before there are many bug fixes in
> >>>OFED 1.1. I can't conclusively state whether this would fix the issue
> >>>you see but would be in a much better position to try to figure this
> >>>out.
> >>>
> >>>-- Hal
> >>>
> >>>
> >>>
> >>>
> >>>
> >>>>Hi
> >>>>
> >>>> I have a topology of two switches and a bunch of nodes, with each
> >>>>node having a 2-port HCA. Port1 of every node connects to switch1 and
> >>>>Port2 of every node connects to switch2, so Port1 and Port2 are in
> >>>>different subnets. I am therefore running one OpenSM (from OFED 1.0)
> >>>>for each port on one node designated as a server. To guard against
> >>>>that server going down, I have another server node running OpenSM in
> >>>>"standby" mode for each port. I adjust the priorities such that the
> >>>>first server always has the "master" OpenSM and the second server has
> >>>>the "standby" OpenSM.
> >>>>
> >>>> When the first server is rebooted, the "standby" OpenSM should take
> >>>>over mastership. It usually works fine, but sometimes it fails to
> >>>>take over. In the following example the OpenSM for Port1 failed to
> >>>>take over, but the OpenSM for Port2 took over and became "master". The
> >>>>OpenSM for Port1 seems to be stuck in some weird state (strace shows
> >>>>that it is sleeping). It is no longer assigning LIDs to the rest of
> >>>>the nodes in the subnet and not responding to the multicast joins.
> >>>>The log file shows nothing for the past 4 days. I have the complete
> >>>>log files if needed.
> >>>>
> >>>> Is this a known problem and fixed in OFED 1.1 ?
> >>>>
> >>>>[root at vortex3l-72 158]# ibv_devinfo
> >>>>hca_id: mthca0
> >>>> fw_ver: 5.1.400
> >>>> node_guid: 0050:4501:4b1a:0000
> >>>> sys_image_guid: 0050:4501:4b1a:0003
> >>>> vendor_id: 0x02c9
> >>>> vendor_part_id: 25218
> >>>> hw_ver: 0xA0
> >>>> board_id: ARM0020000001
> >>>> phys_port_cnt: 2
> >>>> port: 1
> >>>> state: PORT_ACTIVE (4)
> >>>> max_mtu: 2048 (4)
> >>>> active_mtu: 2048 (4)
> >>>> sm_lid: 7
> >>>> port_lid: 1
> >>>> port_lmc: 0x00
> >>>>
> >>>> port: 2
> >>>> state: PORT_ACTIVE (4)
> >>>> max_mtu: 2048 (4)
> >>>> active_mtu: 2048 (4)
> >>>> sm_lid: 1
> >>>> port_lid: 1
> >>>> port_lmc: 0x00
> >>>>
> >>>>[root at vortex3l-72 158]# ps -aux | grep open
> >>>>Warning: bad syntax, perhaps a bogus '-'? See
> >>>>/usr/share/doc/procps-3.2.3/FAQ
> >>>>root 7988 0.0 0.0 92784 1672 ? Sl Nov22 0:06
> >>>>/usr/bin/opensm -g 0x005045014b1a0002 -p 13 -s 10 -u -f
> >>>>/var/log/opensm2.log
> >>>>root 7975 0.0 0.0 92784 1572 ? Sl Nov22 0:06
> >>>>/usr/bin/opensm -g 0x005045014b1a0001 -p 13 -s 10 -u -f
> >>>>/var/log/opensm1.log
> >>>>root 7803 0.0 0.0 51096 668 pts/0 S+ 12:11 0:00 grep open
> >>>>[root at vortex3l-72 158]# strace -p7975
> >>>>Process 7975 attached - interrupt to quit
> >>>>restart_syscall(0x7fbffff630, 0, 0, 0x7fbffff501, 0x130) = 0
> >>>>nanosleep({10, 0}, NULL) = 0
> >>>>nanosleep({10, 0}, NULL) = 0
> >>>>nanosleep({10, 0}, NULL) = 0
> >>>>nanosleep({10, 0}, NULL) = 0
> >>>>nanosleep({10, 0}, NULL) = 0
> >>>>nanosleep({10, 0}, NULL) = 0
> >>>>nanosleep({10, 0}, NULL) = 0
> >>>>nanosleep({10, 0}, NULL) = 0
> >>>>nanosleep({10, 0}, <unfinished ...>
> >>>>Process 7975 detached
> >>>>[root at vortex3l-72 158]# uptime
> >>>>12:13:02 up 4 days, 17:05, 5 users, load average: 0.00, 0.00, 0.00
> >>>>[root at vortex3l-72 158]# date
> >>>>Mon Nov 27 12:13:05 PST 2006
> >>>>[root at vortex3l-72 158]# tail /var/log/opensm1.log
> >>>>Nov 22 19:09:27 894295 [0000] -> OpenSM Rev:openib-1.2.1 OpenIB svn
> >>>>3673M
> >>>>
> >>>>Nov 22 19:09:28 164482 [9576BCA0] -> osm_report_notice: Reporting
> >>>>Generic Notice type:3 num:66 from LID:0x0000
> >>>>GID:0xfe80000000000000,0x0000000000000000
> >>>>Nov 22 19:09:28 164560 [9576BCA0] -> osm_report_notice: Reporting
> >>>>Generic Notice type:3 num:66 from LID:0x0000
> >>>>GID:0xfe80000000000000,0x0000000000000000
> >>>>Nov 22 19:09:28 164608 [9576BCA0] -> osm_vendor_bind: Binding to port
> >>>>0x5045014b1a0001
> >>>>Nov 22 19:09:28 167508 [9576BCA0] -> osm_vendor_bind: Binding to port
> >>>>0x5045014b1a0001
> >>>>Nov 22 19:09:28 177285 [0000] -> Entering STANDBY state
> >>>>
> >>>>Nov 22 19:09:28 442435 [0000] -> Entering MASTER state
> >>>>
> >>>>[root at vortex3l-72 158]# tail /var/log/opensm2.log
> >>>> 00 00 00 00 00 00 00 00 00 00 00 00
> >>>>00 00 00 00
> >>>>
> >>>>Nov 27 12:10:32 146325 [41401960] -> osm_report_notice: Reporting
> >>>>Generic Notice type:3 num:65 from LID:0x0001
> >>>>GID:0xfe80000000000000,0x005045014b1a0002
> >>>>Nov 27 12:10:32 146343 [41401960] -> __match_notice_to_inf_rec:
> >>>>Cannot find destination port with LID:0x0002
> >>>>Nov 27 12:10:32 146358 [41401960] -> __match_notice_to_inf_rec:
> >>>>Cannot find destination port with LID:0x0003
> >>>>Nov 27 12:10:32 146373 [41401960] -> __match_notice_to_inf_rec:
> >>>>Cannot find destination port with LID:0x0004
> >>>>Nov 27 12:10:32 146382 [41401960] -> Removed port with
> >>>>GUID:0x0002c9020020f5ae LID range [0x6,0x6] of node:sqaathlon03 HCA-1
> >>>>Nov 27 12:10:32 146400 [41401960] -> osm_drop_mgr_process: ERR 0108:
> >>>>Unknown remote side for node 0x0002c9010d26bae0 port 11. Adding to
> >>>>light sweep sampling list
> >>>>Nov 27 12:10:32 146420 [41401960] -> Directed Path Dump of 1 hop path:
> >>>> Path = [0][2]
> >>>>
> >>>>
> >>>>
> >
> >
> >