[openib-general] Unreliable OpemSM failover

Venkatesh Babu venkatesh.babu at 3leafnetworks.com
Fri Dec 8 15:44:38 PST 2006


 I have 3 nodes and 2 IB switches. Port 1 of all nodes connected to 
switch 1 and Port2 of all nodes connected to switch 2. So each switch 
creates its own subnet and hence I have two instances of OpenSM for each 
port. I have two OpenSMs running with priority 1 on node1 and two 
OpenSM's running with priority 13 on node 2. Node 3 doesn't have any 
OpenSM's but just a OFED kernel modules. I reboot the node 2 every 
10minutes. Since it has the highest priority, every time it boots up it 
grabs the mastership from the node 1. It works most of the time, except 
when this problem occurs.

 When this problem occurs, node 3 shows the old/stale SMLID information. 
But if you reload the ofed drivers or reboot the node to get the new LID 
assignment it shows SMLID as 0. Even though Node 1's SMLID and port LID 
are same, it was not completely asserted the mastership. See the log 
messages below -

[root ~]# ibv_devinfo
...
                port:   1
                        state:                  PORT_INIT (2)
                        max_mtu:                2048 (4)
                        active_mtu:             512 (2)
                        sm_lid:                 0
                        port_lid:               0
                        port_lmc:               0x00

                port:   2
                        state:                  PORT_INIT (2)
                        max_mtu:                2048 (4)
                        active_mtu:             512 (2)
                        sm_lid:                 0
                        port_lid:               0
                        port_lmc:               0x00



 The strace output is shown below -
[root~]# strace -p 7518
Process 7518 attached - interrupt to quit
restart_syscall(0x7fbffff630, 0, 0, 0x7fbffff501, 0x335d) = 0
nanosleep({10, 0}, NULL)                = 0
nanosleep({10, 0}, NULL)                = 0
nanosleep({10, 0}, NULL)                = 0
nanosleep({10, 0}, NULL)                = 0
nanosleep({10, 0}, NULL)                = 0
nanosleep({10, 0}, NULL)                = 0
nanosleep({10, 0}, NULL)                = 0
nanosleep({10, 0}, NULL)                = 0
nanosleep({10, 0}, NULL)                = 0

 The GDB output is shown below -
[root ~]# gdb /usr/bin/opensm 7518
GNU gdb Red Hat Linux (6.3.0.0-1.63rh)
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain 
conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu"...
(no debugging symbols found)
Using host libthread_db library "/lib64/tls/libthread_db.so.1".

Attaching to program: /usr/bin/opensm, process 7518
Reading symbols from /usr/lib/libibumad.so.1...done.
Loaded symbols for /usr/lib/libibumad.so.1
Reading symbols from /usr/lib/libopensm.so.1...done.
Loaded symbols for /usr/lib/libopensm.so.1
Reading symbols from /usr/lib/libosmcomp.so.1...done.
Loaded symbols for /usr/lib/libosmcomp.so.1
Reading symbols from /lib64/tls/libpthread.so.0...done.
[Thread debugging using libthread_db enabled]
[New Thread 182896213152 (LWP 7518)]
[New Thread 1136679264 (LWP 7544)]
[New Thread 1126189408 (LWP 7543)]
[New Thread 1115699552 (LWP 7542)]
[New Thread 1105209696 (LWP 7541)]
[New Thread 1094719840 (LWP 7540)]
[New Thread 1084229984 (LWP 7534)]
Loaded symbols for /lib64/tls/libpthread.so.0
Reading symbols from /usr/lib/libosmvendor.so.1...done.
Loaded symbols for /usr/lib/libosmvendor.so.1
Reading symbols from /usr/lib/libibcommon.so.1...done.
Loaded symbols for /usr/lib/libibcommon.so.1
Reading symbols from /lib64/tls/libc.so.6...done.
Loaded symbols for /lib64/tls/libc.so.6
Reading symbols from /lib64/ld-linux-x86-64.so.2...done.
Loaded symbols for /lib64/ld-linux-x86-64.so.2
0x000000316038ed65 in __nanosleep_nocancel () from /lib64/tls/libc.so.6
(gdb) bt
#0  0x000000316038ed65 in __nanosleep_nocancel () from /lib64/tls/libc.so.6
#1  0x00000031603bf368 in usleep () from /lib64/tls/libc.so.6
#2  0x000000316080df32 in cl_thread_suspend (pause_ms=10000) at 
cl_thread.c:125
#3  0x000000000040584e in main ()
(gdb) print osm_hup_flag
$1 = 0
(gdb)


Following is the log output. It is entring to MASTER state. But it 
doesn't show"SUBNET UP" event. It gets stuck.

[root ~]#  tail /var/log/opensm1.log
Dec 04 15:59:35 573040 [0000] -> OpenSM Rev:openib-1.2.1 OpenIB svn 3726M

Dec 04 15:59:35 783462 [9576BCA0] -> osm_report_notice: Reporting 
Generic Notice type:3 num:66 from LID:0x0000 
GID:0xfe80000000000000,0x0000000000000000
Dec 04 15:59:35 783541 [9576BCA0] -> osm_report_notice: Reporting 
Generic Notice type:3 num:66 from LID:0x0000 
GID:0xfe80000000000000,0x0000000000000000
Dec 04 15:59:35 783589 [9576BCA0] -> osm_vendor_bind: Binding to port 
0x5045014b1a0001
Dec 04 15:59:35 787924 [9576BCA0] -> osm_vendor_bind: Binding to port 
0x5045014b1a0001
Dec 04 15:59:35 800404 [0000] -> Entering STANDBY state

Dec 04 15:59:36 053784 [0000] -> Entering MASTER state



Hal Rosenstock wrote:

>On Fri, 2006-12-08 at 17:12, Venkatesh Babu wrote:
>  
>
>>I have got the same problem with OFED 1.1 stack also, but the frequency 
>>is less. I had to try 120 fail overs (by rebooting the highest priority 
>>OpenSM server) before getting into this problem.
>>    
>>
>
>If I understand you correctly, you reboot the master SM and the standby
>does not takeover (become master). Is that correct ?
>
>Is this with 2 SMs or more ?
>
>  
>
>>At this state OpenSM doesn't update anything to the log files; 
>>doesn't assign the LIDs to the other nodes; doesn't respond 
>>to the multi cast join operations. Even another OpenSM is 
>>started on another node with higher priority it can 
>>not become the master. The only way to recover from this is by killing 
>>the stuck OpenSM.
>>    
>>
>
>What SMLID do the nodes in the subnet point to ?
>
>Can you determine where is it stuck ? Sounds like it could be in some
>tight loop. Can you build with gdb and attach when this occurs to see ?
>
>-- Hal
>
>  
>
>> VBabu
>>
>>Hal Rosenstock wrote:
>>
>>    
>>
>>>I don't see any explicit changes to the SM state machine which would
>>>affect this but as I have mentioned before there are many bug fixes in
>>>OFED 1.1. I can't conclusively state whether this would fix the issue
>>>you see but would be in a much better position to try to figure this
>>>out.
>>>
>>>-- Hal
>>>
>>> 
>>>
>>>      
>>>
>>>>Hi
>>>>
>>>>  I have topology of two switches and a bunch of nodes, with each 
>>>>node having 2port HCAs. Port1 of every node connects to switch1 and 
>>>>Port2 of every node connects to switch2. So Port1 and Port2 are in 
>>>>different subnets. So I am running one OpenSM (from OFED 1.0) for 
>>>>each port on one node designated as a server. To guard against that 
>>>>server going down I have another server node to run the OpenSM in 
>>>>"standby" mode for each port. I will adjust the priorities such that 
>>>>first server always has "master" OpenSM and second server has 
>>>>"standby" OpenSM.
>>>>
>>>>   When the first server rebooted, "standby" OpenSM should takeover 
>>>>the mastership. It usually works fine but sometimes it is failing to 
>>>>takeover. In the following example OpenSM for Port1 failed to 
>>>>takeover, but OpenSM for Port2 took over and became "master". The 
>>>>OpenSM for Port1 seems be stuck in some weired state (strace shows 
>>>>that it is sleeping). It is no longer assigning LIDs to the rest of 
>>>>the nodes in the subnet and not responding to the broadcast joins. 
>>>>The log file shows nothing from past 4 days. I have the complete log 
>>>>files if needed.
>>>>
>>>>   Is this a known problem and fixed in OFED 1.1 ?
>>>>
>>>>[root at vortex3l-72 158]# ibv_devinfo
>>>>hca_id: mthca0
>>>>       fw_ver:                         5.1.400
>>>>       node_guid:                      0050:4501:4b1a:0000
>>>>       sys_image_guid:                 0050:4501:4b1a:0003
>>>>       vendor_id:                      0x02c9
>>>>       vendor_part_id:                 25218
>>>>       hw_ver:                         0xA0
>>>>       board_id:                       ARM0020000001
>>>>       phys_port_cnt:                  2
>>>>               port:   1
>>>>                       state:                  PORT_ACTIVE (4)
>>>>                       max_mtu:                2048 (4)
>>>>                       active_mtu:             2048 (4)
>>>>                       sm_lid:                 7
>>>>                       port_lid:               1
>>>>                       port_lmc:               0x00
>>>>
>>>>               port:   2
>>>>                       state:                  PORT_ACTIVE (4)
>>>>                       max_mtu:                2048 (4)
>>>>                       active_mtu:             2048 (4)
>>>>                       sm_lid:                 1
>>>>                       port_lid:               1
>>>>                       port_lmc:               0x00
>>>>
>>>>[root at vortex3l-72 158]# ps -aux | grep open
>>>>Warning: bad syntax, perhaps a bogus '-'? See 
>>>>/usr/share/doc/procps-3.2.3/FAQ
>>>>root      7988  0.0  0.0 92784 1672 ?        Sl   Nov22   0:06 
>>>>/usr/bin/opensm -g 0x005045014b1a0002 -p 13 -s 10 -u -f 
>>>>/var/log/opensm2.log
>>>>root      7975  0.0  0.0 92784 1572 ?        Sl   Nov22   0:06 
>>>>/usr/bin/opensm -g 0x005045014b1a0001 -p 13 -s 10 -u -f 
>>>>/var/log/opensm1.log
>>>>root      7803  0.0  0.0 51096  668 pts/0    S+   12:11   0:00 grep open
>>>>[root at vortex3l-72 158]# strace -p7975
>>>>Process 7975 attached - interrupt to quit
>>>>restart_syscall(0x7fbffff630, 0, 0, 0x7fbffff501, 0x130) = 0
>>>>nanosleep({10, 0}, NULL)                = 0
>>>>nanosleep({10, 0}, NULL)                = 0
>>>>nanosleep({10, 0}, NULL)                = 0
>>>>nanosleep({10, 0}, NULL)                = 0
>>>>nanosleep({10, 0}, NULL)                = 0
>>>>nanosleep({10, 0}, NULL)                = 0
>>>>nanosleep({10, 0}, NULL)                = 0
>>>>nanosleep({10, 0}, NULL)                = 0
>>>>nanosleep({10, 0},  <unfinished ...>
>>>>Process 7975 detached
>>>>[root at vortex3l-72 158]# uptime
>>>>12:13:02 up 4 days, 17:05,  5 users,  load average: 0.00, 0.00, 0.00
>>>>[root at vortex3l-72 158]# date
>>>>Mon Nov 27 12:13:05 PST 2006
>>>>[root at vortex3l-72 158]#  tail /var/log/opensm1.log
>>>>Nov 22 19:09:27 894295 [0000] -> OpenSM Rev:openib-1.2.1 OpenIB svn 
>>>>3673M
>>>>
>>>>Nov 22 19:09:28 164482 [9576BCA0] -> osm_report_notice: Reporting 
>>>>Generic Notice type:3 num:66 from LID:0x0000 
>>>>GID:0xfe80000000000000,0x0000000000000000
>>>>Nov 22 19:09:28 164560 [9576BCA0] -> osm_report_notice: Reporting 
>>>>Generic Notice type:3 num:66 from LID:0x0000 
>>>>GID:0xfe80000000000000,0x0000000000000000
>>>>Nov 22 19:09:28 164608 [9576BCA0] -> osm_vendor_bind: Binding to port 
>>>>0x5045014b1a0001
>>>>Nov 22 19:09:28 167508 [9576BCA0] -> osm_vendor_bind: Binding to port 
>>>>0x5045014b1a0001
>>>>Nov 22 19:09:28 177285 [0000] -> Entering STANDBY state
>>>>
>>>>Nov 22 19:09:28 442435 [0000] -> Entering MASTER state
>>>>
>>>>[root at vortex3l-72 158]#  tail /var/log/opensm2.log
>>>>                               00 00 00 00 00 00 00 00   00 00 00 00 
>>>>00 00 00 00
>>>>
>>>>Nov 27 12:10:32 146325 [41401960] -> osm_report_notice: Reporting 
>>>>Generic Notice type:3 num:65 from LID:0x0001 
>>>>GID:0xfe80000000000000,0x005045014b1a0002
>>>>Nov 27 12:10:32 146343 [41401960] -> __match_notice_to_inf_rec: 
>>>>Cannot find destination port with LID:0x0002
>>>>Nov 27 12:10:32 146358 [41401960] -> __match_notice_to_inf_rec: 
>>>>Cannot find destination port with LID:0x0003
>>>>Nov 27 12:10:32 146373 [41401960] -> __match_notice_to_inf_rec: 
>>>>Cannot find destination port with LID:0x0004
>>>>Nov 27 12:10:32 146382 [41401960] -> Removed port with 
>>>>GUID:0x0002c9020020f5ae LID range [0x6,0x6] of node:sqaathlon03 HCA-1
>>>>Nov 27 12:10:32 146400 [41401960] -> osm_drop_mgr_process: ERR 0108: 
>>>>Unknown remote side for node 0x0002c9010d26bae0 port 11. Adding to 
>>>>light sweep sampling list
>>>>Nov 27 12:10:32 146420 [41401960] -> Directed Path Dump of 1 hop path:
>>>>                               Path = [0][2]
>>>>
>>>>        
>>>>
>
>  
>




More information about the general mailing list