[Users] ibsim updates routing tables?

Albert Chu chu11 at llnl.gov
Fri Oct 25 13:26:12 PDT 2013


I now see your earlier reply.  You realized your mistake: you were
disabling all the links on the switch, which effectively led to
disabling all the nodes.

I think you typoed your second attempt.  It should be:

clear "S-0002c90300684e30"[2]
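
To illustrate why clearing the whole switch and clearing a single
uplink port behave so differently, here is a minimal Python sketch.
It is not ibsim or OpenSM code.  The node names, port numbers and
links are made up for illustration and are not taken from your topo
file, and clear_links() and reachable() are hypothetical helpers.  It
only checks whether some path still exists, not which route OpenSM's
routing engine would actually pick.

```python
from collections import deque

# Hypothetical miniature fabric (NOT the real topology from this thread).
# Each link joins two (node, port) endpoints.
LINKS = [
    (("hca-a", 1), ("leaf-1", 19)),
    (("leaf-1", 1), ("spine-1", 10)),
    (("leaf-1", 2), ("spine-2", 10)),
    (("spine-1", 33), ("leaf-2", 8)),
    (("spine-2", 33), ("leaf-2", 9)),
    (("leaf-2", 35), ("hca-b", 1)),
]


def clear_links(links, node, port=None):
    """Return the links left after taking down one port of `node`,
    or every port of `node` when `port` is None."""
    def is_cleared(endpoint):
        name, number = endpoint
        return name == node and (port is None or number == port)

    return [link for link in links
            if not (is_cleared(link[0]) or is_cleared(link[1]))]


def reachable(links, src, dst):
    """Breadth-first search: is there any path from src to dst?"""
    neighbors = {}
    for (a, _), (b, _) in links:
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)

    seen = {src}
    queue = deque([src])
    while queue:
        current = queue.popleft()
        if current == dst:
            return True
        for nxt in neighbors.get(current, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False


if __name__ == "__main__":
    # All links up: a path exists.
    print(reachable(LINKS, "hca-a", "hca-b"))                             # True
    # Every port on leaf-1 cleared: hca-a is cut off from the fabric.
    print(reachable(clear_links(LINKS, "leaf-1"), "hca-a", "hca-b"))      # False
    # Only uplink port 2 cleared: the path through spine-1 remains.
    print(reachable(clear_links(LINKS, "leaf-1", 2), "hca-a", "hca-b"))   # True
```

In this toy model, clearing every port on the leaf switch removes the
HCA's only attachment, so nothing can reach it.  Clearing one uplink
leaves the other uplink available, which is the case where a re-route
is possible.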

Al

On Fri, 2013-10-25 at 12:45 -0600, Robert LeBlanc wrote:
> But, I'm trying to route from one HCA port to another HCA port (not a
> switch). I'm taking down a switch link for which there is another path
> available between the HCA ports. Do the port GUIDs change in this type
> of event? (I don't believe that is the case.)
> 
> 
> When I take this switch port down I would expect the output to be:
> From {0x0002c90300ebbb60}[2]
> [2] -> {0x0002c90300684e30}[19]
> [1] -> {0x0002c90200431fb8}[10]
> [33(or 34)] -> {0x001397010a000044}[8(or 9)]
> [35] -> {0x0013970301001f4c}[1]
> To {0x0013970301001f4b}[1]
> 
> 
> I understand if I disconnect the HCA port then I should not be able to
> connect, but taking down a switch port should cause ibsim/opensm to
> reroute around the downed link. Again, please let me know if I'm
> missing something because I'm still learning this.
> 
> 
> Thanks,
> 
> 
> 
> 
> 
> Robert LeBlanc
> OIT Infrastructure & Virtualization Engineer
> Brigham Young University
> 
> 
> On Fri, Oct 25, 2013 at 12:35 PM, Albert Chu <chu11 at llnl.gov> wrote:
>         Hi Robert,
>         
>         > I'm trying to test routing in ibsim, but it doesn't seem to
>         > update the routing tables in the simulated switches. If I take
>         > a link down using the clear command in ibsim, I see opensm
>         > saying that it is updating the routing tables and that it
>         > completes, but I can't ibtracert to the LID whose path was
>         > taken down.
>         
>         
>         I have a feeling you might be confusing ibtracert's behavior w/
>         the typical behavior of traceroute.
>         
>         When you disable the link below, you are effectively taking
>         node(s) out of your fabric.  OpenSM will see that the node(s)
>         disappeared and will re-route the fabric.  Those nodes are now
>         eliminated from all of the routing tables.  So when you
>         ibtracert that node, ibtracert effectively states it can't do a
>         traceroute b/c the node/route doesn't exist.
>         
>         This is different than traceroute, which outputs the network
>         hops as far as it can go, even if the end destination is down.
>         
>         Al
>         
>         On Fri, 2013-10-25 at 12:22 -0600, Robert LeBlanc wrote:
>         > I just realized that in this example I'm shutting down the
>         > entire switch that the host is connected to instead of the
>         > uplink port. If I issue 'clear "S-0002c90300684e30" 2"', I get
>         > the same result. Ports 1 and 2 are both uplink ports to
>         > different leaf IB switches in a fat tree scheme.
>         >
>         >
>         >
>         > Robert LeBlanc
>         > OIT Infrastructure & Virtualization Engineer
>         > Brigham Young University
>         >
>         >
>         > On Fri, Oct 25, 2013 at 11:19 AM, Robert LeBlanc
>         > <robert_leblanc at byu.edu> wrote:
>         >         Here are the details of what I'm doing:
>         >
>         >
>         >         In one terminal, I run ibsim:
>         >         root@rleblanc-pc:/home/leblanc/Downloads# ibsim -s ibtopo
>         >         parsing: ibtopo
>         >         ibtopo: parsed 928 lines
>         >         ########################
>         >         Network simulator ready.
>         >         MaxNetNodes    = 2048
>         >         MaxNetSwitches = 256
>         >         MaxNetPorts    = 13312
>         >         MaxLinearCap   = 30720
>         >         MaxMcastCap    = 1024
>         >         sim> ibwarn: [2278] process_packet: no one to handle pkt: class 0x81, attr 0xff90
>         >         ibwarn: [2278] process_packet: no one to handle pkt: class 0x81, attr 0xff90
>         >         ...snip out tons of these messages...
>         >         ibwarn: [2278] process_packet: no one to handle pkt: class 0x81, attr 0xff90
>         >         clear "S-0002c90300684e30"
>         >         sim> ibwarn: [2278] process_packet: got trap repress - drop
>         >         ibwarn: [2278] process_packet: got trap repress - drop
>         >         ibwarn: [2278] process_packet: no one to handle pkt: class 0x81, attr 0xff90
>         >         ...snip out tons of these messages...
>         >         ibwarn: [2278] process_packet: no one to handle pkt: class 0x81, attr 0xff90
>         >         relink "0002c90300684e30"
>         >         # nodeid "0002c90300684e30" (0002c90300684e30) not found
>         >         sim> relink "S-0002c90300684e30"
>         >         sim> ibwarn: [2278] process_packet: got trap repress - drop
>         >         ibwarn: [2278] process_packet: got trap repress - drop
>         >         ibwarn: [2278] process_packet: no one to handle pkt: class 0x81, attr 0xff90
>         >         ...snip out tons of these messages...
>         >         ibwarn: [2278] process_packet: no one to handle pkt: class 0x81, attr 0xff90
>         >         quit
>         >         Exiting network simulator.
>         >         root@rleblanc-pc:/home/leblanc/Downloads#
>         >
>         >
>         >         Then in another terminal I run opensm:
>         >         root@rleblanc-pc:/home/leblanc/Documents/Work/Scripts/ib# SIM_HOST="H-0013970201000978" OSM_TMP_DIR=./ OSM_CACHE_DIR=./ LD_PRELOAD=/usr/lib/umad2sim/libumad2sim.so opensm -e -v -f ./osm.log
>         >         -------------------------------------------------
>         >         OpenSM 3.3.15
>         >         Command Line Arguments:
>         >          Creating new log file
>         >          Verbose option -v (log flags = 0x7)
>         >          Log File: ./osm.log
>         >         -------------------------------------------------
>         >         OpenSM 3.3.15
>         >
>         >
>         >         Entering DISCOVERING state
>         >
>         >
>         >         Using default GUID 0x13970201000979
>         >         Entering MASTER state
>         >
>         >
>         >
>         >
>         >         =======================================================================================================
>         >         Vendor      : Ty : #  : Sta : LID  : LMC : MTU  : LWA : LSA  : Port GUID        : Neighbor Port (Port #)
>         >         Unknown     : CA : 01 : ACT : 0003 :  0  : 2048 : 4x  : 2.5  : f04da29097793001 : 0002c9020042ea60 (12)
>         >         Unknown     : CA : 02 : ACT : 0007 :  0  : 2048 : 4x  : 2.5  : f04da29097793002 : 0002c902004294e0 (12)
>         >         ------------------------------------------------------------------------------------------------------
>         >         Mellanox    : SW : 00 :     : 0002 :  0  :      :     :      : 0002c90300879a00 :
>         >         Mellanox    : SW : 01 : ACT :      :     : 2048 : 4x  : 2.5  : 0002c90300879a00 : 0002c90200431f90 (08)
>         >         Mellanox    : SW : 02 : ACT :      :     : 2048 : 4x  : 2.5  : 0002c90300879a00 : 0002c90200431f58 (09)
>         >         Mellanox    : SW : 03 : DWN :      :     : ???  : ??? : Ext  : 0002c90300879a00 :
>         >         ...snip...
>         >
>         >
>         >         Then in a third console I run ibtracert:
>         >         leblanc@rleblanc-pc:~/Documents/Work/Scripts/ib$ LD_PRELOAD=/usr/lib/umad2sim/libumad2sim.so /usr/sbin/ibtracert -G -n 0x0002c90300ebbb62 0x0013970301001f4c 2> /dev/null
>         >         From {0x0002c90300ebbb60}[2]
>         >         [2] -> {0x0002c90300684e30}[19]
>         >         [2] -> {0x0002c90200431eb8}[10]
>         >         [33] -> {0x001397010a000044}[10]
>         >         [35] -> {0x0013970301001f4c}[1]
>         >         To {0x0013970301001f4b}[1]
>         >         leblanc@rleblanc-pc:~/Documents/Work/Scripts/ib$ LD_PRELOAD=/usr/lib/umad2sim/libumad2sim.so /usr/sbin/ibtracert -G -n 0x0002c90300ebbb62 0x0013970301001f4c 2> /dev/null
>         >         /usr/sbin/ibtracert: iberror: failed: can't resolve source port 0x0002c90300ebbb62
>         >         leblanc@rleblanc-pc:~/Documents/Work/Scripts/ib$ LD_PRELOAD=/usr/lib/umad2sim/libumad2sim.so /usr/sbin/ibtracert -G -n 0x0002c90300ebbb62 0x0013970301001f4c 2> /dev/null
>         >         From {0x0002c90300ebbb60}[2]
>         >         [2] -> {0x0002c90300684e30}[19]
>         >         [2] -> {0x0002c90200431eb8}[10]
>         >         [33] -> {0x001397010a000044}[10]
>         >         [35] -> {0x0013970301001f4c}[1]
>         >         To {0x0013970301001f4b}[1]
>         >         leblanc@rleblanc-pc:~/Documents/Work/Scripts/ib$
>         >
>         >
>         >         I'm attaching our topo file that we are using and the
>         >         opensm logs (you should be able to replicate the problem
>         >         given this information or tell me what I'm doing wrong).
>         >
>         >
>         >         Thanks,
>         >
>         >
>         >
>         >         Robert LeBlanc
>         >         OIT Infrastructure & Virtualization Engineer
>         >         Brigham Young University
>         >
>         >
>         >
>         >         On Tue, Oct 22, 2013 at 10:55 PM, Hal Rosenstock
>         >         <hal.rosenstock at gmail.com> wrote:
>         >                 ibsim just simulates the network (topology, SMAs,
>         >                 and PMAs). OpenSM configures the subnet, including
>         >                 the routing (LFTs and MFTs), based on the routing
>         >                 algorithm. It is possible in a topology that
>         >                 multiple routing algorithms yield the same routes.
>         >                 More specifics would be needed to comment
>         >                 "deeper"...
>         >
>         >                 -- Hal
>         >
>         >
>         >                 On Tue, Oct 22, 2013 at 6:38 PM, Robert LeBlanc
>         >                 <robert_leblanc at byu.edu> wrote:
>         >
>         >                         I'm trying to test routing in ibsim, but it
>         >                         doesn't seem to update the routing tables in
>         >                         the simulated switches. If I take a link down
>         >                         using the clear command in ibsim, I see opensm
>         >                         saying that it is updating the routing tables
>         >                         and that it completes, but I can't ibtracert
>         >                         to the LID whose path was taken down.
>         >
>         >
>         >                         Should ibsim and opensm be reconfiguring
>         >                         routing in the simulated environment? No
>         >                         matter which routing protocol I select in
>         >                         opensm, the routes are always the same; even
>         >                         having opensm re-LID the entire fabric doesn't
>         >                         help. Any help would be appreciated.
>         >
>         >
>         >                         Output from opensm:
>         >
>         >
>         >                         ******************************************************************
>         >                         ***** LID ASSIGNMENT COMPLETE - STARTING SWITCH TABLE CONFIG *****
>         >                         ******************************************************************
>         >
>         >                         Oct 22 16:27:20 330198 [8437A700] 0x04 -> osm_ucast_mgr_build_lid_matrices: Starting switches' Min Hop Table Assignment
>         >                         Oct 22 16:27:20 330954 [8437A700] 0x02 -> osm_ucast_mgr_process: minhop tables configured on all switches
>         >                         Oct 22 16:27:20 331191 [8437A700] 0x04 -> do_sweep:
>         >
>         >                         ******************************************************************
>         >                         **************** SWITCHES CONFIGURED FOR UNICAST *****************
>         >                         ******************************************************************
>         >
>         >
>         >
>         >
>         >                         Thanks,
>         >
>         >
>         >                         Robert LeBlanc
>         >                         OIT Infrastructure & Virtualization Engineer
>         >                         Brigham Young University
>         >
>         >
>         >
>         _______________________________________________
>         >                         Users mailing list
>         >                         Users at lists.openfabrics.org
>         >
>         http://lists.openfabrics.org/cgi-bin/mailman/listinfo/users
>         >
>         >
>         >
>         >
>         >
>         >
>         >
>         
>         --
>         Albert Chu
>         chu11 at llnl.gov
>         Computer Scientist
>         High Performance Systems Division
>         Lawrence Livermore National Laboratory
>         
>         
> 
> 
-- 
Albert Chu
chu11 at llnl.gov
Computer Scientist
High Performance Systems Division
Lawrence Livermore National Laboratory




