[Users] ibsim updates routing tables?

Robert LeBlanc robert_leblanc at byu.edu
Fri Oct 25 13:30:01 PDT 2013


Al, you pegged it! I seriously need to undo 10 years of reading Linux
documentation to use ibsim; the command syntax needs to be VERY literal.
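
For anyone else tripped up by this, here is a minimal Python sketch of the
literal form that worked in this thread: the node ID in double quotes with
its type prefix, and the port in square brackets directly after the closing
quote. The helper name ibsim_cmd and its checks are hypothetical and not part
of ibsim. It only recognizes the two prefixes that appear in this thread
("S-" and "H-"), and using the [port] suffix with relink is an assumption.

```python
import re

# Node IDs as they appear in this thread: a type prefix ("S-" for a switch,
# "H-" for the SIM_HOST HCA) followed by a 16-digit hex GUID.
# Assumption: other prefixes may exist in ibsim; they are not handled here.
_NODEID = re.compile(r"^[SH]-[0-9a-fA-F]{16}$")


def ibsim_cmd(action, nodeid, port=None):
    """Build a console line such as: clear "S-0002c90300684e30"[2]"""
    if action not in ("clear", "relink"):
        raise ValueError("only the commands used in this thread are supported")
    if not _NODEID.match(nodeid):
        # A bare GUID without the prefix is what produced "nodeid ... not found".
        raise ValueError("node ID needs a type prefix, e.g. S-0002c90300684e30")
    cmd = f'{action} "{nodeid}"'
    if port is not None:
        # No space between the closing quote and the bracketed port number.
        cmd += f"[{int(port)}]"
    return cmd


print(ibsim_cmd("clear", "S-0002c90300684e30", 2))
```

Without a port the helper emits the whole-node form, which in my case took
down every link on the switch.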

leblanc@rleblanc-pc:~/Documents/Work/Scripts/ib$
LD_PRELOAD=/usr/lib/umad2sim/libumad2sim.so /usr/sbin/ibtracert -G -n
0x0002c90300ebbb62 0x0013970301001f4c 2> /dev/null
From {0x0002c90300ebbb60}[2]
[2] -> {0x0002c90300684e30}[19]
[1] -> {0x0002c90200431fb8}[10]
[33] -> {0x001397010a000044}[8]
[35] -> {0x0013970301001f4c}[1]
To {0x0013970301001f4b}[1]
leblanc@rleblanc-pc:~/Documents/Work/Scripts/ib$
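
To check that a reroute really happened, the hop lines of two ibtracert runs
can be compared. The following is a rough Python sketch, not part of
infiniband-diags; the names parse_hops and changed_hops are made up for
illustration. It assumes the "-n" hop format shown in this thread,
"[port] -> {guid}[port]", and reads the leading port as the exit port of the
previous node and the trailing one as the entry port of the next node, which
is my interpretation of the output.

```python
import re

# One hop line as pasted in this thread, e.g. "[2] -> {0x0002c90300684e30}[19]".
HOP = re.compile(r"^\[(\d+)\]\s*->\s*\{(0x[0-9a-fA-F]+)\}\[(\d+)\]$")


def parse_hops(text):
    """Return a list of (out_port, guid, in_port) tuples, one per hop line."""
    hops = []
    for line in text.splitlines():
        m = HOP.match(line.strip())
        if m:  # the "From ..." and "To ..." lines do not match and are skipped
            hops.append((int(m.group(1)), m.group(2).lower(), int(m.group(3))))
    return hops


def changed_hops(before, after):
    """Return the 0-based indexes of hops that differ between two traces."""
    a, b = parse_hops(before), parse_hops(after)
    longest = max(len(a), len(b))
    return [i for i in range(longest)
            if i >= len(a) or i >= len(b) or a[i] != b[i]]


# Trace taken before the link was cleared (from my earlier mail).
BEFORE = """From {0x0002c90300ebbb60}[2]
[2] -> {0x0002c90300684e30}[19]
[2] -> {0x0002c90200431eb8}[10]
[33] -> {0x001397010a000044}[10]
[35] -> {0x0013970301001f4c}[1]
To {0x0013970301001f4b}[1]"""

# Trace taken after clearing port 2 of the switch (the output above).
AFTER = """From {0x0002c90300ebbb60}[2]
[2] -> {0x0002c90300684e30}[19]
[1] -> {0x0002c90200431fb8}[10]
[33] -> {0x001397010a000044}[8]
[35] -> {0x0013970301001f4c}[1]
To {0x0013970301001f4b}[1]"""

print(changed_hops(BEFORE, AFTER))
```

For these two traces the second and third hops differ, which is the detour
through the other uplink that I was expecting.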

Now to get this program back on track! Thank you for being patient.


Robert LeBlanc
OIT Infrastructure & Virtualization Engineer
Brigham Young University


On Fri, Oct 25, 2013 at 2:26 PM, Albert Chu <chu11 at llnl.gov> wrote:

> I now see your earlier reply.  You realized your mistake: you were
> disabling all the links on the switch, which effectively led to
> disabling all the nodes.
>
> I think you made a typo in your second attempt.  It should be:
>
> clear "S-0002c90300684e30"[2]
>
> Al
>
> On Fri, 2013-10-25 at 12:45 -0600, Robert LeBlanc wrote:
> > But I'm trying to route from one HCA port to another HCA port (not a
> > switch). I'm taking down a switch link for which there is another path
> > available between the HCA ports. Do the port GUIDs change in this type
> > of event? (I don't believe that is the case.)
> >
> >
> > When I take this switch port down I would expect the output to be:
> > From {0x0002c90300ebbb60}[2]
> > [2] -> {0x0002c90300684e30}[19]
> > [1] -> {0x0002c90200431fb8}[10]
> > [33(or 34)] -> {0x001397010a000044}[8(or 9)]
> > [35] -> {0x0013970301001f4c}[1]
> > To {0x0013970301001f4b}[1]
> >
> >
> > I understand if I disconnect the HCA port then I should not be able to
> > connect, but taking down a switch port should cause ibsim/opensm to
> > reroute around the downed link. Again, please let me know if I'm
> > missing something because I'm still learning this.
> >
> >
> > Thanks,
> >
> >
> >
> >
> >
> > Robert LeBlanc
> > OIT Infrastructure & Virtualization Engineer
> > Brigham Young University
> >
> >
> > On Fri, Oct 25, 2013 at 12:35 PM, Albert Chu <chu11 at llnl.gov> wrote:
> >         Hi Robert,
> >
> >         > I'm trying to test routing in ibsim, but it doesn't seem to update the
> >         > routing tables in the simulated switches. If I take a link down using
> >         > the clear command in ibsim, I see opensm saying that it is updating
> >         > the routing tables and that it completes, but I can't ibtracert to the
> >         > LID whose path was taken down.
> >
> >
> >         I have a feeling you might be confusing ibtracert's behavior w/ the
> >         typical behavior of traceroute.
> >
> >         When you disable the link below, you are effectively taking node(s) out
> >         of your fabric.  OpenSM will see that the node(s) disappeared and will
> >         re-route the fabric.  Those nodes are now eliminated from all of the
> >         routing tables.  So when you ibtracert that node, ibtracert effectively
> >         states it can't do a traceroute b/c the node/route doesn't exist.
> >
> >         This is different than traceroute, which outputs the network hops as far
> >         as it can go, even if the end destination is down.
> >
> >         Al
> >
> >         On Fri, 2013-10-25 at 12:22 -0600, Robert LeBlanc wrote:
> >         > I just realized that in this example I'm shutting down the entire
> >         > switch that the host is connected to instead of the uplink port. If I
> >         > issue 'clear "S-0002c90300684e30" 2"', I get the same result. Ports 1
> >         > and 2 are both uplink ports to different leaf IB switches in a fat
> >         > tree scheme.
> >         >
> >         >
> >         >
> >         > Robert LeBlanc
> >         > OIT Infrastructure & Virtualization Engineer
> >         > Brigham Young University
> >         >
> >         >
> >         > On Fri, Oct 25, 2013 at 11:19 AM, Robert LeBlanc
> >         > <robert_leblanc at byu.edu> wrote:
> >         >         Here are the details of what I'm doing:
> >         >
> >         >
> >         >         In one terminal, I run ibsim:
> >         >         root@rleblanc-pc:/home/leblanc/Downloads# ibsim -s ibtopo
> >         >         parsing: ibtopo
> >         >         ibtopo: parsed 928 lines
> >         >         ########################
> >         >         Network simulator ready.
> >         >         MaxNetNodes    = 2048
> >         >         MaxNetSwitches = 256
> >         >         MaxNetPorts    = 13312
> >         >         MaxLinearCap   = 30720
> >         >         MaxMcastCap    = 1024
> >         >         sim> ibwarn: [2278] process_packet: no one to handle pkt: class 0x81, attr 0xff90
> >         >         ibwarn: [2278] process_packet: no one to handle pkt: class 0x81, attr 0xff90
> >         >         ...snip out tons of these messages...
> >         >         ibwarn: [2278] process_packet: no one to handle pkt: class 0x81, attr 0xff90
> >         >         clear "S-0002c90300684e30"
> >         >         sim> ibwarn: [2278] process_packet: got trap repress - drop
> >         >         ibwarn: [2278] process_packet: got trap repress - drop
> >         >         ibwarn: [2278] process_packet: no one to handle pkt: class 0x81, attr 0xff90
> >         >         ...snip out tons of these messages...
> >         >         ibwarn: [2278] process_packet: no one to handle pkt: class 0x81, attr 0xff90
> >         >         relink "0002c90300684e30"
> >         >         # nodeid "0002c90300684e30" (0002c90300684e30) not found
> >         >         sim> relink "S-0002c90300684e30"
> >         >         sim> ibwarn: [2278] process_packet: got trap repress - drop
> >         >         ibwarn: [2278] process_packet: got trap repress - drop
> >         >         ibwarn: [2278] process_packet: no one to handle pkt: class 0x81, attr 0xff90
> >         >         ...snip out tons of these messages...
> >         >         ibwarn: [2278] process_packet: no one to handle pkt: class 0x81, attr 0xff90
> >         >         quit
> >         >         Exiting network simulator.
> >         >         root@rleblanc-pc:/home/leblanc/Downloads#
> >         >
> >         >
> >         >         Then in another terminal I run opensm:
> >         >
> >         >         root@rleblanc-pc:/home/leblanc/Documents/Work/Scripts/ib#
> >         >         SIM_HOST="H-0013970201000978" OSM_TMP_DIR=./ OSM_CACHE_DIR=./ LD_PRELOAD=/usr/lib/umad2sim/libumad2sim.so opensm -e -v -f ./osm.log
> >         >         -------------------------------------------------
> >         >         OpenSM 3.3.15
> >         >         Command Line Arguments:
> >         >          Creating new log file
> >         >          Verbose option -v (log flags = 0x7)
> >         >          Log File: ./osm.log
> >         >         -------------------------------------------------
> >         >         OpenSM 3.3.15
> >         >
> >         >
> >         >         Entering DISCOVERING state
> >         >
> >         >
> >         >         Using default GUID 0x13970201000979
> >         >         Entering MASTER state
> >         >
> >         >
> >         >
> >         >
> >         >
> >         >         =======================================================================================================
> >         >         Vendor      : Ty : #  : Sta : LID  : LMC : MTU  : LWA : LSA  : Port GUID        : Neighbor Port (Port #)
> >         >         Unknown     : CA : 01 : ACT : 0003 :  0  : 2048 : 4x  : 2.5  : f04da29097793001 : 0002c9020042ea60 (12)
> >         >         Unknown     : CA : 02 : ACT : 0007 :  0  : 2048 : 4x  : 2.5  : f04da29097793002 : 0002c902004294e0 (12)
> >         >         ------------------------------------------------------------------------------------------------------
> >         >         Mellanox    : SW : 00 :     : 0002 :  0  :      :     :      : 0002c90300879a00 :
> >         >         Mellanox    : SW : 01 : ACT :      :     : 2048 : 4x  : 2.5  : 0002c90300879a00 : 0002c90200431f90 (08)
> >         >         Mellanox    : SW : 02 : ACT :      :     : 2048 : 4x  : 2.5  : 0002c90300879a00 : 0002c90200431f58 (09)
> >         >         Mellanox    : SW : 03 : DWN :      :     : ???  : ??? : Ext  : 0002c90300879a00 :
> >         >         ...snip...
> >         >
> >         >
> >         >         Then in a third console I run ibtracert:
> >         >         leblanc@rleblanc-pc:~/Documents/Work/Scripts/ib$
> >         >         LD_PRELOAD=/usr/lib/umad2sim/libumad2sim.so /usr/sbin/ibtracert -G -n 0x0002c90300ebbb62 0x0013970301001f4c 2> /dev/null
> >         >         From {0x0002c90300ebbb60}[2]
> >         >         [2] -> {0x0002c90300684e30}[19]
> >         >         [2] -> {0x0002c90200431eb8}[10]
> >         >         [33] -> {0x001397010a000044}[10]
> >         >         [35] -> {0x0013970301001f4c}[1]
> >         >         To {0x0013970301001f4b}[1]
> >         >         leblanc@rleblanc-pc:~/Documents/Work/Scripts/ib$
> >         >         LD_PRELOAD=/usr/lib/umad2sim/libumad2sim.so /usr/sbin/ibtracert -G -n 0x0002c90300ebbb62 0x0013970301001f4c 2> /dev/null
> >         >         /usr/sbin/ibtracert: iberror: failed: can't resolve source port 0x0002c90300ebbb62
> >         >         leblanc@rleblanc-pc:~/Documents/Work/Scripts/ib$
> >         >         LD_PRELOAD=/usr/lib/umad2sim/libumad2sim.so /usr/sbin/ibtracert -G -n 0x0002c90300ebbb62 0x0013970301001f4c 2> /dev/null
> >         >         From {0x0002c90300ebbb60}[2]
> >         >         [2] -> {0x0002c90300684e30}[19]
> >         >         [2] -> {0x0002c90200431eb8}[10]
> >         >         [33] -> {0x001397010a000044}[10]
> >         >         [35] -> {0x0013970301001f4c}[1]
> >         >         To {0x0013970301001f4b}[1]
> >         >         leblanc@rleblanc-pc:~/Documents/Work/Scripts/ib$
> >         >
> >         >
> >         >         I'm attaching our topo file that we are using and the opensm
> >         >         logs (you should be able to replicate the problem given this
> >         >         information or tell me what I'm doing wrong).
> >         >
> >         >
> >         >         Thanks,
> >         >
> >         >
> >         >
> >         >         Robert LeBlanc
> >         >         OIT Infrastructure & Virtualization Engineer
> >         >         Brigham Young University
> >         >
> >         >
> >         >
> >         >         On Tue, Oct 22, 2013 at 10:55 PM, Hal Rosenstock
> >         >         <hal.rosenstock at gmail.com> wrote:
> >         >                 ibsim just simulates the network (topology, SMAs, and
> >         >                 PMAs). OpenSM configures the subnet, including the
> >         >                 routing (LFTs and MFTs), based on the routing
> >         >                 algorithm. It is possible in a topology that multiple
> >         >                 routing algorithms yield the same routes. More
> >         >                 specifics would be needed to comment "deeper"...
> >         >
> >         >                 -- Hal
> >         >
> >         >
> >         >                 On Tue, Oct 22, 2013 at 6:38 PM, Robert
> >         LeBlanc
> >         >                 <robert_leblanc at byu.edu> wrote:
> >         >
> >         >                         I'm trying to test routing in ibsim, but it
> >         >                         doesn't seem to update the routing tables in
> >         >                         the simulated switches. If I take a link down
> >         >                         using the clear command in ibsim, I see opensm
> >         >                         saying that it is updating the routing tables
> >         >                         and that it completes, but I can't ibtracert
> >         >                         to the LID whose path was taken down.
> >         >
> >         >
> >         >                         Should ibsim and opensm be reconfiguring
> >         >                         routing in the simulated environment? No
> >         >                         matter which routing protocol I select in
> >         >                         opensm, the routes are always the same; even
> >         >                         having opensm re-LID the entire fabric doesn't
> >         >                         help. Any help would be appreciated.
> >         >
> >         >
> >         >                         Output from opensm:
> >         >
> >         >
> >         >
> >         >                         ******************************************************************
> >         >                         ***** LID ASSIGNMENT COMPLETE - STARTING SWITCH TABLE CONFIG *****
> >         >                         ******************************************************************
> >         >
> >         >                         Oct 22 16:27:20 330198 [8437A700] 0x04 -> osm_ucast_mgr_build_lid_matrices: Starting switches' Min Hop Table Assignment
> >         >                         Oct 22 16:27:20 330954 [8437A700] 0x02 -> osm_ucast_mgr_process: minhop tables configured on all switches
> >         >                         Oct 22 16:27:20 331191 [8437A700] 0x04 -> do_sweep:
> >         >
> >         >                         ******************************************************************
> >         >                         **************** SWITCHES CONFIGURED FOR UNICAST *****************
> >         >                         ******************************************************************
> >         >
> >         >
> >         >
> >         >
> >         >                         Thanks,
> >         >
> >         >
> >         >                         Robert LeBlanc
> >         >                         OIT Infrastructure & Virtualization Engineer
> >         >                         Brigham Young University
> >         >
> >         >
> >         >
> >         _______________________________________________
> >         >                         Users mailing list
> >         >                         Users at lists.openfabrics.org
> >         >
> >         http://lists.openfabrics.org/cgi-bin/mailman/listinfo/users
> >         >
> >         >
> >         >
> >         >
> >         >
> >         >
> >         >
> >
> >         --
> >         Albert Chu
> >         chu11 at llnl.gov
> >         Computer Scientist
> >         High Performance Systems Division
> >         Lawrence Livermore National Laboratory
> >
> >
> >
> >
> --
> Albert Chu
> chu11 at llnl.gov
> Computer Scientist
> High Performance Systems Division
> Lawrence Livermore National Laboratory
>
>
>