[Users] ibsim updates routing tables?

Robert LeBlanc robert_leblanc at byu.edu
Fri Oct 25 11:45:22 PDT 2013


But I'm trying to route from one HCA port to another HCA port (not a
switch). I'm taking down a switch link for which another path is
available between the HCA ports. Do the port GUIDs change in this type of
event? (I don't believe that is the case.)

When I take this switch port down I would expect the output to be:
From {0x0002c90300ebbb60}[2]
[2] -> {0x0002c90300684e30}[19]
*[1] -> {0x0002c90200431fb8}[10]*
*[33(or 34)] -> {0x001397010a000044}[8(or 9)]*
[35] -> {0x0013970301001f4c}[1]
To {0x0013970301001f4b}[1]
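
One way to compare an expected trace like this against what ibtracert actually prints is to parse the hop lines and diff them. The following is only a sketch: the regular expression is inferred from the output format shown in this thread (not from ibtracert documentation), the helper names `parse_hops` and `changed_hops` are made up for illustration, and the expected trace arbitrarily picks port 33 and port 8 from the "33(or 34)" and "8(or 9)" alternatives above.

```python
import re

# Hop lines look like "[2] -> {0x0002c90300684e30}[19]" in this thread.
# The pattern is inferred from that sample output, which is an assumption.
HOP_RE = re.compile(r"^\[(\d+)\] -> \{(0x[0-9a-fA-F]+)\}\[(\d+)\]$")


def parse_hops(text):
    """Return a list of (out_port, node_guid, in_port), one per hop line."""
    hops = []
    for line in text.splitlines():
        m = HOP_RE.match(line.strip())
        if m:
            hops.append((int(m.group(1)), m.group(2).lower(), int(m.group(3))))
    return hops


def changed_hops(before, after):
    """Return the indices of hops that differ between two equal-length traces."""
    a, b = parse_hops(before), parse_hops(after)
    if len(a) != len(b):
        raise ValueError("traces have different hop counts")
    return [i for i, (x, y) in enumerate(zip(a, b)) if x != y]


# Trace observed before the switch port is taken down (copied from the thread).
BEFORE = """\
From {0x0002c90300ebbb60}[2]
[2] -> {0x0002c90300684e30}[19]
[2] -> {0x0002c90200431eb8}[10]
[33] -> {0x001397010a000044}[10]
[35] -> {0x0013970301001f4c}[1]
To {0x0013970301001f4b}[1]
"""

# Expected trace after the port goes down; ports 33 and 8 are one of the
# alternatives listed above, chosen here only for illustration.
EXPECTED = """\
From {0x0002c90300ebbb60}[2]
[2] -> {0x0002c90300684e30}[19]
[1] -> {0x0002c90200431fb8}[10]
[33] -> {0x001397010a000044}[8]
[35] -> {0x0013970301001f4c}[1]
To {0x0013970301001f4b}[1]
"""

if __name__ == "__main__":
    print(changed_hops(BEFORE, EXPECTED))
```

Only the two middle hops should differ if the reroute happens as expected; the first hop into the leaf switch and the last hop into the destination stay the same.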

I understand if I disconnect the HCA port then I should not be able to
connect, but taking down a switch port should cause ibsim/opensm to reroute
around the downed link. Again, please let me know if I'm missing something
because I'm still learning this.
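
The distinction between the two cases can be illustrated with a toy model. This is a minimal sketch, not OpenSM's routing code: it assumes a made-up five-node fabric (two HCAs, two leaf switches, two spines) and uses a plain breadth-first search as a stand-in for a min-hop computation. All node names are hypothetical.

```python
from collections import deque


def min_hop_path(links, src, dst):
    """Breadth-first search for a shortest path; returns None if unreachable."""
    adj = {}
    for a, b in links:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    prev = {src: None}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            # Walk the predecessor chain back to the source.
            path = []
            while node is not None:
                path.append(node)
                node = prev[node]
            return path[::-1]
        # Sorted neighbours keep the result deterministic.
        for nxt in sorted(adj.get(node, ())):
            if nxt not in prev:
                prev[nxt] = node
                queue.append(nxt)
    return None


# Hypothetical fabric: each leaf switch has one uplink to each spine.
LINKS = {
    ("hca_a", "leaf1"),
    ("leaf1", "spine1"),
    ("leaf1", "spine2"),
    ("spine1", "leaf2"),
    ("spine2", "leaf2"),
    ("leaf2", "hca_b"),
}

if __name__ == "__main__":
    print(min_hop_path(LINKS, "hca_a", "hca_b"))
    # Taking down one uplink leaves an alternate path through the other spine.
    print(min_hop_path(LINKS - {("leaf1", "spine1")}, "hca_a", "hca_b"))
    # Taking down the HCA's only link removes it from the fabric entirely.
    print(min_hop_path(LINKS - {("hca_a", "leaf1")}, "hca_a", "hca_b"))
```

In this model, removing an uplink still yields a path, while removing the HCA's only link yields `None`, which corresponds to the case where the node drops out of the routing tables and cannot be traced at all.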

Thanks,



Robert LeBlanc
OIT Infrastructure & Virtualization Engineer
Brigham Young University


On Fri, Oct 25, 2013 at 12:35 PM, Albert Chu <chu11 at llnl.gov> wrote:

> Hi Robert,
>
> > I'm trying to test routing in ibsim, but it doesn't seem to update the
> > routing tables in the simulated switches. If I take a link down using
> > the clear command in ibsim, I see opensm saying that it is updating
> > the routing tables and that it completes, but I can't ibtracert to the
> > LID whose path was taken down.
>
> I have a feeling you might be confusing ibtracert's behavior w/ the
> typical behavior of traceroute.
>
> When you disable the link below, you are effectively taking node(s) out
> of your fabric.  OpenSM will see that the node(s) disappeared and will
> re-route the fabric.  Those nodes are now eliminated from all of the
> routing tables.  So when you ibtracert that node, ibtracert effectively
> states it can't do a traceroute b/c the node/route doesn't exist.
>
> This is different than traceroute, which outputs the network hops as far
> as it can go, even if the end destination is down.
>
> Al
>
> On Fri, 2013-10-25 at 12:22 -0600, Robert LeBlanc wrote:
> > I just realized that in this example I'm shutting down the entire
> > switch that the host is connected to instead of the uplink port. If I
> > issue 'clear "S-0002c90300684e30" 2', I get the same result. Ports 1
> > and 2 are both uplink ports to different leaf IB switches in a fat
> > tree scheme.
> >
> >
> >
> > Robert LeBlanc
> > OIT Infrastructure & Virtualization Engineer
> > Brigham Young University
> >
> >
> > On Fri, Oct 25, 2013 at 11:19 AM, Robert LeBlanc
> > <robert_leblanc at byu.edu> wrote:
> >         Here are the details of what I'm doing:
> >
> >
> >         In one terminal, I run ibsim:
> >         root at rleblanc-pc:/home/leblanc/Downloads# ibsim -s ibtopo
> >         parsing: ibtopo
> >         ibtopo: parsed 928 lines
> >         ########################
> >         Network simulator ready.
> >         MaxNetNodes    = 2048
> >         MaxNetSwitches = 256
> >         MaxNetPorts    = 13312
> >         MaxLinearCap   = 30720
> >         MaxMcastCap    = 1024
> >         sim> ibwarn: [2278] process_packet: no one to handle pkt:
> >         class 0x81, attr 0xff90
> >         ibwarn: [2278] process_packet: no one to handle pkt: class
> >         0x81, attr 0xff90
> >         ...snip out tons of these messages...
> >         ibwarn: [2278] process_packet: no one to handle pkt: class
> >         0x81, attr 0xff90
> >         clear "S-0002c90300684e30"
> >         sim> ibwarn: [2278] process_packet: got trap repress - drop
> >         ibwarn: [2278] process_packet: got trap repress - drop
> >         ibwarn: [2278] process_packet: no one to handle pkt: class
> >         0x81, attr 0xff90
> >         ...snip out tons of these messages...
> >         ibwarn: [2278] process_packet: no one to handle pkt: class
> >         0x81, attr 0xff90
> >         relink "0002c90300684e30"
> >         # nodeid "0002c90300684e30" (0002c90300684e30) not found
> >         sim> relink "S-0002c90300684e30"
> >         sim> ibwarn: [2278] process_packet: got trap repress - drop
> >         ibwarn: [2278] process_packet: got trap repress - drop
> >         ibwarn: [2278] process_packet: no one to handle pkt: class
> >         0x81, attr 0xff90
> >         ...snip out tons of these messages...
> >         ibwarn: [2278] process_packet: no one to handle pkt: class
> >         0x81, attr 0xff90
> >         quit
> >         Exiting network simulator.
> >         root at rleblanc-pc:/home/leblanc/Downloads#
> >
> >
> >         Then in another terminal I run opensm:
> >         root at rleblanc-pc:/home/leblanc/Documents/Work/Scripts/ib#
> >         SIM_HOST="H-0013970201000978" OSM_TMP_DIR=./ OSM_CACHE_DIR=./
> >         LD_PRELOAD=/usr/lib/umad2sim/libumad2sim.so opensm -e -v
> >         -f ./osm.log
> >         -------------------------------------------------
> >         OpenSM 3.3.15
> >         Command Line Arguments:
> >          Creating new log file
> >          Verbose option -v (log flags = 0x7)
> >          Log File: ./osm.log
> >         -------------------------------------------------
> >         OpenSM 3.3.15
> >
> >
> >         Entering DISCOVERING state
> >
> >
> >         Using default GUID 0x13970201000979
> >         Entering MASTER state
> >
> >
> >
> >
> >
> >         =======================================================================================================
> >         Vendor      : Ty : #  : Sta : LID  : LMC : MTU  : LWA : LSA  : Port GUID        : Neighbor Port (Port #)
> >         Unknown     : CA : 01 : ACT : 0003 :  0  : 2048 : 4x  : 2.5  : f04da29097793001 : 0002c9020042ea60 (12)
> >         Unknown     : CA : 02 : ACT : 0007 :  0  : 2048 : 4x  : 2.5  : f04da29097793002 : 0002c902004294e0 (12)
> >         ------------------------------------------------------------------------------------------------------
> >         Mellanox    : SW : 00 :     : 0002 :  0  :      :     :      : 0002c90300879a00 :
> >         Mellanox    : SW : 01 : ACT :      :     : 2048 : 4x  : 2.5  : 0002c90300879a00 : 0002c90200431f90 (08)
> >         Mellanox    : SW : 02 : ACT :      :     : 2048 : 4x  : 2.5  : 0002c90300879a00 : 0002c90200431f58 (09)
> >         Mellanox    : SW : 03 : DWN :      :     : ???  : ??? : Ext  : 0002c90300879a00 :
> >         ...snip...
> >
> >
> >         Then in a third console I run ibtracert:
> >         leblanc at rleblanc-pc:~/Documents/Work/Scripts/ib$
> >         LD_PRELOAD=/usr/lib/umad2sim/libumad2sim.so /usr/sbin/ibtracert
> >         -G -n 0x0002c90300ebbb62 0x0013970301001f4c 2> /dev/null
> >         From {0x0002c90300ebbb60}[2]
> >         [2] -> {0x0002c90300684e30}[19]
> >         [2] -> {0x0002c90200431eb8}[10]
> >         [33] -> {0x001397010a000044}[10]
> >         [35] -> {0x0013970301001f4c}[1]
> >         To {0x0013970301001f4b}[1]
> >         leblanc at rleblanc-pc:~/Documents/Work/Scripts/ib$
> >         LD_PRELOAD=/usr/lib/umad2sim/libumad2sim.so /usr/sbin/ibtracert
> >         -G -n 0x0002c90300ebbb62 0x0013970301001f4c 2> /dev/null
> >         /usr/sbin/ibtracert: iberror: failed: can't resolve source
> >         port 0x0002c90300ebbb62
> >         leblanc at rleblanc-pc:~/Documents/Work/Scripts/ib$
> >         LD_PRELOAD=/usr/lib/umad2sim/libumad2sim.so /usr/sbin/ibtracert
> >         -G -n 0x0002c90300ebbb62 0x0013970301001f4c 2> /dev/null
> >         From {0x0002c90300ebbb60}[2]
> >         [2] -> {0x0002c90300684e30}[19]
> >         [2] -> {0x0002c90200431eb8}[10]
> >         [33] -> {0x001397010a000044}[10]
> >         [35] -> {0x0013970301001f4c}[1]
> >         To {0x0013970301001f4b}[1]
> >         leblanc at rleblanc-pc:~/Documents/Work/Scripts/ib$
> >
> >
> >         I'm attaching our topo file that we are using and the opensm
> >         logs (you should be able to replicate the problem given this
> >         information or tell me what I'm doing wrong).
> >
> >
> >         Thanks,
> >
> >
> >
> >         Robert LeBlanc
> >         OIT Infrastructure & Virtualization Engineer
> >         Brigham Young University
> >
> >
> >
> >         On Tue, Oct 22, 2013 at 10:55 PM, Hal Rosenstock
> >         <hal.rosenstock at gmail.com> wrote:
> >                 ibsim just simulates the network (topology, SMAs, and
> >                 PMAs). OpenSM configures the subnet including the
> >                 routing (LFTs and MFTs) based on the routing
> >                 algorithm. It is possible in a topology that multiple
> >                 routing algorithms yield the same routes. More
> >                 specifics would be needed to comment "deeper"...
> >
> >                 -- Hal
> >
> >
> >                 On Tue, Oct 22, 2013 at 6:38 PM, Robert LeBlanc
> >                 <robert_leblanc at byu.edu> wrote:
> >
> >                         I'm trying to test routing in ibsim, but it
> >                         doesn't seem to update the routing tables in
> >                         the simulated switches. If I take a link down
> >                         using the clear command in ibsim, I see opensm
> >                         saying that it is updating the routing tables
> >                         and that it completes, but I can't ibtracert
> >                         to the LID whose path was taken down.
> >
> >
> >                         Should ibsim and opensm be reconfiguring
> >                         routing in the simulated environment? No
> >                         matter which routing protocol I select in
> >                         opensm, the routes are always the same; even
> >                         having opensm re-LID the entire fabric doesn't
> >                         help. Any help would be appreciated.
> >
> >
> >                         Output from opensm:
> >
> >
> >
> >                         ******************************************************************
> >                         ***** LID ASSIGNMENT COMPLETE - STARTING SWITCH TABLE CONFIG *****
> >                         ******************************************************************
> >
> >
> >
> >
> >                         Oct 22 16:27:20 330198 [8437A700] 0x04 ->
> >                         osm_ucast_mgr_build_lid_matrices: Starting
> >                         switches' Min Hop Table Assignment
> >                         Oct 22 16:27:20 330954 [8437A700] 0x02 ->
> >                         osm_ucast_mgr_process: minhop tables
> >                         configured on all switches
> >                         Oct 22 16:27:20 331191 [8437A700] 0x04 ->
> >                         do_sweep:
> >
> >
> >
> >
> >
> >                         ******************************************************************
> >                         **************** SWITCHES CONFIGURED FOR UNICAST *****************
> >                         ******************************************************************
> >
> >
> >
> >
> >                         Thanks,
> >
> >
> >                         Robert LeBlanc
> >                         OIT Infrastructure & Virtualization Engineer
> >                         Brigham Young University
> >
> >
> >                         _______________________________________________
> >                         Users mailing list
> >                         Users at lists.openfabrics.org
> >
> >                         http://lists.openfabrics.org/cgi-bin/mailman/listinfo/users
> >
> >
> >
> >
> >
> >
> >
> --
> Albert Chu
> chu11 at llnl.gov
> Computer Scientist
> High Performance Systems Division
> Lawrence Livermore National Laboratory
>
>
>