[Users] ibsim updates routing tables?
Albert Chu
chu11 at llnl.gov
Fri Oct 25 13:26:12 PDT 2013
I now see your earlier reply. You realized your mistake: you were
disabling all the links on the switch, which effectively disabled
all the nodes.
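To illustrate the difference between clearing a whole switch and clearing
one uplink, here is a minimal Python sketch. The five-link topology and all
names in it are hypothetical (this is not your ibtopo file), and it only
models link reachability, not what OpenSM or ibsim do internally:

```python
from collections import deque

# Hypothetical toy fabric (not the poster's topology): one HCA on a leaf
# switch that has two uplinks to two spine switches, plus a far-side HCA.
LINKS = {
    ("hca-a", "leaf"),
    ("leaf", "spine-1"),
    ("leaf", "spine-2"),
    ("spine-1", "hca-b"),
    ("spine-2", "hca-b"),
}


def reachable(links, src, dst):
    """Breadth-first search over undirected links; True if dst is reachable."""
    neighbors = {}
    for a, b in links:
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)
    seen, queue = {src}, deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            return True
        for nxt in neighbors.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False


def without_node_links(links, node):
    """Drop every link touching node, like clearing a whole switch."""
    return {link for link in links if node not in link}


def without_link(links, a, b):
    """Drop a single link, like clearing one switch port."""
    return {link for link in links if set(link) != {a, b}}
```

In this toy model, removing every link on "leaf" cuts "hca-a" off entirely,
while removing only the "leaf" to "spine-1" uplink still leaves a path
through "spine-2".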
I think there is a typo in your second attempt. It should be:
clear "S-0002c90300684e30"[2]
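Once a single port is cleared, one way to check whether the path really
changed is to compare the ibtracert hops before and after. A rough Python
sketch follows, assuming the hop lines look like the traces quoted below.
The helper names are made up for illustration, and EXPECTED_PREFIX holds
only the first two hops of the output Robert said he expected, not output
from a real run:

```python
import re

# Matches hop lines such as "[2] -> {0x0002c90300684e30}[19]".
HOP_RE = re.compile(r"\[(\d+)\]\s*->\s*\{(0x[0-9a-fA-F]+)\}\[(\d+)\]")


def parse_hops(trace_text):
    """Return a list of (out_port, next_guid, in_port) tuples, one per hop."""
    hops = []
    for line in trace_text.splitlines():
        match = HOP_RE.search(line)
        if match:
            out_port, guid, in_port = match.groups()
            hops.append((int(out_port), guid.lower(), int(in_port)))
    return hops


def first_divergence(before, after):
    """Index of the first hop that differs, or None if the paths are identical."""
    for index, (old, new) in enumerate(zip(before, after)):
        if old != new:
            return index
    if len(before) != len(after):
        return min(len(before), len(after))
    return None


# Trace copied from the thread (taken before the link was cleared).
BEFORE = """\
From {0x0002c90300ebbb60}[2]
[2] -> {0x0002c90300684e30}[19]
[2] -> {0x0002c90200431eb8}[10]
[33] -> {0x001397010a000044}[10]
[35] -> {0x0013970301001f4c}[1]
To {0x0013970301001f4b}[1]
"""

# First two hops of the expected (not observed) trace after the reroute.
EXPECTED_PREFIX = """\
[2] -> {0x0002c90300684e30}[19]
[1] -> {0x0002c90200431fb8}[10]
"""
```

Calling first_divergence(parse_hops(BEFORE), parse_hops(EXPECTED_PREFIX))
points at the second hop (index 1), which is where the exit port on the
switch would change. An ibtracert error message contains no hop lines, so
parse_hops returns an empty list for it.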
Al
On Fri, 2013-10-25 at 12:45 -0600, Robert LeBlanc wrote:
> But I'm trying to route from one HCA port to another HCA port (not a
> switch). I'm taking down a switch link for which another path is
> available between the HCA ports. Do the port GUIDs change in this type
> of event? (I don't believe that is the case.)
>
>
> When I take this switch port down I would expect the output to be:
> From {0x0002c90300ebbb60}[2]
> [2] -> {0x0002c90300684e30}[19]
> [1] -> {0x0002c90200431fb8}[10]
> [33(or 34)] -> {0x001397010a000044}[8(or 9)]
> [35] -> {0x0013970301001f4c}[1]
> To {0x0013970301001f4b}[1]
>
>
> I understand if I disconnect the HCA port then I should not be able to
> connect, but taking down a switch port should cause ibsim/opensm to
> reroute around the downed link. Again, please let me know if I'm
> missing something because I'm still learning this.
>
>
> Thanks,
>
>
>
>
>
> Robert LeBlanc
> OIT Infrastructure & Virtualization Engineer
> Brigham Young University
>
>
> On Fri, Oct 25, 2013 at 12:35 PM, Albert Chu <chu11 at llnl.gov> wrote:
> Hi Robert,
>
> > I'm trying to test routing in ibsim, but it doesn't seem to update the
> > routing tables in the simulated switches. If I take a link down using
> > the clear command in ibsim, I see opensm saying that it is updating
> > the routing tables and that it completes, but I can't ibtracert to the
> > LID whose path was taken down.
>
>
> I have a feeling you might be confusing ibtracert's behavior w/ the
> typical behavior of traceroute.
>
> When you disable the link below, you are effectively taking node(s) out
> of your fabric. OpenSM will see that the node(s) disappeared and will
> re-route the fabric. Those nodes are now eliminated from all of the
> routing tables. So when you ibtracert that node, ibtracert effectively
> states it can't do a traceroute b/c the node/route doesn't exist.
>
> This is different from traceroute, which outputs the network hops as far
> as it can go, even if the end destination is down.
>
> Al
>
> On Fri, 2013-10-25 at 12:22 -0600, Robert LeBlanc wrote:
> > I just realized that in this example I'm shutting down the entire
> > switch that the host is connected to instead of the uplink port. If I
> > issue 'clear "S-0002c90300684e30" 2"', I get the same result. Ports 1
> > and 2 are both uplink ports to different leaf IB switches in a fat
> > tree scheme.
> >
> >
> >
> > Robert LeBlanc
> > OIT Infrastructure & Virtualization Engineer
> > Brigham Young University
> >
> >
> > On Fri, Oct 25, 2013 at 11:19 AM, Robert LeBlanc
> > <robert_leblanc at byu.edu> wrote:
> > Here are the details of what I'm doing:
> >
> >
> > In one terminal, I run ibsim:
> > root at rleblanc-pc:/home/leblanc/Downloads# ibsim -s ibtopo
> > parsing: ibtopo
> > ibtopo: parsed 928 lines
> > ########################
> > Network simulator ready.
> > MaxNetNodes = 2048
> > MaxNetSwitches = 256
> > MaxNetPorts = 13312
> > MaxLinearCap = 30720
> > MaxMcastCap = 1024
> > sim> ibwarn: [2278] process_packet: no one to handle pkt: class 0x81, attr 0xff90
> > ibwarn: [2278] process_packet: no one to handle pkt: class 0x81, attr 0xff90
> > ...snip out tons of these messages...
> > ibwarn: [2278] process_packet: no one to handle pkt: class 0x81, attr 0xff90
> > clear "S-0002c90300684e30"
> > sim> ibwarn: [2278] process_packet: got trap repress - drop
> > ibwarn: [2278] process_packet: got trap repress - drop
> > ibwarn: [2278] process_packet: no one to handle pkt: class 0x81, attr 0xff90
> > ...snip out tons of these messages...
> > ibwarn: [2278] process_packet: no one to handle pkt: class 0x81, attr 0xff90
> > relink "0002c90300684e30"
> > # nodeid "0002c90300684e30" (0002c90300684e30) not found
> > sim> relink "S-0002c90300684e30"
> > sim> ibwarn: [2278] process_packet: got trap repress - drop
> > ibwarn: [2278] process_packet: got trap repress - drop
> > ibwarn: [2278] process_packet: no one to handle pkt: class 0x81, attr 0xff90
> > ...snip out tons of these messages...
> > ibwarn: [2278] process_packet: no one to handle pkt: class 0x81, attr 0xff90
> > quit
> > Exiting network simulator.
> > root at rleblanc-pc:/home/leblanc/Downloads#
> >
> >
> > Then in another terminal I run opensm:
> >
> > root at rleblanc-pc:/home/leblanc/Documents/Work/Scripts/ib# SIM_HOST="H-0013970201000978" OSM_TMP_DIR=./ OSM_CACHE_DIR=./ LD_PRELOAD=/usr/lib/umad2sim/libumad2sim.so opensm -e -v -f ./osm.log
> > -------------------------------------------------
> > OpenSM 3.3.15
> > Command Line Arguments:
> > Creating new log file
> > Verbose option -v (log flags = 0x7)
> > Log File: ./osm.log
> > -------------------------------------------------
> > OpenSM 3.3.15
> >
> >
> > Entering DISCOVERING state
> >
> >
> > Using default GUID 0x13970201000979
> > Entering MASTER state
> >
> >
> >
> >
> >
> > ====================================================================================================
> > Vendor   : Ty : #  : Sta : LID  : LMC : MTU  : LWA : LSA : Port GUID        : Neighbor Port (Port #)
> > Unknown  : CA : 01 : ACT : 0003 : 0   : 2048 : 4x  : 2.5 : f04da29097793001 : 0002c9020042ea60 (12)
> > Unknown  : CA : 02 : ACT : 0007 : 0   : 2048 : 4x  : 2.5 : f04da29097793002 : 0002c902004294e0 (12)
> > ----------------------------------------------------------------------------------------------------
> > Mellanox : SW : 00 :     : 0002 : 0   :      :     :     : 0002c90300879a00 :
> > Mellanox : SW : 01 : ACT :      :     : 2048 : 4x  : 2.5 : 0002c90300879a00 : 0002c90200431f90 (08)
> > Mellanox : SW : 02 : ACT :      :     : 2048 : 4x  : 2.5 : 0002c90300879a00 : 0002c90200431f58 (09)
> > Mellanox : SW : 03 : DWN :      :     : ???  : ??? : Ext : 0002c90300879a00 :
> > ...snip...
> >
> >
> > Then in a third console I run ibtracert:
> > leblanc at rleblanc-pc:~/Documents/Work/Scripts/ib$ LD_PRELOAD=/usr/lib/umad2sim/libumad2sim.so /usr/sbin/ibtracert -G -n 0x0002c90300ebbb62 0x0013970301001f4c 2> /dev/null
> > From {0x0002c90300ebbb60}[2]
> > [2] -> {0x0002c90300684e30}[19]
> > [2] -> {0x0002c90200431eb8}[10]
> > [33] -> {0x001397010a000044}[10]
> > [35] -> {0x0013970301001f4c}[1]
> > To {0x0013970301001f4b}[1]
> > leblanc at rleblanc-pc:~/Documents/Work/Scripts/ib$ LD_PRELOAD=/usr/lib/umad2sim/libumad2sim.so /usr/sbin/ibtracert -G -n 0x0002c90300ebbb62 0x0013970301001f4c 2> /dev/null
> > /usr/sbin/ibtracert: iberror: failed: can't resolve source port 0x0002c90300ebbb62
> > leblanc at rleblanc-pc:~/Documents/Work/Scripts/ib$ LD_PRELOAD=/usr/lib/umad2sim/libumad2sim.so /usr/sbin/ibtracert -G -n 0x0002c90300ebbb62 0x0013970301001f4c 2> /dev/null
> > From {0x0002c90300ebbb60}[2]
> > [2] -> {0x0002c90300684e30}[19]
> > [2] -> {0x0002c90200431eb8}[10]
> > [33] -> {0x001397010a000044}[10]
> > [35] -> {0x0013970301001f4c}[1]
> > To {0x0013970301001f4b}[1]
> > leblanc at rleblanc-pc:~/Documents/Work/Scripts/ib$
> >
> >
> > I'm attaching our topo file that we are using and the opensm
> > logs (you should be able to replicate the problem given this
> > information or tell me what I'm doing wrong).
> >
> >
> > Thanks,
> >
> >
> >
> > Robert LeBlanc
> > OIT Infrastructure & Virtualization Engineer
> > Brigham Young University
> >
> >
> >
> > On Tue, Oct 22, 2013 at 10:55 PM, Hal Rosenstock
> > <hal.rosenstock at gmail.com> wrote:
> > ibsim just simulates the network (topology, SMAs, and PMAs).
> > OpenSM configures the subnet, including the routing (LFTs and
> > MFTs), based on the routing algorithm. It is possible in a
> > topology that multiple routing algorithms yield the same routes.
> > More specifics would be needed to comment "deeper"...
> >
> > -- Hal
> >
> >
> > On Tue, Oct 22, 2013 at 6:38 PM, Robert LeBlanc
> > <robert_leblanc at byu.edu> wrote:
> >
> > I'm trying to test routing in ibsim, but it doesn't seem to
> > update the routing tables in the simulated switches. If I take
> > a link down using the clear command in ibsim, I see opensm
> > saying that it is updating the routing tables and that it
> > completes, but I can't ibtracert to the LID whose path was
> > taken down.
> >
> >
> > Should ibsim and opensm be reconfiguring routing in the
> > simulated environment? No matter which routing protocol I
> > select in opensm, the routes are always the same; even having
> > opensm re-LID the entire fabric doesn't help. Any help would
> > be appreciated.
> >
> >
> > Output from opensm:
> >
> > ******************************************************************
> > ***** LID ASSIGNMENT COMPLETE - STARTING SWITCH TABLE CONFIG *****
> > ******************************************************************
> >
> > Oct 22 16:27:20 330198 [8437A700] 0x04 -> osm_ucast_mgr_build_lid_matrices: Starting switches' Min Hop Table Assignment
> > Oct 22 16:27:20 330954 [8437A700] 0x02 -> osm_ucast_mgr_process: minhop tables configured on all switches
> > Oct 22 16:27:20 331191 [8437A700] 0x04 -> do_sweep:
> >
> > ******************************************************************
> > **************** SWITCHES CONFIGURED FOR UNICAST *****************
> > ******************************************************************
> >
> >
> >
> >
> > Thanks,
> >
> >
> > Robert LeBlanc
> > OIT Infrastructure & Virtualization Engineer
> > Brigham Young University
> >
> >
> >
> > _______________________________________________
> > Users mailing list
> > Users at lists.openfabrics.org
> > http://lists.openfabrics.org/cgi-bin/mailman/listinfo/users
>
> --
> Albert Chu
> chu11 at llnl.gov
> Computer Scientist
> High Performance Systems Division
> Lawrence Livermore National Laboratory
>
>
>
>
--
Albert Chu
chu11 at llnl.gov
Computer Scientist
High Performance Systems Division
Lawrence Livermore National Laboratory