[Users] ibsim updates routing tables?
Robert LeBlanc
robert_leblanc at byu.edu
Fri Oct 25 13:30:01 PDT 2013
Al, you pegged it! I seriously need to undo 10 years of reading Linux
documentation to use ibsim; the command syntax needs to be VERY literal.
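The two literal forms that came up in this thread are clear "S-<guid>"
(which took down every link on the switch) and clear "S-<guid>"[<port>]
(which takes down a single port). A minimal sketch in Python of a helper that
builds these console strings follows. The helper name ibsim_link_cmd is
made up for illustration and is not part of ibsim, and the "S-"/"H-" prefix
check is an assumption based on the node IDs seen in this thread's topology.

```python
from typing import Optional


def ibsim_link_cmd(verb: str, node_id: str, port: Optional[int] = None) -> str:
    """Build an ibsim console command such as: clear "S-0002c90300684e30"[2]

    Hypothetical helper, not part of ibsim. Without a port, the command
    targets the whole node, which is what took every link on the switch
    down earlier in this thread.
    """
    if verb not in ("clear", "relink"):
        raise ValueError(f"unsupported verb: {verb!r}")
    # Assumption: node IDs carry the type prefix used in this thread's topo
    # file. A bare GUID was rejected by ibsim with "nodeid ... not found".
    if not node_id.startswith(("S-", "H-")):
        raise ValueError(f"node id needs an S- or H- prefix: {node_id!r}")
    cmd = f'{verb} "{node_id}"'
    if port is not None:
        # The port goes in square brackets directly after the closing quote.
        cmd += f"[{port}]"
    return cmd


print(ibsim_link_cmd("clear", "S-0002c90300684e30", 2))
print(ibsim_link_cmd("relink", "S-0002c90300684e30", 2))
```

The printed strings can be pasted at the sim> prompt. Rejecting a bare GUID
up front mirrors the "not found" error ibsim gave for the unprefixed relink.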
leblanc at rleblanc-pc:~/Documents/Work/Scripts/ib$
LD_PRELOAD=/usr/lib/umad2sim/libumad2sim.so /usr/sbin/ibtracert -G -n
0x0002c90300ebbb62 0x0013970301001f4c 2> /dev/null
From {0x0002c90300ebbb60}[2]
[2] -> {0x0002c90300684e30}[19]
[1] -> {0x0002c90200431fb8}[10]
[33] -> {0x001397010a000044}[8]
[35] -> {0x0013970301001f4c}[1]
To {0x0013970301001f4b}[1]
leblanc at rleblanc-pc:~/Documents/Work/Scripts/ib$
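The reroute can be checked mechanically by comparing this trace with the one
captured before the link was cleared. Below is a minimal sketch in Python,
assuming hop lines always look like "[out] -> {guid}[in]" as they do in this
thread. The function names parse_hops and diff_hops are made up for
illustration and are not part of infiniband-diags.

```python
import re

# One hop line of ibtracert output, e.g. "[2] -> {0x0002c90300684e30}[19]":
# egress port on the previous node, GUID of the next node, ingress port on it.
HOP_RE = re.compile(r"\[(\d+)\]\s*->\s*\{(0x[0-9a-fA-F]+)\}\[(\d+)\]")


def parse_hops(trace):
    """Return a list of (out_port, guid, in_port) tuples, one per hop line."""
    return [(int(m.group(1)), m.group(2).lower(), int(m.group(3)))
            for m in HOP_RE.finditer(trace)]


def diff_hops(before, after):
    """Return (hop_number, old_hop, new_hop) for every hop that differs."""
    old, new = parse_hops(before), parse_hops(after)
    changes = []
    for i in range(max(len(old), len(new))):
        a = old[i] if i < len(old) else None
        b = new[i] if i < len(new) else None
        if a != b:
            changes.append((i + 1, a, b))
    return changes


# Trace from the first message in the thread, before any link was cleared.
BEFORE = """\
From {0x0002c90300ebbb60}[2]
[2] -> {0x0002c90300684e30}[19]
[2] -> {0x0002c90200431eb8}[10]
[33] -> {0x001397010a000044}[10]
[35] -> {0x0013970301001f4c}[1]
To {0x0013970301001f4b}[1]
"""

# Trace shown above, after port 2 on S-0002c90300684e30 was cleared.
AFTER = """\
From {0x0002c90300ebbb60}[2]
[2] -> {0x0002c90300684e30}[19]
[1] -> {0x0002c90200431fb8}[10]
[33] -> {0x001397010a000044}[8]
[35] -> {0x0013970301001f4c}[1]
To {0x0013970301001f4b}[1]
"""

for hop, old, new in diff_hops(BEFORE, AFTER):
    print(f"hop {hop}: {old} -> {new}")
```

For these two traces the sketch reports hops 2 and 3 as changed: the path now
leaves the first switch on port 1 instead of port 2, goes through a different
second-level switch, and enters the third switch on port 8 instead of port 10.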
Now to get this program back on track! Thank you for being patient.
Robert LeBlanc
OIT Infrastructure & Virtualization Engineer
Brigham Young University
On Fri, Oct 25, 2013 at 2:26 PM, Albert Chu <chu11 at llnl.gov> wrote:
> I now see your earlier reply. You realized your mistake: you were
> disabling all the links on the switch, which effectively led to
> disabling all the nodes.
>
> I think you typoed your second attempt. It should be:
>
> clear "S-0002c90300684e30"[2]
>
> Al
>
> On Fri, 2013-10-25 at 12:45 -0600, Robert LeBlanc wrote:
> > But, I'm trying to route from one HCA port to another HCA port (not a
> > switch). I'm taking down a switch link in which there is another path
> > available between the HCA ports. Do the port GUIDs change in this type
> > of event? (I don't believe that is the case.)
> >
> >
> > When I take this switch port down I would expect the output to be:
> > From {0x0002c90300ebbb60}[2]
> > [2] -> {0x0002c90300684e30}[19]
> > [1] -> {0x0002c90200431fb8}[10]
> > [33(or 34)] -> {0x001397010a000044}[8(or 9)]
> > [35] -> {0x0013970301001f4c}[1]
> > To {0x0013970301001f4b}[1]
> >
> >
> > I understand if I disconnect the HCA port then I should not be able to
> > connect, but taking down a switch port should cause ibsim/opensm to
> > reroute around the downed link. Again, please let me know if I'm
> > missing something because I'm still learning this.
> >
> >
> > Thanks,
> >
> >
> >
> >
> >
> > Robert LeBlanc
> > OIT Infrastructure & Virtualization Engineer
> > Brigham Young University
> >
> >
> > On Fri, Oct 25, 2013 at 12:35 PM, Albert Chu <chu11 at llnl.gov> wrote:
> > Hi Robert,
> >
> > > I'm trying to test routing in ibsim, but it doesn't seem to update the
> > > routing tables in the simulated switches. If I take a link down using
> > > the clear command in ibsim, I see opensm saying that it is updating
> > > the routing tables and that it completes, but I can't ibtracert to the
> > > LID whose path was taken down.
> >
> >
> > I have a feeling you might be confusing ibtracert's behavior w/ the
> > typical behavior of traceroute.
> >
> > When you disable the link below, you are effectively taking node(s) out
> > of your fabric. OpenSM will see that the node(s) disappeared and will
> > re-route the fabric. Those nodes are now eliminated from all of the
> > routing tables. So when you ibtracert that node, ibtracert effectively
> > states it can't do a traceroute b/c the node/route doesn't exist.
> >
> > This is different than traceroute, which outputs the network hops as far
> > as it can go, even if the end destination is down.
> >
> > Al
> >
> > On Fri, 2013-10-25 at 12:22 -0600, Robert LeBlanc wrote:
> > > I just realized that in this example I'm shutting down the entire
> > > switch that the host is connected to instead of the uplink port. If I
> > > issue 'clear "S-0002c90300684e30" 2"', I get the same result. Ports 1
> > > and 2 are both uplink ports to different leaf IB switches in a fat
> > > tree scheme.
> > >
> > >
> > >
> > > Robert LeBlanc
> > > OIT Infrastructure & Virtualization Engineer
> > > Brigham Young University
> > >
> > >
> > > On Fri, Oct 25, 2013 at 11:19 AM, Robert LeBlanc
> > > <robert_leblanc at byu.edu> wrote:
> > > Here are the details of what I'm doing:
> > >
> > >
> > > In one terminal, I run ibsim:
> > > root at rleblanc-pc:/home/leblanc/Downloads# ibsim -s ibtopo
> > > parsing: ibtopo
> > > ibtopo: parsed 928 lines
> > > ########################
> > > Network simulator ready.
> > > MaxNetNodes = 2048
> > > MaxNetSwitches = 256
> > > MaxNetPorts = 13312
> > > MaxLinearCap = 30720
> > > MaxMcastCap = 1024
> > > sim> ibwarn: [2278] process_packet: no one to handle pkt: class 0x81, attr 0xff90
> > > ibwarn: [2278] process_packet: no one to handle pkt: class 0x81, attr 0xff90
> > > ...snip out tons of these messages...
> > > ibwarn: [2278] process_packet: no one to handle pkt: class 0x81, attr 0xff90
> > > clear "S-0002c90300684e30"
> > > sim> ibwarn: [2278] process_packet: got trap repress - drop
> > > ibwarn: [2278] process_packet: got trap repress - drop
> > > ibwarn: [2278] process_packet: no one to handle pkt: class 0x81, attr 0xff90
> > > ...snip out tons of these messages...
> > > ibwarn: [2278] process_packet: no one to handle pkt: class 0x81, attr 0xff90
> > > relink "0002c90300684e30"
> > > # nodeid "0002c90300684e30" (0002c90300684e30) not found
> > > sim> relink "S-0002c90300684e30"
> > > sim> ibwarn: [2278] process_packet: got trap repress - drop
> > > ibwarn: [2278] process_packet: got trap repress - drop
> > > ibwarn: [2278] process_packet: no one to handle pkt: class 0x81, attr 0xff90
> > > ...snip out tons of these messages...
> > > ibwarn: [2278] process_packet: no one to handle pkt: class 0x81, attr 0xff90
> > > quit
> > > Exiting network simulator.
> > > root at rleblanc-pc:/home/leblanc/Downloads#
> > >
> > >
> > > Then in another terminal I run opensm:
> > > root at rleblanc-pc:/home/leblanc/Documents/Work/Scripts/ib# SIM_HOST="H-0013970201000978" OSM_TMP_DIR=./ OSM_CACHE_DIR=./ LD_PRELOAD=/usr/lib/umad2sim/libumad2sim.so opensm -e -v -f ./osm.log
> > > -------------------------------------------------
> > > OpenSM 3.3.15
> > > Command Line Arguments:
> > > Creating new log file
> > > Verbose option -v (log flags = 0x7)
> > > Log File: ./osm.log
> > > -------------------------------------------------
> > > OpenSM 3.3.15
> > >
> > >
> > > Entering DISCOVERING state
> > >
> > >
> > > Using default GUID 0x13970201000979
> > > Entering MASTER state
> > >
> > >
> > >
> > >
> > >
> > > ====================================================================================================
> > > Vendor   : Ty : #  : Sta : LID  : LMC : MTU  : LWA : LSA : Port GUID        : Neighbor Port (Port #)
> > > Unknown  : CA : 01 : ACT : 0003 : 0   : 2048 : 4x  : 2.5 : f04da29097793001 : 0002c9020042ea60 (12)
> > > Unknown  : CA : 02 : ACT : 0007 : 0   : 2048 : 4x  : 2.5 : f04da29097793002 : 0002c902004294e0 (12)
> > > ----------------------------------------------------------------------------------------------------
> > > Mellanox : SW : 00 :     : 0002 : 0   :      :     :     : 0002c90300879a00 :
> > > Mellanox : SW : 01 : ACT :      :     : 2048 : 4x  : 2.5 : 0002c90300879a00 : 0002c90200431f90 (08)
> > > Mellanox : SW : 02 : ACT :      :     : 2048 : 4x  : 2.5 : 0002c90300879a00 : 0002c90200431f58 (09)
> > > Mellanox : SW : 03 : DWN :      :     : ???  : ??? : Ext : 0002c90300879a00 :
> > > ...snip...
> > >
> > >
> > > Then in a third console I run ibtracert:
> > > leblanc at rleblanc-pc:~/Documents/Work/Scripts/ib$ LD_PRELOAD=/usr/lib/umad2sim/libumad2sim.so /usr/sbin/ibtracert -G -n 0x0002c90300ebbb62 0x0013970301001f4c 2> /dev/null
> > > From {0x0002c90300ebbb60}[2]
> > > [2] -> {0x0002c90300684e30}[19]
> > > [2] -> {0x0002c90200431eb8}[10]
> > > [33] -> {0x001397010a000044}[10]
> > > [35] -> {0x0013970301001f4c}[1]
> > > To {0x0013970301001f4b}[1]
> > > leblanc at rleblanc-pc:~/Documents/Work/Scripts/ib$ LD_PRELOAD=/usr/lib/umad2sim/libumad2sim.so /usr/sbin/ibtracert -G -n 0x0002c90300ebbb62 0x0013970301001f4c 2> /dev/null
> > > /usr/sbin/ibtracert: iberror: failed: can't resolve source port 0x0002c90300ebbb62
> > > leblanc at rleblanc-pc:~/Documents/Work/Scripts/ib$ LD_PRELOAD=/usr/lib/umad2sim/libumad2sim.so /usr/sbin/ibtracert -G -n 0x0002c90300ebbb62 0x0013970301001f4c 2> /dev/null
> > > From {0x0002c90300ebbb60}[2]
> > > [2] -> {0x0002c90300684e30}[19]
> > > [2] -> {0x0002c90200431eb8}[10]
> > > [33] -> {0x001397010a000044}[10]
> > > [35] -> {0x0013970301001f4c}[1]
> > > To {0x0013970301001f4b}[1]
> > > leblanc at rleblanc-pc:~/Documents/Work/Scripts/ib$
> > >
> > >
> > > I'm attaching our topo file that we are using and the opensm
> > > logs (you should be able to replicate the problem given this
> > > information or tell me what I'm doing wrong).
> > >
> > >
> > > Thanks,
> > >
> > >
> > >
> > > Robert LeBlanc
> > > OIT Infrastructure & Virtualization Engineer
> > > Brigham Young University
> > >
> > >
> > >
> > > On Tue, Oct 22, 2013 at 10:55 PM, Hal Rosenstock
> > > <hal.rosenstock at gmail.com> wrote:
> > > ibsim just simulates the network (topology, SMAs, and PMAs).
> > > OpenSM configures the subnet, including the routing (LFTs and
> > > MFTs), based on the routing algorithm. It is possible in a
> > > topology that multiple routing algorithms yield the same routes.
> > > More specifics would be needed to comment "deeper"...
> > >
> > > -- Hal
> > >
> > >
> > > On Tue, Oct 22, 2013 at 6:38 PM, Robert LeBlanc
> > > <robert_leblanc at byu.edu> wrote:
> > >
> > > I'm trying to test routing in ibsim, but it doesn't seem to update
> > > the routing tables in the simulated switches. If I take a link down
> > > using the clear command in ibsim, I see opensm saying that it is
> > > updating the routing tables and that it completes, but I can't
> > > ibtracert to the LID whose path was taken down.
> > >
> > >
> > > Should ibsim and opensm be reconfiguring routing in the simulated
> > > environment? No matter which routing protocol I select in opensm,
> > > the routes are always the same; even having opensm re-LID the
> > > entire fabric doesn't help. Any help would be appreciated.
> > >
> > >
> > > Output from opensm:
> > >
> > > ******************************************************************
> > > ***** LID ASSIGNMENT COMPLETE - STARTING SWITCH TABLE CONFIG *****
> > > ******************************************************************
> > >
> > > Oct 22 16:27:20 330198 [8437A700] 0x04 -> osm_ucast_mgr_build_lid_matrices: Starting switches' Min Hop Table Assignment
> > > Oct 22 16:27:20 330954 [8437A700] 0x02 -> osm_ucast_mgr_process: minhop tables configured on all switches
> > > Oct 22 16:27:20 331191 [8437A700] 0x04 -> do_sweep:
> > >
> > > ******************************************************************
> > > **************** SWITCHES CONFIGURED FOR UNICAST *****************
> > > ******************************************************************
> > >
> > > Thanks,
> > >
> > >
> > > Robert LeBlanc
> > > OIT Infrastructure & Virtualization Engineer
> > > Brigham Young University
> > >
> > >
> > >
> > _______________________________________________
> > > Users mailing list
> > > Users at lists.openfabrics.org
> > >
> > http://lists.openfabrics.org/cgi-bin/mailman/listinfo/users
> > >
> > >
> > >
> > >
> > >
> > >
> > >
> >
> > --
> > Albert Chu
> > chu11 at llnl.gov
> > Computer Scientist
> > High Performance Systems Division
> > Lawrence Livermore National Laboratory
> >
> >
> >
> >
> --
> Albert Chu
> chu11 at llnl.gov
> Computer Scientist
> High Performance Systems Division
> Lawrence Livermore National Laboratory
>
>
>