<div dir="ltr">Al, you pegged it! I seriously need to undo 10 years of reading Linux documentation to use ibsim; the command syntax needs to be VERY literal.<div><br></div><div><div><font face="courier new, monospace">leblanc@rleblanc-pc:~/Documents/Work/Scripts/ib$ LD_PRELOAD=/usr/lib/umad2sim/libumad2sim.so /usr/sbin/ibtracert -G -n 0x0002c90300ebbb62 0x0013970301001f4c 2> /dev/null </font></div>
<div><font face="courier new, monospace">From {0x0002c90300ebbb60}[2]</font></div><div><font face="courier new, monospace">[2] -> {0x0002c90300684e30}[19]</font></div><div><font face="courier new, monospace">[1] -> {0x0002c90200431fb8}[10]</font></div>
<div><font face="courier new, monospace">[33] -> {0x001397010a000044}[8]</font></div><div><font face="courier new, monospace">[35] -> {0x0013970301001f4c}[1]</font></div><div><font face="courier new, monospace">To {0x0013970301001f4b}[1]</font></div>
<div><font face="courier new, monospace">leblanc@rleblanc-pc:~/Documents/Work/Scripts/ib$ </font></div></div><div><br></div><div>Now to get this program back on track! Thank you for being patient.</div></div><div class="gmail_extra">
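<div>For anyone scripting this kind of check, here is a minimal Python sketch that compares two ibtracert traces hop by hop. It assumes the hop lines keep the exact "[port] -> {guid}[port]" layout shown in this thread; the names HOP_RE, parse_hops and first_changed_hop are made up for illustration and are not part of ibsim or infiniband-diags.</div>
<pre>
```python
import re

# Hop lines in the traces from this thread look like:
#   [2] -> {0x0002c90300684e30}[19]
HOP_RE = re.compile(r"^\[(\d+)\] -> \{(0x[0-9a-fA-F]+)\}\[(\d+)\]$")


def parse_hops(trace_text):
    """Return a list of (out_port, guid, in_port) tuples, one per hop line."""
    hops = []
    for line in trace_text.splitlines():
        match = HOP_RE.match(line.strip())
        if match:
            hops.append((int(match.group(1)), match.group(2), int(match.group(3))))
    return hops


def first_changed_hop(before, after):
    """Return the index of the first differing hop, or None if the routes match."""
    for index, (old, new) in enumerate(zip(before, after)):
        if old != new:
            return index
    if len(before) != len(after):
        return min(len(before), len(after))
    return None


# Traces copied from this thread: before and after clearing port 2 of the switch.
BEFORE = """From {0x0002c90300ebbb60}[2]
[2] -> {0x0002c90300684e30}[19]
[2] -> {0x0002c90200431eb8}[10]
[33] -> {0x001397010a000044}[10]
[35] -> {0x0013970301001f4c}[1]
To {0x0013970301001f4b}[1]"""

AFTER = """From {0x0002c90300ebbb60}[2]
[2] -> {0x0002c90300684e30}[19]
[1] -> {0x0002c90200431fb8}[10]
[33] -> {0x001397010a000044}[8]
[35] -> {0x0013970301001f4c}[1]
To {0x0013970301001f4b}[1]"""

changed = first_changed_hop(parse_hops(BEFORE), parse_hops(AFTER))
print("first changed hop index:", changed)
```
</pre>
<div>With the two traces above, the first hop is unchanged and the route diverges at the second hop, where the switch now forwards out of port 1 instead of port 2.</div>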
<br clear="all"><div><div><span style="font-family:arial,sans-serif;font-size:13px;background-color:rgb(255,255,255)"><br></span></div><span style="font-family:arial,sans-serif;font-size:13px;background-color:rgb(255,255,255)">Robert LeBlanc</span><br style="font-family:arial,sans-serif;font-size:13px;background-color:rgb(255,255,255)">
<span style="font-family:arial,sans-serif;font-size:13px;background-color:rgb(255,255,255)">OIT Infrastructure & Virtualization Engineer</span><br style="font-family:arial,sans-serif;font-size:13px;background-color:rgb(255,255,255)">
<span style="font-family:arial,sans-serif;font-size:13px;background-color:rgb(255,255,255)">Brigham Young University</span></div>
<br><br><div class="gmail_quote">On Fri, Oct 25, 2013 at 2:26 PM, Albert Chu <span dir="ltr"><<a href="mailto:chu11@llnl.gov" target="_blank">chu11@llnl.gov</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
I now see your earlier reply. You realized your mistake: you were<br>
disabling all the links on the switch, which effectively led to<br>
disabling all the nodes.<br>
<br>
I think you typoed your second attempt. It should be:<br>
<br>
clear "S-0002c90300684e30"[2]<br>
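<br>
If these console lines get generated from a script, a small helper avoids the quoting slip. This is only a sketch: the function name ibsim_command is hypothetical, and it assumes nothing beyond the two forms seen in this thread (a quoted node ID, optionally followed directly by the port in square brackets).<br>
<pre>
```python
def ibsim_command(verb, node_id, port=None):
    """Format one ibsim console line, e.g. clear "S-0002c90300684e30"[2]."""
    target = '"{}"'.format(node_id)
    if port is not None:
        # The port goes in square brackets right after the closing quote,
        # with no space and no extra quote.
        target += "[{}]".format(int(port))
    return "{} {}".format(verb, target)


# Port-level form, as in the corrected command above.
print(ibsim_command("clear", "S-0002c90300684e30", 2))
# Node-level form, as used earlier in the thread.
print(ibsim_command("relink", "S-0002c90300684e30"))
```
</pre>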
<br>
Al<br>
<div class="HOEnZb"><div class="h5"><br>
On Fri, 2013-10-25 at 12:45 -0600, Robert LeBlanc wrote:<br>
> But, I'm trying to route from one HCA port to another HCA port (not a<br>
> switch). I'm taking down a switch link in which there is another path<br>
> available between the HCA ports. Do the port GUIDs change in this type<br>
> of event? (I don't believe that is the case.)<br>
><br>
><br>
> When I take this switch port down I would expect the output to be:<br>
> From {0x0002c90300ebbb60}[2]<br>
> [2] -> {0x0002c90300684e30}[19]<br>
> [1] -> {0x0002c90200431fb8}[10]<br>
> [33(or 34)] -> {0x001397010a000044}[8(or 9)]<br>
> [35] -> {0x0013970301001f4c}[1]<br>
> To {0x0013970301001f4b}[1]<br>
><br>
><br>
> I understand if I disconnect the HCA port then I should not be able to<br>
> connect, but taking down a switch port should cause ibsim/opensm to<br>
> reroute around the downed link. Again, please let me know if I'm<br>
> missing something because I'm still learning this.<br>
><br>
><br>
> Thanks,<br>
><br>
><br>
><br>
><br>
><br>
> Robert LeBlanc<br>
> OIT Infrastructure & Virtualization Engineer<br>
> Brigham Young University<br>
><br>
><br>
> On Fri, Oct 25, 2013 at 12:35 PM, Albert Chu <<a href="mailto:chu11@llnl.gov">chu11@llnl.gov</a>> wrote:<br>
> Hi Robert,<br>
><br>
> > I'm trying to test routing in ibsim, but it doesn't seem to<br>
> update the<br>
> > routing tables in the simulated switches. If I take a link<br>
> down using<br>
> > the clear command in ibsim, I see opensm saying that it is<br>
> updating<br>
> > the routing tables and that it completes, but I can't<br>
> ibtracert to the<br>
> > LID whose path was taken down.<br>
><br>
><br>
> I have a feeling you might be confusing ibtracert's behavior<br>
> w/ the<br>
> typical behavior of traceroute.<br>
><br>
> When you disable the link below, you are effectively taking<br>
> node(s) out<br>
> of your fabric. OpenSM will see that the node(s) disappeared<br>
> and will<br>
> re-route the fabric. Those nodes are now eliminated from all<br>
> of the<br>
> routing tables. So when you ibtracert that node, ibtracert<br>
> effectively<br>
> states it can't do a traceroute b/c the node/route doesn't<br>
> exist.<br>
><br>
> This is different than traceroute, which outputs the network<br>
> hops as far<br>
> as it can go, even if the end destination is down.<br>
><br>
> Al<br>
><br>
> On Fri, 2013-10-25 at 12:22 -0600, Robert LeBlanc wrote:<br>
> > I just realized that in this example I'm shutting down the<br>
> entire<br>
> > switch that the host is connected to instead of the uplink<br>
> port. If I<br>
> > issue 'clear "S-0002c90300684e30" 2"', I get the same<br>
> result. Port 1<br>
> > and 2 are both uplink ports to different leaf IB switches in<br>
> a fat<br>
> > tree scheme.<br>
> ><br>
> ><br>
> ><br>
> > Robert LeBlanc<br>
> > OIT Infrastructure & Virtualization Engineer<br>
> > Brigham Young University<br>
> ><br>
> ><br>
> > On Fri, Oct 25, 2013 at 11:19 AM, Robert LeBlanc<br>
> > <<a href="mailto:robert_leblanc@byu.edu">robert_leblanc@byu.edu</a>> wrote:<br>
> > Here are the details of what I'm doing:<br>
> ><br>
> ><br>
> > In one terminal, I run ibsim:<br>
> > root@rleblanc-pc:/home/leblanc/Downloads# ibsim -s<br>
> ibtopo<br>
> > parsing: ibtopo<br>
> > ibtopo: parsed 928 lines<br>
> > ########################<br>
> > Network simulator ready.<br>
> > MaxNetNodes = 2048<br>
> > MaxNetSwitches = 256<br>
> > MaxNetPorts = 13312<br>
> > MaxLinearCap = 30720<br>
> > MaxMcastCap = 1024<br>
> > sim> ibwarn: [2278] process_packet: no one to handle<br>
> pkt:<br>
> > class 0x81, attr 0xff90<br>
> > ibwarn: [2278] process_packet: no one to handle pkt:<br>
> class<br>
> > 0x81, attr 0xff90<br>
> > ...snip out tons of these messages...<br>
> > ibwarn: [2278] process_packet: no one to handle pkt:<br>
> class<br>
> > 0x81, attr 0xff90<br>
> > clear "S-0002c90300684e30"<br>
> > sim> ibwarn: [2278] process_packet: got trap repress<br>
> - drop<br>
> > ibwarn: [2278] process_packet: got trap repress -<br>
> drop<br>
> > ibwarn: [2278] process_packet: no one to handle pkt:<br>
> class<br>
> > 0x81, attr 0xff90<br>
> > ...snip out tons of these messages...<br>
> > ibwarn: [2278] process_packet: no one to handle pkt:<br>
> class<br>
> > 0x81, attr 0xff90<br>
> > relink "0002c90300684e30"<br>
> > # nodeid "0002c90300684e30" (0002c90300684e30) not<br>
> found<br>
> > sim> relink "S-0002c90300684e30"<br>
> > sim> ibwarn: [2278] process_packet: got trap repress<br>
> - drop<br>
> > ibwarn: [2278] process_packet: got trap repress -<br>
> drop<br>
> > ibwarn: [2278] process_packet: no one to handle pkt:<br>
> class<br>
> > 0x81, attr 0xff90<br>
> > ...snip out tons of these messages...<br>
> > ibwarn: [2278] process_packet: no one to handle pkt:<br>
> class<br>
> > 0x81, attr 0xff90<br>
> > quit<br>
> > Exiting network simulator.<br>
> > root@rleblanc-pc:/home/leblanc/Downloads#<br>
> ><br>
> ><br>
> > Then in another terminal I run opensm:<br>
> ><br>
> root@rleblanc-pc:/home/leblanc/Documents/Work/Scripts/ib#<br>
> > SIM_HOST="H-0013970201000978" OSM_TMP_DIR=./<br>
> OSM_CACHE_DIR=./<br>
> > LD_PRELOAD=/usr/lib/umad2sim/libumad2sim.so opensm<br>
> -e -v<br>
> > -f ./osm.log<br>
> > -------------------------------------------------<br>
> > OpenSM 3.3.15<br>
> > Command Line Arguments:<br>
> > Creating new log file<br>
> > Verbose option -v (log flags = 0x7)<br>
> > Log File: ./osm.log<br>
> > -------------------------------------------------<br>
> > OpenSM 3.3.15<br>
> ><br>
> ><br>
> > Entering DISCOVERING state<br>
> ><br>
> ><br>
> > Using default GUID 0x13970201000979<br>
> > Entering MASTER state<br>
> ><br>
> ><br>
> ><br>
> ><br>
> ><br>
> =======================================================================================================<br>
> > Vendor : Ty : # : Sta : LID : LMC : MTU :<br>
> LWA : LSA :<br>
> > Port GUID : Neighbor Port (Port #)<br>
> > Unknown : CA : 01 : ACT : 0003 : 0 : 2048 :<br>
> 4x : 2.5 :<br>
> > f04da29097793001 : 0002c9020042ea60 (12)<br>
> > Unknown : CA : 02 : ACT : 0007 : 0 : 2048 :<br>
> 4x : 2.5 :<br>
> > f04da29097793002 : 0002c902004294e0 (12)<br>
> ><br>
> ------------------------------------------------------------------------------------------------------<br>
> > Mellanox : SW : 00 : : 0002 : 0 : :<br>
> : :<br>
> > 0002c90300879a00 :<br>
> > Mellanox : SW : 01 : ACT : : : 2048 :<br>
> 4x : 2.5 :<br>
> > 0002c90300879a00 : 0002c90200431f90 (08)<br>
> > Mellanox : SW : 02 : ACT : : : 2048 :<br>
> 4x : 2.5 :<br>
> > 0002c90300879a00 : 0002c90200431f58 (09)<br>
> > Mellanox : SW : 03 : DWN : : : ???<br>
> : ??? : Ext :<br>
> > 0002c90300879a00 :<br>
> > ...snip...<br>
> ><br>
> ><br>
> > Then in a third console I run ibtracert:<br>
> > leblanc@rleblanc-pc:~/Documents/Work/Scripts/ib$<br>
> ><br>
> LD_PRELOAD=/usr/lib/umad2sim/libumad2sim.so /usr/sbin/ibtracert -G -n 0x0002c90300ebbb62 0x0013970301001f4c 2> /dev/null<br>
> > From {0x0002c90300ebbb60}[2]<br>
> > [2] -> {0x0002c90300684e30}[19]<br>
> > [2] -> {0x0002c90200431eb8}[10]<br>
> > [33] -> {0x001397010a000044}[10]<br>
> > [35] -> {0x0013970301001f4c}[1]<br>
> > To {0x0013970301001f4b}[1]<br>
> > leblanc@rleblanc-pc:~/Documents/Work/Scripts/ib$<br>
> ><br>
> LD_PRELOAD=/usr/lib/umad2sim/libumad2sim.so /usr/sbin/ibtracert -G -n 0x0002c90300ebbb62 0x0013970301001f4c 2> /dev/null<br>
> > /usr/sbin/ibtracert: iberror: failed: can't resolve<br>
> source<br>
> > port 0x0002c90300ebbb62<br>
> > leblanc@rleblanc-pc:~/Documents/Work/Scripts/ib$<br>
> ><br>
> LD_PRELOAD=/usr/lib/umad2sim/libumad2sim.so /usr/sbin/ibtracert -G -n 0x0002c90300ebbb62 0x0013970301001f4c 2> /dev/null<br>
> > From {0x0002c90300ebbb60}[2]<br>
> > [2] -> {0x0002c90300684e30}[19]<br>
> > [2] -> {0x0002c90200431eb8}[10]<br>
> > [33] -> {0x001397010a000044}[10]<br>
> > [35] -> {0x0013970301001f4c}[1]<br>
> > To {0x0013970301001f4b}[1]<br>
> > leblanc@rleblanc-pc:~/Documents/Work/Scripts/ib$<br>
> ><br>
> ><br>
> > I'm attaching our topo file that we are using and<br>
> the opensm<br>
> > logs (you should be able to replicate the problem<br>
> given this<br>
> > information or tell me what I'm doing wrong).<br>
> ><br>
> ><br>
> > Thanks,<br>
> ><br>
> ><br>
> ><br>
> > Robert LeBlanc<br>
> > OIT Infrastructure & Virtualization Engineer<br>
> > Brigham Young University<br>
> ><br>
> ><br>
> ><br>
> > On Tue, Oct 22, 2013 at 10:55 PM, Hal Rosenstock<br>
> > <<a href="mailto:hal.rosenstock@gmail.com">hal.rosenstock@gmail.com</a>> wrote:<br>
> > ibsim just simulates the network (topology,<br>
> SMAs, and<br>
> > PMAs). OpenSM configures the subnet<br>
> including the<br>
> > routing (LFTs and MFTs) based on the routing<br>
> > algorithm. It is possible in a topology that<br>
> multiple<br>
> > routing algorithms yield the same routes.<br>
> More<br>
> > specifics would be needed to comment<br>
> "deeper"...<br>
> ><br>
> > -- Hal<br>
> ><br>
> ><br>
> > On Tue, Oct 22, 2013 at 6:38 PM, Robert<br>
> LeBlanc<br>
> > <<a href="mailto:robert_leblanc@byu.edu">robert_leblanc@byu.edu</a>> wrote:<br>
> ><br>
> > I'm trying to test routing in ibsim,<br>
> but it<br>
> > doesn't seem to update the routing<br>
> tables in<br>
> > the simulated switches. If I take a<br>
> link down<br>
> > using the clear command in ibsim, I<br>
> see opensm<br>
> > saying that it is updating the<br>
> routing tables<br>
> > and that it completes, but I can't<br>
> ibtracert<br>
> > to the LID whose path was taken<br>
> down.<br>
> ><br>
> ><br>
> > Should ibsim and opensm be<br>
> reconfiguring<br>
> > routing in the simulated<br>
> environment? No<br>
> > matter which routing protocol I<br>
> select in<br>
> > opensm, the routes are always the<br>
> same, even<br>
> > having opensm re-LID the entire<br>
> fabric doesn't<br>
> > help. Any help would be appreciated.<br>
> ><br>
> ><br>
> > Output from opensm:<br>
> ><br>
> ><br>
> ><br>
> ******************************************************************<br>
> > ***** LID ASSIGNMENT COMPLETE -<br>
> STARTING<br>
> > SWITCH TABLE CONFIG *****<br>
> ><br>
> ******************************************************************<br>
> ><br>
> ><br>
> ><br>
> ><br>
> > Oct 22 16:27:20 330198 [8437A700]<br>
> 0x04 -><br>
> > osm_ucast_mgr_build_lid_matrices:<br>
> Starting<br>
> > switches' Min Hop Table Assignment<br>
> > Oct 22 16:27:20 330954 [8437A700]<br>
> 0x02 -><br>
> > osm_ucast_mgr_process: minhop tables<br>
> > configured on all switches<br>
> > Oct 22 16:27:20 331191 [8437A700]<br>
> 0x04 -><br>
> > do_sweep:<br>
> ><br>
> ><br>
> ><br>
> ><br>
> ><br>
> ******************************************************************<br>
> > **************** SWITCHES CONFIGURED<br>
> FOR<br>
> > UNICAST *****************<br>
> ><br>
> ******************************************************************<br>
> ><br>
> ><br>
> ><br>
> ><br>
> > Thanks,<br>
> ><br>
> ><br>
> > Robert LeBlanc<br>
> > OIT Infrastructure & Virtualization<br>
> Engineer<br>
> > Brigham Young University<br>
> ><br>
> ><br>
> ><br>
> _______________________________________________<br>
> > Users mailing list<br>
> > <a href="mailto:Users@lists.openfabrics.org">Users@lists.openfabrics.org</a><br>
> ><br>
> <a href="http://lists.openfabrics.org/cgi-bin/mailman/listinfo/users" target="_blank">http://lists.openfabrics.org/cgi-bin/mailman/listinfo/users</a><br>
> ><br>
><br>
> --<br>
> Albert Chu<br>
> <a href="mailto:chu11@llnl.gov">chu11@llnl.gov</a><br>
> Computer Scientist<br>
> High Performance Systems Division<br>
> Lawrence Livermore National Laboratory<br>
><br>
><br>
><br>
><br>
--<br>
Albert Chu<br>
<a href="mailto:chu11@llnl.gov">chu11@llnl.gov</a><br>
Computer Scientist<br>
High Performance Systems Division<br>
Lawrence Livermore National Laboratory<br>
<br>
<br>
</div></div></blockquote></div><br></div>