<div dir="ltr"><div>Which mystery is explained? The 10 Gbps rate is a multicast-only limit and does not apply to unicast. The bandwidth limitation you're seeing is due to other factors; much has been written about IPoIB performance.</div>
<div> </div><div>If all the MC members are joined and routed, then the IPoIB connectivity problem has some other cause. Are you sure this is the case? Did you walk the route between two nodes that have the connectivity issue?</div>
</div><div class="gmail_extra"><br><br><div class="gmail_quote">On Mon, Oct 28, 2013 at 1:58 PM, Robert LeBlanc <span dir="ltr"><<a href="mailto:robert_leblanc@byu.edu" target="_blank">robert_leblanc@byu.edu</a>></span> wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr">Well, that explains one mystery. Now I need to figure out why the Dell blades don't seem to be passing traffic.</div>
<div class="gmail_extra"><div class="im"><br clear="all"><div><div><span style="font-family:arial,sans-serif;font-size:13px"><br>
</span></div><span style="font-family:arial,sans-serif;font-size:13px">Robert LeBlanc</span><br style="font-family:arial,sans-serif;font-size:13px"><span style="font-family:arial,sans-serif;font-size:13px">OIT Infrastructure & Virtualization Engineer</span><br style="font-family:arial,sans-serif;font-size:13px">
<span style="font-family:arial,sans-serif;font-size:13px">Brigham Young University</span></div>
<br><br></div><div><div class="h5"><div class="gmail_quote">On Mon, Oct 28, 2013 at 11:51 AM, Hal Rosenstock <span dir="ltr"><<a href="mailto:hal.rosenstock@gmail.com" target="_blank">hal.rosenstock@gmail.com</a>></span> wrote:<br>
<blockquote style="margin:0px 0px 0px 0.8ex;padding-left:1ex;border-left-color:rgb(204,204,204);border-left-width:1px;border-left-style:solid" class="gmail_quote">
<div dir="ltr"><div>Yes, that's the IPoIB IPv4 broadcast group for the default (0xffff) partition. The 0x80 part of the MTU and rate fields is a selector that just means "is equal to"; the low six bits carry the value. MTU 0x04 is 2K (2048 bytes) and rate 0x3 is 10 Gb/s. These are indeed the defaults.</div>
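<div>To make the encoding above concrete, here is a minimal Python sketch (not taken from any InfiniBand tool) that splits an Mtu or Rate byte into its 2-bit selector and 6-bit value. The lookup tables cover only the IBA MTU codes and the original IBA rate codes; the function names are hypothetical.</div>

```python
# Two-bit selector stored in the top bits of an Mtu or Rate byte.
SELECTORS = {0: "greater than", 1: "less than", 2: "exactly", 3: "largest available"}

# IBA MTU codes -> bytes.
MTU_BYTES = {1: 256, 2: 512, 3: 1024, 4: 2048, 5: 4096}

# Original IBA rate codes -> Gb/s (later spec revisions add more codes).
RATE_GBPS = {2: 2.5, 3: 10, 4: 30, 5: 5, 6: 20, 7: 40, 8: 60, 9: 80, 10: 120}


def split_selector(field):
    """Split an 8-bit Mtu/Rate byte into (selector, value).

    The selector is the top 2 bits and the value is the low 6 bits,
    so integer division and modulo by 64 separate them.
    """
    return field // 64, field % 64


def describe_mtu(field):
    selector, code = split_selector(field)
    return f"{SELECTORS[selector]} {MTU_BYTES[code]} bytes"


def describe_rate(field):
    selector, code = split_selector(field)
    return f"{SELECTORS[selector]} {RATE_GBPS[code]} Gb/s"


if __name__ == "__main__":
    print(describe_mtu(0x84))   # exactly 2048 bytes
    print(describe_rate(0x83))  # exactly 10 Gb/s
```

<div>Decoded this way, 0x84 and 0x83 give "exactly 2048 bytes" and "exactly 10 Gb/s", which are the defaults described above.</div>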
</div><div><div><div class="gmail_extra"><br><br><div class="gmail_quote">On Mon, Oct 28, 2013 at 1:45 PM, Robert LeBlanc <span dir="ltr"><<a href="mailto:robert_leblanc@byu.edu" target="_blank">robert_leblanc@byu.edu</a>></span> wrote:<br>
<blockquote style="margin:0px 0px 0px 0.8ex;padding-left:1ex;border-left-color:rgb(204,204,204);border-left-width:1px;border-left-style:solid" class="gmail_quote"><div dir="ltr">The info for that MGID is:<div><div><font face="courier new, monospace">MCMemberRecord group dump:</font></div>
<div><font face="courier new, monospace"> MGID....................ff12:401b:ffff::ffff:ffff</font></div>
<div><font face="courier new, monospace"> Mlid....................0xC000</font></div><div><font face="courier new, monospace"> Mtu.....................0x84</font></div><div><font face="courier new, monospace"> pkey....................0xFFFF</font></div>
<div><font face="courier new, monospace"> Rate....................0x83</font></div><div><font face="courier new, monospace"> SL......................0x0</font></div></div><div><br></div><div>
I don't understand the MTU and Rate values (132 and 131 decimal). When I run iperf between the two hosts over IPoIB in connected mode with an MTU of 65520, I get 10 Gbps. I've tried multiple threads, but the sum is still 10 Gbps.</div></div><div class="gmail_extra">
<div>
<br><br></div><div><div><div class="gmail_quote">On Mon, Oct 28, 2013 at 11:40 AM, Hal Rosenstock <span dir="ltr"><<a href="mailto:hal.rosenstock@gmail.com" target="_blank">hal.rosenstock@gmail.com</a>></span> wrote:<br>
<blockquote style="margin:0px 0px 0px 0.8ex;padding-left:1ex;border-left-color:rgb(204,204,204);border-left-width:1px;border-left-style:solid" class="gmail_quote">
<div dir="ltr"><div><div>saquery -g should show which MGID is mapped to MLID 0xc000 and the group parameters.</div><div> </div></div><div>When you say 10 Gbps max, is that multicast or unicast? That limit applies only to multicast.</div>
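<div>If you want to script the check, a rough Python sketch for pulling fields out of the saquery -g group dump follows. It assumes the dotted "Name....value" layout seen in the dump quoted in this thread; other saquery versions may format output differently, and parse_group_dump is a hypothetical helper, not part of infiniband-diags.</div>

```python
import re

# Sample copied from the "saquery -g" output quoted in this thread.
SAMPLE = """\
MCMemberRecord group dump:
 MGID....................ff12:401b:ffff::ffff:ffff
 Mlid....................0xC000
 Mtu.....................0x84
 pkey....................0xFFFF
 Rate....................0x83
 SL......................0x0
"""

# Assumed field layout: optional indent, a name, a run of dots, the value.
FIELD_RE = re.compile(r"^\s*(\w+)\.{2,}(\S+)\s*$")


def parse_group_dump(text):
    """Return a list of dicts, one per MCMemberRecord group in the dump."""
    groups = []
    for line in text.splitlines():
        if line.startswith("MCMemberRecord group dump"):
            groups.append({})  # start a new group record
            continue
        match = FIELD_RE.match(line)
        if match and groups:
            groups[-1][match.group(1)] = match.group(2)
    return groups


if __name__ == "__main__":
    for group in parse_group_dump(SAMPLE):
        # int(..., 16) accepts the "0x" prefix, so "0xC000" parses directly.
        if int(group["Mlid"], 16) == 0xC000:
            print(group["MGID"], group["Mtu"], group["Rate"])
```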
</div><div><div><div class="gmail_extra"><br><br><div class="gmail_quote">On Mon, Oct 28, 2013 at 1:28 PM, Robert LeBlanc <span dir="ltr"><<a href="mailto:robert_leblanc@byu.edu" target="_blank">robert_leblanc@byu.edu</a>></span> wrote:<br>
<blockquote style="margin:0px 0px 0px 0.8ex;padding-left:1ex;border-left-color:rgb(204,204,204);border-left-width:1px;border-left-style:solid" class="gmail_quote"><div dir="ltr">Well, that can explain why I'm only able to get 10 Gbps max from the two hosts that are working.<div>
<br></div><div>I have tried updn and dnup, and they didn't help either. I think the only thing that will help is Automatic Path Migration, as it tries very hard to route the alternative LIDs through different system GUIDs. I suspect it would require re-LIDing everything, which would mean an outage. I'm still trying to get answers from Oracle on whether that is even a possibility. I've tried seeding some of the algorithms with information like root nodes, but none of them worked better.</div>
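<div>Extra LIDs per port come from the LID Mask Control (LMC) value, which is why raising it can force existing LIDs to change. A hedged Python sketch of the constraint: with LMC = k a port owns 2^k consecutive LIDs and its base LID must be a multiple of 2^k. The helper names are hypothetical, and this does not model how any particular SM (including Xsigo's) assigns LIDs.</div>

```python
MAX_UNICAST_LID = 0xBFFF  # unicast LIDs run from 0x0001 to 0xBFFF


def lid_block(base_lid, lmc):
    """Return the LIDs a port responds to when its LMC is `lmc`.

    With LMC = k a port owns 2**k consecutive LIDs, and the base LID
    must have its low k bits clear (be a multiple of 2**k).
    """
    if lmc not in range(8):  # LMC is a 3-bit field
        raise ValueError("LMC must be 0..7")
    size = 2 ** lmc
    if base_lid % size != 0:
        raise ValueError(f"base LID {base_lid:#x} is not aligned for LMC {lmc}")
    last = base_lid + size - 1
    if last > MAX_UNICAST_LID:
        raise ValueError("LID block runs past the unicast range")
    return range(base_lid, last + 1)


def misaligned(base_lids, lmc):
    """Base LIDs that would have to change if LMC were raised to `lmc`.

    Equal-size aligned blocks never overlap, so for distinct base LIDs
    alignment is the only constraint checked here.
    """
    size = 2 ** lmc
    return [lid for lid in base_lids if lid % size != 0]


if __name__ == "__main__":
    print(list(lid_block(0x10, 2)))             # [16, 17, 18, 19]
    print(misaligned([0x1, 0x2, 0x3, 0x4], 1))  # [1, 3]
```

<div>For example, with consecutive base LIDs 0x1 to 0x4, raising LMC to 1 leaves 0x1 and 0x3 misaligned, so those ports would need new LIDs.</div>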
<div><br></div><div>The MLID 0xc000 exists and I can see all the nodes joined to the group using saquery. I've checked the route using ibtracert specifying the MLID. The only thing I'm not sure how to check is the group parameters. What tool would I use for that?</div>
<div><br></div><div>Thanks,</div></div><div class="gmail_extra"><div>
<br><br></div><div><div><div class="gmail_quote">On Mon, Oct 28, 2013 at 11:16 AM, Hal Rosenstock <span dir="ltr"><<a href="mailto:hal.rosenstock@gmail.com" target="_blank">hal.rosenstock@gmail.com</a>></span> wrote:<br>
<blockquote style="margin:0px 0px 0px 0.8ex;padding-left:1ex;border-left-color:rgb(204,204,204);border-left-width:1px;border-left-style:solid" class="gmail_quote">
<div dir="ltr"><div>Xsigo's SM is not "straight" OpenSM. They have some proprietary enhancements, and it may be based on an old vintage of OpenSM. You will likely need to work with them (now Oracle) on these issues.</div>
<div> </div><div>The lack of a partitions file does mean the default partition and the default rate (10 Gbps), so from what I saw all ports had sufficient rate to join the MC group.</div><div> </div><div>Some routing algorithms have specific topology requirements. Did you try updn or dnup?</div>
<div> </div><div>The key is determining whether the IPoIB broadcast group is set up correctly. What MLID is the group built on (usually 0xc000)? What are the group parameters (rate, MTU)? Are all members that are running IPoIB joined? Is the group routed to all such members? There are infiniband-diags tools for all of this.</div>
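<div>The membership and rate/MTU items in that checklist can be sketched in Python, assuming you have already gathered (for example with saquery and ibstat) which ports run IPoIB, which are joined, and each port's active rate and MTU. The Port record, check_broadcast_group, and the host names are hypothetical; routing to each member still has to be verified separately, for example with ibtracert. Rates are compared in Gb/s rather than as raw rate codes because the IBA codes are not in speed order (code 5 is 5 Gb/s while code 3 is 10 Gb/s).</div>

```python
from dataclasses import dataclass


@dataclass
class Port:
    name: str          # hypothetical identifier (a port GID would also do)
    rate_gbps: float   # active link rate in Gb/s
    mtu_bytes: int     # active IB MTU in bytes


def check_broadcast_group(ipoib_ports, joined, group_rate_gbps, group_mtu_bytes):
    """Return a list of problems with IPoIB broadcast group membership.

    Covers two checklist items: every IPoIB port has joined, and every
    port's rate and MTU are at least the group's. Routing is not checked.
    """
    problems = []
    for port in ipoib_ports:
        if port.name not in joined:
            problems.append(f"{port.name}: not joined")
        if group_rate_gbps > port.rate_gbps:
            problems.append(f"{port.name}: rate {port.rate_gbps} Gb/s below group rate")
        if group_mtu_bytes > port.mtu_bytes:
            problems.append(f"{port.name}: MTU {port.mtu_bytes} below group MTU")
    return problems


if __name__ == "__main__":
    # Hypothetical ports; real data would come from the fabric.
    ports = [Port("host-a", 40, 4096), Port("host-b", 40, 2048), Port("host-c", 10, 1024)]
    for problem in check_broadcast_group(ports, {"host-a", "host-c"}, 10, 2048):
        print(problem)
```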
</div><div><div><div class="gmail_extra"><br><br><div class="gmail_quote">On Mon, Oct 28, 2013 at 12:19 PM, Robert LeBlanc <span dir="ltr"><<a href="mailto:robert_leblanc@byu.edu" target="_blank">robert_leblanc@byu.edu</a>></span> wrote:<br>
<blockquote style="margin:0px 0px 0px 0.8ex;padding-left:1ex;border-left-color:rgb(204,204,204);border-left-width:1px;border-left-style:solid" class="gmail_quote"><div dir="ltr">OpenSM (the SM runs on the Xsigo, so they manage it) is using minhop. I've loaded the ibnetdiscover output into ibsim and run all the different routing algorithms against it, with and without scatter ports. Minhop had 50% of our hosts running all paths through a single IS5030 switch (at least for the LIDs we need, which represent the Ethernet and Fibre Channel cards the hosts should communicate with). Ftree, dor, and dfsssp fell back to minhop; the others routed more paths through the same IS5030, in some cases increasing the share of our hosts with a single point of failure to 75%.<div>
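<div>The single-point-of-failure percentages above can be computed along these lines. This Python sketch assumes you have already extracted, for each host, the set of switches on each of its routes to the LIDs you care about (for example from ibtracert runs against the ibsim model); the route data and switch names below are made up for illustration.</div>

```python
def single_points_of_failure(host_paths):
    """Map each host to the switches that appear on *every* one of its paths.

    host_paths: {host: [set_of_switches_on_path, ...]}
    Losing a switch in that intersection cuts the host off from all
    of the target LIDs at once.
    """
    return {
        host: set.intersection(*paths) if paths else set()
        for host, paths in host_paths.items()
    }


def spof_share(host_paths, switch):
    """Fraction of hosts whose every path runs through `switch`."""
    spofs = single_points_of_failure(host_paths)
    hits = sum(1 for common in spofs.values() if switch in common)
    return hits / len(spofs)


if __name__ == "__main__":
    # Made-up routes: each host has one path per target LID.
    routes = {
        "blade1": [{"is5030-1", "xsigo-a"}, {"is5030-1", "xsigo-b"}],
        "blade2": [{"is5030-1", "xsigo-a"}, {"is5030-2", "xsigo-b"}],
    }
    print(spof_share(routes, "is5030-1"))  # 0.5
```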
<br></div><div>As far as I can tell, there is no partitions.conf file, so I assume we are using the default partition. There is an opensm.opts file, but it only specifies logging information.</div><div><div><font face="courier new, monospace"># SA database file name</font></div>
<div><font face="courier new, monospace">sa_db_file /var/log/opensm-sa.dump</font></div><div><font face="courier new, monospace"><br></font></div><div><font face="courier new, monospace"># If TRUE causes OpenSM to dump SA database at the end of</font></div>
<div><font face="courier new, monospace"># every light sweep, regardless of the verbosity level</font></div><div><font face="courier new, monospace">sa_db_dump TRUE</font></div><div><font face="courier new, monospace"><br>
</font></div><div><font face="courier new, monospace"># The directory to hold the file OpenSM dumps</font></div><div><font face="courier new, monospace">dump_files_dir /var/log/</font></div></div><div><br></div><div>The SM node is:</div>
<div><div><font face="courier new, monospace">xsigoa:/opt/xsigo/xsigos/current/ofed/etc# ibaddr</font></div><div><font face="courier new, monospace">GID fe80::13:9702:100:979 LID start 0x1 end 0x1</font></div></div><div>
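<div>The GID printed by ibaddr is the 64-bit subnet prefix (fe80:: by default) followed by the 64-bit port GUID, so the GUID can be pulled out to find the SM node in ibnetdiscover output. A minimal sketch using Python's standard ipaddress module; split_gid is a hypothetical helper.</div>

```python
import ipaddress


def split_gid(gid):
    """Split an InfiniBand GID string into (subnet_prefix, port_guid).

    A GID is 128 bits: the upper 64 bits are the subnet prefix and the
    lower 64 bits are the port GUID. GIDs use IPv6 text notation, so the
    standard ipaddress module can parse them.
    """
    value = int(ipaddress.IPv6Address(gid))
    prefix, guid = divmod(value, 2 ** 64)
    return prefix, guid


if __name__ == "__main__":
    prefix, guid = split_gid("fe80::13:9702:100:979")
    print(f"prefix {prefix:#018x}  guid {guid:#018x}")
    # prefix 0xfe80000000000000  guid 0x0013970201000979
```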
<br>
</div><div>We do have Switch-X in two of the Dell m1000e chassis, but the cards (ports 17-32) are FDR10; the switch itself may be straight FDR, but I'm not 100% sure. The IS5030s, which the Switch-X switches connect to, are QDR, and the switches in the Xsigo directors are QDR, but the Ethernet and Fibre Channel cards in the directors are DDR. The DDR cards will not be running IPoIB (to my knowledge they don't have the ability); only the hosts should be using IPoIB. I hope that answers some of your questions. If you have more, I will try to answer them.<br>
<div><br></div><div><br></div></div></div><div class="gmail_extra"><div>
<br><br></div><div><div><div class="gmail_quote">On Mon, Oct 28, 2013 at 9:57 AM, Hal Rosenstock <span dir="ltr"><<a href="mailto:hal.rosenstock@gmail.com" target="_blank">hal.rosenstock@gmail.com</a>></span> wrote:<br>
<blockquote style="margin:0px 0px 0px 0.8ex;padding-left:1ex;border-left-color:rgb(204,204,204);border-left-width:1px;border-left-style:solid" class="gmail_quote">
<div dir="ltr"><div>What routing algorithm is configured in OpenSM? What does your partitions.conf file look like? Which node is your OpenSM?</div><div> </div><div>Also, I only see QDR and DDR links even though you have Switch-X, so I assume all FDR ports are connected to slower (QDR) devices. I don't see any FDR-10 ports either, but maybe they're also connected to QDR ports and so show up as QDR in the topology.</div>
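<div>The reasoning about FDR ports appearing as QDR can be illustrated with a deliberately simplified Python model: each end of a link advertises the speeds it supports, and the link comes up at the fastest speed both share. Real link training also depends on enabled speeds, width, and firmware, and the capability sets below are assumptions for illustration.</div>

```python
# Per-lane speeds ordered slowest to fastest (simplified; width ignored).
SPEED_ORDER = ["SDR", "DDR", "QDR", "FDR10", "FDR"]


def negotiated_speed(end_a, end_b):
    """Fastest speed supported by both ends of a link, or None if none."""
    common = set(end_a).intersection(end_b)
    for speed in reversed(SPEED_ORDER):
        if speed in common:
            return speed
    return None


if __name__ == "__main__":
    # Hypothetical capability sets for the kinds of ports in this fabric.
    switchx_port = {"SDR", "DDR", "QDR", "FDR10", "FDR"}
    is5030_port = {"SDR", "DDR", "QDR"}
    xsigo_ddr_ca = {"SDR", "DDR"}
    print(negotiated_speed(switchx_port, is5030_port))  # QDR
    print(negotiated_speed(is5030_port, xsigo_ddr_ca))  # DDR
```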
<div> </div><div>There are DDR CAs in the Xsigo box, but I'm not sure whether or not they run IPoIB.</div><span><font color="#888888"><div> </div><div>-- Hal</div></font></span></div><div class="gmail_extra"><br><br>
<div class="gmail_quote"><div>On Sun, Oct 27, 2013 at 9:46 PM, Robert LeBlanc <span dir="ltr"><<a href="mailto:robert_leblanc@byu.edu" target="_blank">robert_leblanc@byu.edu</a>></span> wrote:<br>
</div><blockquote style="margin:0px 0px 0px 0.8ex;padding-left:1ex;border-left-color:rgb(204,204,204);border-left-width:1px;border-left-style:solid" class="gmail_quote"><div><div><div dir="ltr">Since you guys are amazingly helpful, I thought I would pick your brains on a new problem.<div>
<br></div>
<div>We have two Xsigo directors cross-connected to four Mellanox IS5030 switches. Connected to those, we have four Dell m1000e chassis, each with two IB switches (two chassis have QDR and two have FDR10). We have 9 dual-port rack servers connected to the IS5030 switches. For testing purposes, we have an additional Dell m1000e QDR chassis connected to one Xsigo director and two dual-port FDR10 rack servers connected to the other Xsigo director.</div>
<div><br></div><div>I can get IPoIB to work between the two test rack servers connected to the one Xsigo director, but I cannot get IPoIB to work between any blades, whether they are right next to each other or talking to the working rack servers. I'm using the exact same live CentOS ISO on all four servers. I've checked opensm, and the blades have joined the multicast group 0xc000 properly. tcpdump shows that traffic is not leaving the blades, and it also shows no traffic from the rack servers entering the blades. An ibtracert using MLID 0xc000 shows that a route exists between the hosts.</div>
<div><br></div><div>I've read about MulticastFDBTop=0xBFFF, but I don't know how to set it, and I doubt it would have been set by default.</div><div><br></div><div>Does anyone have ideas on troubleshooting steps to try? I think Google is tired of me asking questions about it.</div>
<div><br></div><div>Thanks,
</div></div>
<br></div></div><div>_______________________________________________<br>
Users mailing list<br>
<a href="mailto:Users@lists.openfabrics.org" target="_blank">Users@lists.openfabrics.org</a><br>
<a href="http://lists.openfabrics.org/cgi-bin/mailman/listinfo/users" target="_blank">http://lists.openfabrics.org/cgi-bin/mailman/listinfo/users</a><br>
<br></div></blockquote></div><br></div>
</blockquote></div><br></div></div></div>
</blockquote></div><br></div>
</div></div></blockquote></div><br></div></div></div>
</blockquote></div><br></div>
</div></div></blockquote></div><br></div></div></div>
</blockquote></div><br></div>
</div></div></blockquote></div><br></div></div></div>
</blockquote></div><br></div>