<div dir="ltr">Have you set a rate for IPoIB in opensm (specifically in partitions.conf)? That rate governs the multicast group that IPoIB rides on. I've seen problems when a node's link can't satisfy the configured rate: that node can't join the group, so IPoIB fails even though raw IB connectivity still looks fine. That matches what you describe (ibtracert works, but IPoIB doesn't), and given your mix of link speeds, I'd bet this, or something like it, is the cause.<div>
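For reference, here's the shape of the partitions.conf entry I mean. The partition name, key, and values below are illustrative, not your config; rate=3 encodes 10 Gb/s, which any QDR or FDR10 link can satisfy, and mtu=4 encodes 2048 bytes:

```
# /etc/opensm/partitions.conf (restart opensm after editing)
# rate=3 -> 10 Gb/s, mtu=4 -> 2048 bytes; every member link must meet both
Default=0x7fff, ipoib, rate=3, mtu=4 : ALL=full;
```

If someone raised the rate above what the slowest link supports (say, rate=7, i.e. 40 Gb/s, on a fabric that still has a DDR link somewhere), those slower nodes would be unable to join the broadcast group at all.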
<br></div><div>I'm not sure tcpdump would ever show you anything useful here; when IPoIB is broken at this level, the failure is below the layer tcpdump observes. Can anyone say for sure? </div><div>-nld</div></div><div class="gmail_extra"><br>
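P.S. If memory serves, a quick way to check what rate the SM actually assigned to the broadcast group is with the infiniband-diags tools (these need to run on a node that can reach the SA; the &lt;switch_lid&gt; placeholder is yours to fill in):

```
# List multicast groups known to the SA; find MLID 0xc000 and note its Rate/MTU.
saquery -g

# Optionally read a switch's SwitchInfo, which includes MulticastFDBTop.
# Get real switch LIDs from the output of ibswitches.
smpquery switchinfo <switch_lid>
```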
<br><div class="gmail_quote">On Sun, Oct 27, 2013 at 8:46 PM, Robert LeBlanc <span dir="ltr">&lt;<a href="mailto:robert_leblanc@byu.edu" target="_blank">robert_leblanc@byu.edu</a>&gt;</span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<div dir="ltr">Since you guys are amazingly helpful, I thought I would pick your brains on a new problem.<div><br></div><div>We have two Xsigo directors cross-connected to four Mellanox IS5030 switches. Connected to those are four Dell m1000e chassis, each with two IB switches (two chassis have QDR and two have FDR10). We also have nine dual-port rack servers connected to the IS5030 switches. For testing purposes, we have an additional Dell m1000e QDR chassis connected to one Xsigo director and two dual-port FDR10 rack servers connected to the other Xsigo director.</div>
<div><br></div><div>I can get IPoIB to work between the two test rack servers connected to the one Xsigo director, but I cannot get IPoIB to work between any blades, whether they are right next to each other or talking to the working rack servers. I'm using the exact same live CentOS ISO on all four servers. I've checked opensm, and the blades have joined the multicast group 0xc000 properly. tcpdump suggests that traffic is not leaving the blades, and it also shows no traffic entering the blades from the rack servers. An ibtracert using the 0xc000 MLID shows that routing exists between the hosts.</div>
<div><br></div><div>I've read about MulticastFDBTop=0xBFFF, but I don't know how to set it, and I doubt it would have been set by default.</div><div><br></div><div>Does anyone have ideas on troubleshooting steps to try? I think Google is tired of me asking questions about it.</div>
<div><br></div><div>Thanks,<br clear="all"><div><div><span style="font-size:13px;font-family:arial,sans-serif"><br></span></div><span style="font-size:13px;font-family:arial,sans-serif">Robert LeBlanc</span><br style="font-size:13px;font-family:arial,sans-serif">
<span style="font-size:13px;font-family:arial,sans-serif">OIT Infrastructure & Virtualization Engineer</span><br style="font-size:13px;font-family:arial,sans-serif">
<span style="font-size:13px;font-family:arial,sans-serif">Brigham Young University</span></div>
</div></div>
<br>_______________________________________________<br>
Users mailing list<br>
<a href="mailto:Users@lists.openfabrics.org">Users@lists.openfabrics.org</a><br>
<a href="http://lists.openfabrics.org/cgi-bin/mailman/listinfo/users" target="_blank">http://lists.openfabrics.org/cgi-bin/mailman/listinfo/users</a><br>
<br></blockquote></div><br></div>