[Users] Weird IPoIB issue

Narayan Desai narayan.desai at gmail.com
Sun Oct 27 20:10:29 PDT 2013


Have you set a rate for IPoIB in opensm (specifically in partitions.conf)?
This controls the speed of the multicast group associated with IPoIB. I've
seen issues when a node can't satisfy that rate: IPoIB won't work properly
on that node, but raw IB will still show good connectivity. (That matches
what you describe: ibtracert works, but IPoIB doesn't.) Considering your
mix of link speeds, I bet this (or something like it) is the problem.
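
A minimal sketch of that rate check, under stated assumptions: the table
holds the classic InfiniBand rate encodings (codes 2 through 10) that the
rate= flag in partitions.conf accepts, and the join rule is simplified to
"a port can join only if its link rate is at least the group rate". The
port speeds and the helper names are hypothetical, and newer rate codes
are left out.

```python
# Classic InfiniBand rate encodings (code -> Gb/s), as used by the rate=
# flag in opensm's partitions.conf. Newer codes are deliberately omitted.
IB_RATE_GBPS = {2: 2.5, 3: 10, 4: 30, 5: 5, 6: 20, 7: 40, 8: 60, 9: 80, 10: 120}


def can_join(port_gbps, group_rate_code):
    """Simplified rule: a port can join if its link rate meets the group rate."""
    return port_gbps >= IB_RATE_GBPS[group_rate_code]


def highest_safe_rate_code(port_rates_gbps):
    """Return the fastest rate code that every listed port can still satisfy."""
    slowest = min(port_rates_gbps)
    usable = [code for code, gbps in IB_RATE_GBPS.items() if gbps <= slowest]
    # Raises ValueError if no code fits (a port slower than 2.5 Gb/s).
    return max(usable, key=IB_RATE_GBPS.get)


# Hypothetical fabric: two ports at 40 Gb/s and one that trained at 10 Gb/s.
ports = [40, 40, 10]
code = highest_safe_rate_code(ports)

# Candidate partitions.conf line for the default partition (illustrative only).
print(f"Default=0x7fff, ipoib, rate={code} : ALL=full;")
```

With a group rate of 40 Gb/s (code 7), the 10 Gb/s port in this example
would fail the check, which is the kind of mismatch described above.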

I'm not sure tcpdump would ever show you anything useful for IPoIB when it
is failing at that low a level. Can anyone say for sure?
-nld


On Sun, Oct 27, 2013 at 8:46 PM, Robert LeBlanc <robert_leblanc at byu.edu> wrote:

> Since you guys are amazingly helpful, I thought I would pick your brains
> on a new problem.
>
> We have two Xsigo directors cross connected to four Mellanox IS5030
> switches. Connected to those we have four Dell m1000e chassis each with two
> IB switches (two chassis have QDR and two have FDR10). We have 9 dual-port
> rack servers connected to the IS5030 switches. For testing purposes we have
> an additional Dell m1000e QDR chassis connected to one Xsigo director and
> two dual-port FDR10 rack servers connected to the other Xsigo director.
>
> I can get IPoIB to work between the two test rack servers connected to the
> one Xsigo director. But I cannot get IPoIB to work from any blade, either
> to the blade right next to it or to the working rack servers. I'm using
> the same exact live CentOS ISO on all four servers. I've checked opensm and
> the blades have joined the multicast group 0xc000 properly. tcpdump
> basically says that traffic is not leaving the blades. tcpdump also shows
> no traffic entering the blades from the rack servers. An ibtracert using
> 0xc000 mlid shows that routing exists between hosts.
>
> I've read about MulticastFDBTop=0xBFFF but I don't know how to set it and
> I doubt it would have been set by default.
>
> Anyone have some ideas on troubleshooting steps to try? I think Google is
> tired of me asking questions about it.
>
> Thanks,
>
> Robert LeBlanc
> OIT Infrastructure & Virtualization Engineer
> Brigham Young University