[Users] Weird IPoIB issue

Robert LeBlanc robert_leblanc at byu.edu
Mon Oct 28 10:45:55 PDT 2013


The info for that MGID is:
MCMemberRecord group dump:
                MGID....................ff12:401b:ffff::ffff:ffff
                Mlid....................0xC000
                Mtu.....................0x84
                pkey....................0xFFFF
                Rate....................0x83
                SL......................0x0

I don't understand the MTU and Rate values (0x84 and 0x83, i.e. 132 and 131
decimal). When I run iperf between the two hosts over IPoIB in connected
mode with MTU 65520, I've tried multiple threads, but the sum is still
10 Gbps.
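
Decoding those fields: the high 2 bits are a selector (2 means "exactly")
and the low 6 bits are the code, per the standard IBA MCMemberRecord
encoding (a quick shell-arithmetic sketch; I'm assuming Xsigo's SM follows
the standard spec tables here):

    echo $(( 0x84 >> 6 )) $(( 0x84 & 0x3f ))   # 2 4 -> MTU code 4 = 2048 bytes
    echo $(( 0x83 >> 6 )) $(( 0x83 & 0x3f ))   # 2 3 -> rate code 3 = 10 Gbps

So the broadcast group itself is 2048-byte MTU at the default 10 Gbps rate.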


Robert LeBlanc
OIT Infrastructure & Virtualization Engineer
Brigham Young University


On Mon, Oct 28, 2013 at 11:40 AM, Hal Rosenstock
<hal.rosenstock at gmail.com>wrote:

> saquery -g should show what MGID is mapped to MLID 0xc000 and the group
> parameters.
>
> When you say 10 Gbps max, is that multicast or unicast? That limit is
> only on the multicast.
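>
> To narrow that output down to the IPoIB group, something like the
> following should work (a sketch assuming the stock infiniband-diags
> saquery; 0xc000 is the usual broadcast-group MLID):
>
>     saquery -g | grep -i -B3 -A6 "mlid.*0xc000"   # group parameters only
>     saquery -m 0xc000                             # member GIDs joined to it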
>
>
> On Mon, Oct 28, 2013 at 1:28 PM, Robert LeBlanc <robert_leblanc at byu.edu>wrote:
>
>> Well, that would explain why I'm only able to get 10 Gbps max from the two
>> hosts that are working.
>>
>> I have tried updn and dnup, and they didn't help either. I think the only
>> thing that will help is Automatic Path Migration, as it tries very hard to
>> route the alternate LIDs through different system GUIDs. I suspect it
>> would require re-LIDing everything, which would mean an outage. I'm still
>> trying to get an answer from Oracle on whether that is even a possibility.
>> I've tried seeding some of the algorithms with information like root nodes,
>> etc., but none of them worked better.
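>>
>> For reference, seeding the tree-based engines goes through a root GUID
>> file; a minimal sketch (the engine choice, path, and GUID below are
>> placeholders, not our production values):
>>
>>     # in opensm.opts / opensm.conf
>>     routing_engine ftree
>>     root_guid_file /etc/opensm/root_guids.conf
>>
>>     # /etc/opensm/root_guids.conf: one switch node GUID per line
>>     0x0002c902004a3b10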
>>
>> The MLID 0xc000 exists and I can see all the nodes joined to the group
>> using saquery. I've checked the route using ibtracert specifying the MLID.
>> The only thing I'm not sure how to check is the group parameters. What tool
>> would I use for that?
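>>
>> The route check was roughly the following (a sketch with placeholder LIDs;
>> -m makes ibtracert walk the multicast forwarding tables for that MLID
>> instead of the unicast LFTs):
>>
>>     ibtracert -m 0xc000 <src_lid> <dst_lid>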
>>
>> Thanks,
>>
>>
>> Robert LeBlanc
>> OIT Infrastructure & Virtualization Engineer
>> Brigham Young University
>>
>>
>> On Mon, Oct 28, 2013 at 11:16 AM, Hal Rosenstock <
>> hal.rosenstock at gmail.com> wrote:
>>
>>> Xsigo's SM is not "straight" OpenSM. They have some proprietary
>>> enhancements, and it may be based on an old vintage of OpenSM. You will
>>> likely need to work with them/Oracle now on issues.
>>>
>>> Lack of a partitions file does mean the default partition and default rate
>>> (10 Gbps), so from what I saw, all ports had sufficient rate to join the MC
>>> group.
>>>
>>> There are certain topology requirements for running various routing
>>> algorithms. Did you try updn or dnup?
>>>
>>> The key is determining whether the IPoIB broadcast group is set up
>>> correctly. What MLID is the group built on (usually 0xc000)? What are the
>>> group parameters (rate, MTU)? Are all members that are running IPoIB
>>> joined? Is the group routed to all such members? There are
>>> infiniband-diags for all of this.
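>>>
>>> One check that is easy to miss is dumping the multicast forwarding table
>>> on each switch along the path (a sketch; the switch LID is a placeholder):
>>>
>>>     ibroute -M <switch_lid>   # shows which ports MLID 0xc000 fans out to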
>>>
>>>
>>> On Mon, Oct 28, 2013 at 12:19 PM, Robert LeBlanc <robert_leblanc at byu.edu
>>> > wrote:
>>>
>>>> OpenSM (the SM runs on the Xsigo, so they manage it) is using minhop. I've
>>>> loaded the ibnetdiscover output into ibsim and run all the different
>>>> routing algorithms against it, with and without scatter ports. Minhop had
>>>> 50% of our hosts running all paths through a single IS5030 switch (at least
>>>> for the LIDs we need, which represent the Ethernet and Fibre Channel cards
>>>> the hosts should communicate with). Ftree, dor, and dfsssp fell back to
>>>> minhop; the others routed more paths through the same IS5030, in some cases
>>>> increasing the number of hosts with a single point of failure to 75%.
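>>>>
>>>> The knobs I was varying are just two lines in opensm.opts; a sketch, with
>>>> example values rather than anything Xsigo actually ships:
>>>>
>>>>     routing_engine updn
>>>>     scatter_ports 8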
>>>>
>>>> As far as I can tell, there is no partitions.conf file, so I assume we
>>>> are using the default partition. There is an opensm.opts file, but it only
>>>> specifies logging information:
>>>> # SA database file name
>>>> sa_db_file /var/log/opensm-sa.dump
>>>>
>>>> # If TRUE causes OpenSM to dump SA database at the end of
>>>> # every light sweep, regardless of the verbosity level
>>>> sa_db_dump TRUE
>>>>
>>>> # The directory to hold the file OpenSM dumps
>>>> dump_files_dir /var/log/
>>>>
>>>> The SM node is:
>>>> xsigoa:/opt/xsigo/xsigos/current/ofed/etc# ibaddr
>>>> GID fe80::13:9702:100:979 LID start 0x1 end 0x1
>>>>
>>>> We do have Switch-X in two of the Dell m1000e chassis, but the cards,
>>>> ports 17-32, are FDR10 (the switch may be straight FDR, but I'm not 100%
>>>> sure). The IS5030s, which the Switch-X switches are connected to, are QDR;
>>>> the switches in the Xsigo directors are QDR, but the Ethernet and Fibre
>>>> Channel cards are DDR. The DDR cards will not be running IPoIB (to my
>>>> knowledge they don't have the ability); only the hosts should be leveraging
>>>> IPoIB. I hope that clears up some of your questions. If you have more, I
>>>> will try to answer them.
>>>>
>>>>
>>>>
>>>>
>>>> Robert LeBlanc
>>>> OIT Infrastructure & Virtualization Engineer
>>>> Brigham Young University
>>>>
>>>>
>>>> On Mon, Oct 28, 2013 at 9:57 AM, Hal Rosenstock <
>>>> hal.rosenstock at gmail.com> wrote:
>>>>
>>>>> What routing algorithm is configured in OpenSM? What does your
>>>>> partitions.conf file look like? Which node is your OpenSM?
>>>>>
>>>>> Also, I only see QDR and DDR links even though you have Switch-X, so I
>>>>> assume all FDR ports are connected to slower (QDR) devices. I don't see any
>>>>> FDR-10 ports, but maybe they're also connected to QDR ports and so show up
>>>>> as QDR in the topology.
>>>>>
>>>>> There are DDR CAs in the Xsigo box, but I'm not sure whether or not they
>>>>> run IPoIB.
>>>>>
>>>>> -- Hal
>>>>>
>>>>>
>>>>> On Sun, Oct 27, 2013 at 9:46 PM, Robert LeBlanc <
>>>>> robert_leblanc at byu.edu> wrote:
>>>>>
>>>>>> Since you guys are amazingly helpful, I thought I would pick your
>>>>>> brains on a new problem.
>>>>>>
>>>>>> We have two Xsigo directors cross connected to four Mellanox IS5030
>>>>>> switches. Connected to those we have four Dell m1000e chassis each with two
>>>>>> IB switches (two chassis have QDR and two have FDR10). We have 9 dual-port
>>>>>> rack servers connected to the IS5030 switches. For testing purposes we have
>>>>>> an additional Dell m1000e QDR chassis connected to one Xsigo director and
>>>>>> two dual-port FDR10 rack servers connected to the other Xsigo director.
>>>>>>
>>>>>> I can get IPoIB to work between the two test rack servers connected
>>>>>> to the one Xsigo director, but I cannot get IPoIB to work between any
>>>>>> blades, either right next to each other or to the working rack servers.
>>>>>> I'm using the exact same live CentOS ISO on all four servers. I've checked
>>>>>> opensm, and the blades have joined the multicast group 0xc000 properly.
>>>>>> tcpdump basically says that traffic is not leaving the blades, and it also
>>>>>> shows no traffic entering the blades from the rack servers. An ibtracert
>>>>>> using the 0xc000 MLID shows that routing exists between the hosts.
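>>>>>>
>>>>>> The per-blade checks boil down to something like this (a sketch,
>>>>>> assuming the IPoIB interface is named ib0, which may differ here):
>>>>>>
>>>>>>     cat /sys/class/net/ib0/mode     # datagram vs. connected
>>>>>>     ip -s link show ib0             # MTU plus TX/RX counters
>>>>>>     tcpdump -ni ib0 arp or icmp     # watch for outbound ARP/ping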
>>>>>>
>>>>>> I've read about MulticastFDBTop=0xBFFF, but I don't know how to set it,
>>>>>> and I doubt it would have been set by default.
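>>>>>>
>>>>>> Just to see what is currently programmed (rather than to set it), a
>>>>>> sketch with a placeholder switch LID; recent infiniband-diags versions
>>>>>> include the multicast FDB fields in the SwitchInfo dump:
>>>>>>
>>>>>>     smpquery switchinfo <switch_lid>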
>>>>>>
>>>>>> Anyone have some ideas on troubleshooting steps to try? I think
>>>>>> Google is tired of me asking questions about it.
>>>>>>
>>>>>> Thanks,
>>>>>>
>>>>>> Robert LeBlanc
>>>>>> OIT Infrastructure & Virtualization Engineer
>>>>>> Brigham Young University
>>>>>>
>>>>>> _______________________________________________
>>>>>> Users mailing list
>>>>>> Users at lists.openfabrics.org
>>>>>> http://lists.openfabrics.org/cgi-bin/mailman/listinfo/users
>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>