[Users] Weird IPoIB issue

Hal Rosenstock hal.rosenstock at gmail.com
Wed Oct 30 11:28:02 PDT 2013


Determine the LID of the switch (in the commands below, say the switch is LID x).
Then:

smpquery si x
(of interest are McastFdbCap and MulticastFDBTop)
smpquery pi x 0
(of interest is CapMask)
ibroute -M x
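One way to read the MulticastFDBTop value those queries return: per the IBA spec, multicast LIDs start at 0xC000, so a top below that (the oft-cited 0xBFFF) means the multicast forwarding table holds no valid entries. A rough sketch of that check; the helper name and sample values are mine, not output from a real switch:

```shell
# Rough sketch only: interpret a MulticastFDBTop value copied from
# "smpquery si <lid>". Per the IBA spec, MLIDs start at 0xC000; entries
# from 0xC000 up to MulticastFDBTop are valid, so a top below 0xC000
# (e.g. 0xBFFF) leaves the multicast forwarding table with no entries.
check_mcast_fdb_top() {
    if [ "$(( $1 ))" -lt "$(( 0xC000 ))" ]; then
        echo "no multicast FDB entries (top $1 is below first MLID 0xC000)"
    else
        echo "multicast FDB entries valid up to MLID $1"
    fi
}

check_mcast_fdb_top 0xBFFF
check_mcast_fdb_top 0xC000
```

ibroute -M x then shows which MLIDs actually have forwarding entries programmed on that switch.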



On Tue, Oct 29, 2013 at 3:56 PM, Robert LeBlanc <robert_leblanc at byu.edu> wrote:

> Both ports show up in the "saquery MCMR" results with a JoinState of 0x1.
>
> How can I dump the parameters of a non-managed switch so that I can
> confirm that multicast is not turned off on the Dell chassis IB switches?
>
>
> Robert LeBlanc
> OIT Infrastructure & Virtualization Engineer
> Brigham Young University
>
>
> On Mon, Oct 28, 2013 at 5:04 PM, Coulter, Susan K <skc at lanl.gov> wrote:
>
>>
>>  /sys/class/net should give you the details on your devices, like this:
>>
>>  -bash-4.1# cd /sys/class/net
>> -bash-4.1# ls -l
>> total 0
>> lrwxrwxrwx 1 root root 0 Oct 23 12:59 eth0 ->
>> ../../devices/pci0000:00/0000:00:02.0/0000:04:00.0/net/eth0
>> lrwxrwxrwx 1 root root 0 Oct 23 12:59 eth1 ->
>> ../../devices/pci0000:00/0000:00:02.0/0000:04:00.1/net/eth1
>> lrwxrwxrwx 1 root root 0 Oct 23 15:42 ib0 ->
>> ../../devices/pci0000:40/0000:40:0c.0/0000:47:00.0/net/ib0
>> lrwxrwxrwx 1 root root 0 Oct 23 15:42 ib1 ->
>> ../../devices/pci0000:40/0000:40:0c.0/0000:47:00.0/net/ib1
>> lrwxrwxrwx 1 root root 0 Oct 23 15:42 ib2 ->
>> ../../devices/pci0000:c0/0000:c0:0c.0/0000:c7:00.0/net/ib2
>> lrwxrwxrwx 1 root root 0 Oct 23 15:42 ib3 ->
>> ../../devices/pci0000:c0/0000:c0:0c.0/0000:c7:00.0/net/ib3
>>
>>  Then use "lspci | grep Mell"  to get the pci device numbers.
>>
>>  47:00.0 Network controller: Mellanox Technologies MT26428 [ConnectX VPI
>> PCIe 2.0 5GT/s - IB QDR / 10GigE] (rev b0)
>> c7:00.0 Network controller: Mellanox Technologies MT26428 [ConnectX VPI
>> PCIe 2.0 5GT/s - IB QDR / 10GigE] (rev b0)
>>
>>  In this example, ib0 and ib1 reference the device at 47:00.0,
>> and ib2 and ib3 reference the device at c7:00.0.
>>
>>  That said, if you only have one card, this is probably not the problem.
>> Additionally, since the ARP requests are seen going out ib0, your
>> emulation appears to be working.
>>
>>  If those ARP requests are not seen on the other end, it seems like a
>> problem with the MGIDs; maybe the port you are trying to reach is not in
>> the IPoIB multicast group?
>>
>>  You can look at all the multicast member records with "saquery MCMR".
>> Or - you can grep for mcmr_rcv_join_mgrp references in your SM logs …
>>
>>  HTH
>>
>>
>>
>>  On Oct 28, 2013, at 1:08 PM, Robert LeBlanc <robert_leblanc at byu.edu>
>> wrote:
>>
>>  I can ibping between both hosts just fine.
>>
>>  [root at desxi003 ~]# ibping 0x37
>> Pong from desxi004.(none) (Lid 55): time 0.111 ms
>> Pong from desxi004.(none) (Lid 55): time 0.189 ms
>> Pong from desxi004.(none) (Lid 55): time 0.189 ms
>> Pong from desxi004.(none) (Lid 55): time 0.179 ms
>> ^C
>> --- desxi004.(none) (Lid 55) ibping statistics ---
>> 4 packets transmitted, 4 received, 0% packet loss, time 3086 ms
>> rtt min/avg/max = 0.111/0.167/0.189 ms
>>
>>  [root at desxi004 ~]# ibping 0x2d
>> Pong from desxi003.(none) (Lid 45): time 0.156 ms
>> Pong from desxi003.(none) (Lid 45): time 0.175 ms
>> Pong from desxi003.(none) (Lid 45): time 0.176 ms
>> ^C
>> --- desxi003.(none) (Lid 45) ibping statistics ---
>> 3 packets transmitted, 3 received, 0% packet loss, time 2302 ms
>> rtt min/avg/max = 0.156/0.169/0.176 ms
>>
>>  When I do a regular IP ping to the IPoIB address, tcpdump only shows
>> the outgoing ARP request.
>>
>>  [root at desxi003 ~]# tcpdump -i ib0
>> tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
>> listening on ib0, link-type LINUX_SLL (Linux cooked), capture size 65535
>> bytes
>> 19:00:08.950320 ARP, Request who-has 192.168.9.4 tell 192.168.9.3, length
>> 56
>> 19:00:09.950320 ARP, Request who-has 192.168.9.4 tell 192.168.9.3, length
>> 56
>> 19:00:10.950307 ARP, Request who-has 192.168.9.4 tell 192.168.9.3, length
>> 56
>>
>>  Running tcpdump on the rack servers, I don't see the ARP request there,
>> which I should.
>>
>>  From what I've read, ib0 should be mapped to the first port and ib1
>> to the second port. We have one IB card with two ports. The modprobe
>> config is the default installed with the Mellanox drivers.
>>
>>  [root at desxi003 etc]# cat modprobe.d/ib_ipoib.conf
>> # install ib_ipoib modprobe --ignore-install ib_ipoib &&
>> /sbin/ib_ipoib_sysctl load
>> # remove ib_ipoib /sbin/ib_ipoib_sysctl unload ; modprobe -r
>> --ignore-remove ib_ipoib
>> alias ib0 ib_ipoib
>> alias ib1 ib_ipoib
>>
>>  Can you give me some pointers on digging into the device layer to make
>> sure IPoIB is connected correctly? Would I look in /sys or /proc for that?
>>
>>  Dell has not been able to replicate the problem in their environment,
>> and since they only support Red Hat, they won't work with my CentOS live
>> CD. These blades don't have internal hard drives, which makes it hard to
>> install any OS. I don't know if I can engage Mellanox, since they build
>> the switch hardware and driver stack we are using.
>>
>>  I really appreciate all the help you guys have given thus far; I'm
>> learning a lot as this progresses. I'm reading through
>> https://tools.ietf.org/html/rfc4391 trying to understand IPoIB from top
>> to bottom.
>>
>>  Thanks,
>>
>>
>>  Robert LeBlanc
>> OIT Infrastructure & Virtualization Engineer
>> Brigham Young University
>>
>>
>> On Mon, Oct 28, 2013 at 12:53 PM, Coulter, Susan K <skc at lanl.gov> wrote:
>>
>>>
>>>  If you are not seeing any packets leave the ib0 interface, it sounds
>>> like the emulation layer is not connected to the right device.
>>>
>>>  If the ib_ipoib kernel module is loaded, and a simple native IB test
>>> (like ib_read_bw) works between those blades, you need to dig into the
>>> device layer and ensure IPoIB is "connected" to the right device.
>>>
>>>  Do you have more than 1 IB card?
>>> What does your modprobe config look like for ipoib?
>>>
>>>
>>>   On Oct 28, 2013, at 12:38 PM, Robert LeBlanc <robert_leblanc at byu.edu>
>>>   wrote:
>>>
>>>  These ESX hosts (2 blade server and 2 rack servers) are booted into a
>>> CentOS 6.2 Live CD that I built. Right now everything I'm trying to get
>>> working is CentOS 6.2. All of our other hosts are running ESXi and have
>>> IPoIB interfaces, but none of them are configured and I'm not trying to get
>>> those working right now.
>>>
>>>  Ideally, we would like our ESX hosts to communicate with each other
>>> for vMotion and protected VM traffic as well as with our Commvault backup
>>> servers (Windows) over IPoIB (or Oracle's PVI which is very similar).
>>>
>>>
>>>  Robert LeBlanc
>>> OIT Infrastructure & Virtualization Engineer
>>> Brigham Young University
>>>
>>>
>>> On Mon, Oct 28, 2013 at 12:33 PM, Hal Rosenstock <
>>> hal.rosenstock at gmail.com> wrote:
>>>
>>>> Are those ESXi IPoIB interfaces ? Do some of these work and others not
>>>> ? Are there normal Linux IPoIB interfaces ? Do they work ?
>>>>
>>>>
>>>> On Mon, Oct 28, 2013 at 2:24 PM, Robert LeBlanc <robert_leblanc at byu.edu
>>>> > wrote:
>>>>
>>>>> Yes, I cannot ping them over the IPoIB interface. It is a very simple
>>>>> network setup.
>>>>>
>>>>>  desxi003
>>>>>  8: ib0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 65520 qdisc pfifo_fast
>>>>> state UP qlen 256
>>>>>     link/infiniband
>>>>> 80:20:00:54:fe:80:00:00:00:00:00:00:f0:4d:a2:90:97:78:e7:d1 brd
>>>>> 00:ff:ff:ff:ff:12:40:1b:ff:ff:00:00:00:00:00:00:ff:ff:ff:ff
>>>>>     inet 192.168.9.3/24 brd 192.168.9.255 scope global ib0
>>>>>     inet6 fe80::f24d:a290:9778:e7d1/64 scope link
>>>>>        valid_lft forever preferred_lft forever
>>>>>
>>>>>  desxi004
>>>>>  8: ib0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 65520 qdisc pfifo_fast
>>>>> state UP qlen 256
>>>>>     link/infiniband
>>>>> 80:20:00:54:fe:80:00:00:00:00:00:00:f0:4d:a2:90:97:78:e7:15 brd
>>>>> 00:ff:ff:ff:ff:12:40:1b:ff:ff:00:00:00:00:00:00:ff:ff:ff:ff
>>>>>     inet 192.168.9.4/24 brd 192.168.9.255 scope global ib0
>>>>>     inet6 fe80::f24d:a290:9778:e715/64 scope link
>>>>>        valid_lft forever preferred_lft forever
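The link/infiniband and brd strings above can be decoded by hand. A rough sketch, assuming RFC 4391's 20-octet IPoIB address layout (one flags octet, a 3-octet QPN, then the 16-octet GID); the helper name is mine. Applied to the brd address shown, it recovers the IPoIB broadcast MGID:

```shell
# Rough sketch only: split a 20-octet IPoIB link-layer address (RFC 4391)
# into its parts. Octet 0 carries flags, octets 1-3 the 24-bit QPN, and
# octets 4-19 the 16-byte GID. Run on the "brd" address from "ip addr",
# this yields the broadcast MGID that saquery also reports.
parse_ipoib_hwaddr() {
    qpn=$(echo "$1" | cut -d: -f2-4 | tr -d :)
    gid=$(echo "$1" | cut -d: -f5-20 | sed 's/\(..\):\(..\)/\1\2/g')
    echo "qpn=0x$qpn gid=$gid"
}

parse_ipoib_hwaddr 00:ff:ff:ff:ff:12:40:1b:ff:ff:00:00:00:00:00:00:ff:ff:ff:ff
```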
>>>>>
>>>>>
>>>>>
>>>>>  Robert LeBlanc
>>>>> OIT Infrastructure & Virtualization Engineer
>>>>> Brigham Young University
>>>>>
>>>>>
>>>>>  On Mon, Oct 28, 2013 at 12:22 PM, Hal Rosenstock <
>>>>> hal.rosenstock at gmail.com> wrote:
>>>>>
>>>>>> So these 2 hosts have trouble talking IPoIB to each other ?
>>>>>>
>>>>>>
>>>>>> On Mon, Oct 28, 2013 at 2:16 PM, Robert LeBlanc <
>>>>>> robert_leblanc at byu.edu> wrote:
>>>>>>
>>>>>>> I was just wondering about that. It seems reasonable that the
>>>>>>> broadcast traffic would go over multicast, but that channels would
>>>>>>> effectively be created for node-to-node communication; otherwise the
>>>>>>> entire multicast group would be limited to 10 Gbps (in this
>>>>>>> instance). That doesn't scale very well.
>>>>>>>
>>>>>>>  The things I've read about IPoIB performance tuning seem pretty
>>>>>>> vague, and the changes most people recommend are already in place on
>>>>>>> the systems I'm using. Some people said to try a newer version of
>>>>>>> Ubuntu, but ultimately I have very little control over VMware. Once I
>>>>>>> can get the Linux machines to communicate over IPoIB between the
>>>>>>> racks and blades, I'll turn my attention to performance optimization.
>>>>>>> It doesn't make much sense to spend time there when it is not working
>>>>>>> at all for most machines.
>>>>>>>
>>>>>>>  I've done ibtracert between the two nodes; is that what you mean
>>>>>>> by walking the route?
>>>>>>>
>>>>>>>  [root at desxi003 ~]# ibtracert -m 0xc000 0x2d 0x37
>>>>>>> From ca 0xf04da2909778e7d0 port 1 lid 45-45 "localhost HCA-1"
>>>>>>> [1] -> switch 0x2c90200448ec8[17] lid 51 "Infiniscale-IV Mellanox
>>>>>>> Technologies"
>>>>>>> [18] -> ca 0xf04da2909778e714[1] lid 55 "localhost HCA-1"
>>>>>>> To ca 0xf04da2909778e714 port 1 lid 55-55 "localhost HCA-1"
>>>>>>>
>>>>>>>  [root at desxi004 ~]# ibtracert -m 0xc000 0x37 0x2d
>>>>>>> From ca 0xf04da2909778e714 port 1 lid 55-55 "localhost HCA-1"
>>>>>>> [1] -> switch 0x2c90200448ec8[18] lid 51 "Infiniscale-IV Mellanox
>>>>>>> Technologies"
>>>>>>> [17] -> ca 0xf04da2909778e7d0[1] lid 45 "localhost HCA-1"
>>>>>>> To ca 0xf04da2909778e7d0 port 1 lid 45-45 "localhost HCA-1"
>>>>>>>
>>>>>>>  As you can see, the route is on the same switch, the blades are
>>>>>>> right next to each other.
>>>>>>>
>>>>>>>
>>>>>>>  Robert LeBlanc
>>>>>>> OIT Infrastructure & Virtualization Engineer
>>>>>>> Brigham Young University
>>>>>>>
>>>>>>>
>>>>>>>  On Mon, Oct 28, 2013 at 12:05 PM, Hal Rosenstock <
>>>>>>> hal.rosenstock at gmail.com> wrote:
>>>>>>>
>>>>>>>>  Which mystery is explained ? The 10 Gbps is a multicast only
>>>>>>>> limit and does not apply to unicast. The BW limitation you're seeing is due
>>>>>>>> to other factors. There's been much written about IPoIB performance.
>>>>>>>>
>>>>>>>> If all the MC members are joined and routed, then the IPoIB
>>>>>>>> connectivity issue is some other issue. Are you sure this is the case ? Did
>>>>>>>> you walk the route between 2 nodes where you have a connectivity issue ?
>>>>>>>>
>>>>>>>>
>>>>>>>> On Mon, Oct 28, 2013 at 1:58 PM, Robert LeBlanc <
>>>>>>>> robert_leblanc at byu.edu> wrote:
>>>>>>>>
>>>>>>>>> Well, that explains one mystery, now I need to figure out why it
>>>>>>>>> seems the Dell blades are not passing the traffic.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>  Robert LeBlanc
>>>>>>>>> OIT Infrastructure & Virtualization Engineer
>>>>>>>>> Brigham Young University
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>  On Mon, Oct 28, 2013 at 11:51 AM, Hal Rosenstock <
>>>>>>>>> hal.rosenstock at gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>>  Yes, that's the IPoIB IPv4 broadcast group for the default
>>>>>>>>>> (0xffff) partition. The 0x80 part of the mtu and rate just means
>>>>>>>>>> "is equal to"; mtu 0x04 is 2K (2048) and rate 0x3 is 10 Gb/sec.
>>>>>>>>>> These are indeed the defaults.
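That selector-plus-value encoding can be sketched mechanically. This assumes the IBA MCMemberRecord tables (the top two bits of each field are a selector, 0x80 meaning "is equal to"; the low six bits are an enumerated code), and the function names are mine:

```shell
# Rough sketch only: decode MCMemberRecord Mtu/Rate fields per the IBA
# spec. Masking off the 0x80 selector bits leaves an enumeration, so
# 0x84 means "MTU equals 2048" and 0x83 means "rate equals 10 Gbps";
# the fields are not plain numbers (132 and 131).
decode_mtu() {
    case $(( $1 & 0x3f )) in
        1) echo "256" ;;  2) echo "512" ;;  3) echo "1024" ;;
        4) echo "2048" ;; 5) echo "4096" ;; *) echo "unknown" ;;
    esac
}
decode_rate() {
    case $(( $1 & 0x3f )) in
        2) echo "2.5 Gbps" ;; 3) echo "10 Gbps" ;; 4) echo "30 Gbps" ;;
        5) echo "5 Gbps"   ;; 6) echo "20 Gbps" ;; 7) echo "40 Gbps" ;;
        *) echo "unknown" ;;
    esac
}

decode_mtu 0x84    # MTU in bytes
decode_rate 0x83
```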
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Mon, Oct 28, 2013 at 1:45 PM, Robert LeBlanc <
>>>>>>>>>> robert_leblanc at byu.edu> wrote:
>>>>>>>>>>
>>>>>>>>>>> The info for that MGID is:
>>>>>>>>>>> MCMemberRecord group dump:
>>>>>>>>>>>                 MGID....................ff12:401b:ffff::ffff:ffff
>>>>>>>>>>>                 Mlid....................0xC000
>>>>>>>>>>>                 Mtu.....................0x84
>>>>>>>>>>>                 pkey....................0xFFFF
>>>>>>>>>>>                 Rate....................0x83
>>>>>>>>>>>                 SL......................0x0
>>>>>>>>>>>
>>>>>>>>>>>  I don't understand the MTU and Rate values (132 and 131
>>>>>>>>>>> decimal). When I run iperf between the two hosts over IPoIB in
>>>>>>>>>>> connected mode with MTU 65520, I've tried multiple threads, but
>>>>>>>>>>> the aggregate is still 10 Gbps.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>  Robert LeBlanc
>>>>>>>>>>> OIT Infrastructure & Virtualization Engineer
>>>>>>>>>>> Brigham Young University
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>  On Mon, Oct 28, 2013 at 11:40 AM, Hal Rosenstock <
>>>>>>>>>>> hal.rosenstock at gmail.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>>  saquery -g should show what MGID is mapped to MLID 0xc000 and
>>>>>>>>>>>> the group parameters.
>>>>>>>>>>>>
>>>>>>>>>>>>  When you say 10 Gbps max, is that multicast or unicast ? That
>>>>>>>>>>>> limit is only on the multicast.
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On Mon, Oct 28, 2013 at 1:28 PM, Robert LeBlanc <
>>>>>>>>>>>> robert_leblanc at byu.edu> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Well, that can explain why I'm only able to get 10 Gbps max
>>>>>>>>>>>>> from the two hosts that are working.
>>>>>>>>>>>>>
>>>>>>>>>>>>>  I have tried updn and dnup, and they didn't help either. I
>>>>>>>>>>>>> think the only thing that will help is Automatic Path
>>>>>>>>>>>>> Migration, as it tries very hard to route the alternate LIDs
>>>>>>>>>>>>> through different system GUIDs. I suspect it would require
>>>>>>>>>>>>> re-LIDing everything, which would mean an outage. I'm still
>>>>>>>>>>>>> trying to get answers from Oracle on whether that is even a
>>>>>>>>>>>>> possibility. I've tried seeding some of the algorithms with
>>>>>>>>>>>>> information like root nodes, etc., but none of them worked
>>>>>>>>>>>>> better.
>>>>>>>>>>>>>
>>>>>>>>>>>>>  The MLID 0xc000 exists and I can see all the nodes joined to
>>>>>>>>>>>>> the group using saquery. I've checked the route using ibtracert specifying
>>>>>>>>>>>>> the MLID. The only thing I'm not sure how to check is the group parameters.
>>>>>>>>>>>>> What tool would I use for that?
>>>>>>>>>>>>>
>>>>>>>>>>>>>  Thanks,
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>  Robert LeBlanc
>>>>>>>>>>>>> OIT Infrastructure & Virtualization Engineer
>>>>>>>>>>>>> Brigham Young University
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>  On Mon, Oct 28, 2013 at 11:16 AM, Hal Rosenstock <
>>>>>>>>>>>>> hal.rosenstock at gmail.com> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>>  Xsigo's SM is not "straight" OpenSM. They have some
>>>>>>>>>>>>>> proprietary enhancements, and it may be based on an old
>>>>>>>>>>>>>> vintage of OpenSM. You will likely need to work with
>>>>>>>>>>>>>> them/Oracle now on issues.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Lack of a partitions file does mean the default partition and
>>>>>>>>>>>>>> default rate (10 Gbps), so from what I saw all ports had
>>>>>>>>>>>>>> sufficient rate to join the MC group.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> There are certain topology requirements for running various
>>>>>>>>>>>>>> routing algorithms. Did you try updn or dnup ?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> The key is determining whether the IPoIB broadcast group is
>>>>>>>>>>>>>> setup correctly. What MLID is the group built on (usually 0xc000) ? What
>>>>>>>>>>>>>> are the group parameters (rate, MTU) ? Are all members that are running
>>>>>>>>>>>>>> IPoIB joined ? Is the group routed to all such members ? There are
>>>>>>>>>>>>>> infiniband-diags for all of this.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Mon, Oct 28, 2013 at 12:19 PM, Robert LeBlanc <
>>>>>>>>>>>>>> robert_leblanc at byu.edu> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> OpenSM (the SM runs on Xsigo so they manage it) is using
>>>>>>>>>>>>>>> minhop. I've loaded the ibnetdiscover output into ibsim and
>>>>>>>>>>>>>>> run all the different routing algorithms against it, with and
>>>>>>>>>>>>>>> without scatter ports. Minhop had 50% of our hosts running
>>>>>>>>>>>>>>> all paths through a single IS5030 switch (at least for the
>>>>>>>>>>>>>>> LIDs we need, which represent the Ethernet and Fibre Channel
>>>>>>>>>>>>>>> cards the hosts should communicate with). Ftree, dor, and
>>>>>>>>>>>>>>> dfsssp fell back to minhop; the others routed more paths
>>>>>>>>>>>>>>> through the same IS5030, in some cases increasing the
>>>>>>>>>>>>>>> fraction of hosts with a single point of failure to 75%.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>  As far as I can tell there is no partitions.conf file so I
>>>>>>>>>>>>>>> assume we are using the default partition. There is an opensm.opts file,
>>>>>>>>>>>>>>> but it only specifies logging information.
>>>>>>>>>>>>>>>  # SA database file name
>>>>>>>>>>>>>>> sa_db_file /var/log/opensm-sa.dump
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>  # If TRUE causes OpenSM to dump SA database at the end of
>>>>>>>>>>>>>>> # every light sweep, regardless of the verbosity level
>>>>>>>>>>>>>>> sa_db_dump TRUE
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>  # The directory to hold the file OpenSM dumps
>>>>>>>>>>>>>>> dump_files_dir /var/log/
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>  The SM node is:
>>>>>>>>>>>>>>>  xsigoa:/opt/xsigo/xsigos/current/ofed/etc# ibaddr
>>>>>>>>>>>>>>> GID fe80::13:9702:100:979 LID start 0x1 end 0x1
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>  We do have Switch-X in two of the Dell m1000e chassis, but
>>>>>>>>>>>>>>> the cards (ports 17-32) are FDR10 (the switch may be straight
>>>>>>>>>>>>>>> FDR, but I'm not 100% sure). The IS5030s, which the Switch-X
>>>>>>>>>>>>>>> are connected to, are QDR; the switches in the Xsigo
>>>>>>>>>>>>>>> directors are QDR, but the Ethernet and Fibre Channel cards
>>>>>>>>>>>>>>> are DDR. The DDR cards will not be running IPoIB (at least to
>>>>>>>>>>>>>>> my knowledge they don't have the ability); only the hosts
>>>>>>>>>>>>>>> should be leveraging IPoIB. I hope that clears up some of
>>>>>>>>>>>>>>> your questions. If you have more, I will try to answer them.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>  Robert LeBlanc
>>>>>>>>>>>>>>> OIT Infrastructure & Virtualization Engineer
>>>>>>>>>>>>>>> Brigham Young University
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>  On Mon, Oct 28, 2013 at 9:57 AM, Hal Rosenstock <
>>>>>>>>>>>>>>> hal.rosenstock at gmail.com> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>  What routing algorithm is configured in OpenSM ? What
>>>>>>>>>>>>>>>> does your partitions.conf file look like ? Which node is your OpenSM ?
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Also, I only see QDR and DDR links although you have
>>>>>>>>>>>>>>>> Switch-X so I assume all FDR ports are connected to slower (QDR) devices. I
>>>>>>>>>>>>>>>> don't see any FDR-10 ports but maybe they're also connected to QDR ports so
>>>>>>>>>>>>>>>> show up as QDR in the topology.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> There are DDR CAs in Xsigo box but not sure whether or not
>>>>>>>>>>>>>>>> they run IPoIB.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> -- Hal
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>  On Sun, Oct 27, 2013 at 9:46 PM, Robert LeBlanc <
>>>>>>>>>>>>>>>> robert_leblanc at byu.edu> wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>  Since you guys are amazingly helpful, I thought I would
>>>>>>>>>>>>>>>>> pick your brains on a new problem.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>  We have two Xsigo directors cross connected to four
>>>>>>>>>>>>>>>>> Mellanox IS5030 switches. Connected to those we have four Dell m1000e
>>>>>>>>>>>>>>>>> chassis each with two IB switches (two chassis have QDR and two have
>>>>>>>>>>>>>>>>> FDR10). We have 9 dual-port rack servers connected to the IS5030 switches.
>>>>>>>>>>>>>>>>> For testing purposes we have an additional Dell m1000e QDR chassis
>>>>>>>>>>>>>>>>> connected to one Xsigo director and two dual-port FDR10 rack servers
>>>>>>>>>>>>>>>>> connected to the other Xsigo director.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>  I can get IPoIB to work between the two test rack
>>>>>>>>>>>>>>>>> servers connected to the one Xsigo director. But I cannot
>>>>>>>>>>>>>>>>> get IPoIB to work between any blades, either blades right
>>>>>>>>>>>>>>>>> next to each other or the working rack servers. I'm using
>>>>>>>>>>>>>>>>> the exact same live CentOS ISO on all four servers. I've
>>>>>>>>>>>>>>>>> checked opensm, and the blades have joined the multicast
>>>>>>>>>>>>>>>>> group 0xc000 properly. tcpdump basically says that traffic
>>>>>>>>>>>>>>>>> is not leaving the blades, and it also shows no traffic
>>>>>>>>>>>>>>>>> entering the blades from the rack servers. An ibtracert
>>>>>>>>>>>>>>>>> using the 0xc000 MLID shows that routing exists between
>>>>>>>>>>>>>>>>> hosts.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>  I've read about MulticastFDBTop=0xBFFF but I don't know
>>>>>>>>>>>>>>>>> how to set it and I doubt it would have been set by default.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>  Anyone have some ideas on troubleshooting steps to try?
>>>>>>>>>>>>>>>>> I think Google is tired of me asking questions about it.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>  Thanks,
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>  Robert LeBlanc
>>>>>>>>>>>>>>>>> OIT Infrastructure & Virtualization Engineer
>>>>>>>>>>>>>>>>> Brigham Young University
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>  _______________________________________________
>>>>>>>>>>>>>>>>> Users mailing list
>>>>>>>>>>>>>>>>> Users at lists.openfabrics.org
>>>>>>>>>>>>>>>>> http://lists.openfabrics.org/cgi-bin/mailman/listinfo/users
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>>
>>>  ====================================
>>>
>>>  Susan Coulter
>>> HPC-3 Network/Infrastructure
>>> 505-667-8425
>>> Increase the Peace...
>>> An eye for an eye leaves the whole world blind
>>> ====================================
>>>
>>>
>>
>>
>>
>