[Users] IPoIB on CentOS 6.5

Trey Dockendorf treydock at tamu.edu
Wed Jul 16 10:31:26 PDT 2014


First time posting, so apologies if this is the wrong list for such a question.

I'm in the process of upgrading our cluster from CentOS 5.7 to CentOS 6.5, and I have just discovered that IPoIB is not working on CentOS 6.5.  I'm using the Infiniband Support packages provided in the stock repositories.  The systems I'm working with use either Mellanox ConnectX cards (2U Twins) or Mellanox ConnectX3 cards.  A handful of our storage nodes have ConnectX2 cards with a QSFP+ and a 10GbE port.  Our switch is a Voltaire ISR 2012 (EOL), which also runs our subnet manager.  The symptom is that hosts cannot ping each other's IPoIB addresses.  A host can ping its own IB interface IP, but not those of other hosts.  What's odd is that in some cases I am able to ping specific hosts but not others.

Below is output from a system that's working and from one that's not.  On the system that is not working, I tried the latest kernel from our internal mirror of CentOS (2.6.32-431.20.3.el6.x86_64) as well as the same kernel that is running on the working host (2.6.32-431.17.1.el6.x86_64).  ibping works just fine between all systems.

I have installed and started opensm on a separate host, but the Voltaire is still acting as master.  I am hesitant to disable it as master because 90% of our cluster is still running EL5 and is in use by our users.

Any help is greatly appreciated.  This is the first issue I've had with our InfiniBand setup, but it's also the first attempt at an upgrade since our switch was purchased many years ago (before my time here).

Thanks,
- Trey

Example of working host:

# ping -I ib0 -c 3 -q vmstore1-ib
PING vmstore1-ib.brazos.tamu.edu (192.168.211.245) from 192.168.211.103 ib0: 56(84) bytes of data.

--- vmstore1-ib.brazos.tamu.edu ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 2000ms
rtt min/avg/max/mdev = 0.103/0.105/0.109/0.008 ms

# arp -i ib0 -a
vmstore1-ib.brazos.tamu.edu (192.168.211.245) at 80:00:00:48:fe:80:00:00:00 [infiniband] on ib0

# ibstat
CA 'mlx4_0'
        CA type: MT25418
        Number of ports: 1
        Firmware version: 2.7.200
        Hardware version: a0
        Node GUID: 0x003048d786ba0000
        System image GUID: 0x003048d786ba0003
        Port 1:
                State: Active
                Physical state: LinkUp
                Rate: 20
                Base lid: 215
                LMC: 0
                SM lid: 1
                Capability mask: 0x02510868
                Port GUID: 0x003048d786ba0001
                Link layer: InfiniBand

# ip link show ib0
8: ib0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 65520 qdisc pfifo_fast state UP qlen 256
    link/infiniband 80:00:00:48:fe:80:00:00:00:00:00:00:00:30:48:d7:86:ba:00:01 brd 00:ff:ff:ff:ff:12:40:1b:ff:ff:00:00:00:00:00:00:ff:ff:ff:ff

Example of broken host:

# ping -I ib0 -c 3 -q vmstore1-ib
PING vmstore1-ib.brazos.tamu.edu (192.168.211.245) from 192.168.211.104 ib0: 56(84) bytes of data.

--- vmstore1-ib.brazos.tamu.edu ping statistics ---
3 packets transmitted, 0 received, +3 errors, 100% packet loss, time 3001ms
pipe 3

# arp -i ib0 -a
vmstore1-ib.brazos.tamu.edu (192.168.211.245) at <incomplete> on ib0

# ibstat
CA 'mlx4_0'
        CA type: MT4099
        Number of ports: 1
        Firmware version: 2.30.3200
        Hardware version: 0
        Node GUID: 0xf452140300274c10
        System image GUID: 0xf452140300274c13
        Port 1:
                State: Active
                Physical state: LinkUp
                Rate: 20
                Base lid: 36
                LMC: 0
                SM lid: 1
                Capability mask: 0x02514868
                Port GUID: 0xf452140300274c11
                Link layer: InfiniBand

# ip link show ib0
4: ib0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 65520 qdisc pfifo_fast state DOWN qlen 256
    link/infiniband 80:00:00:48:fe:80:00:00:00:00:00:00:f4:52:14:03:00:27:4c:11 brd 00:ff:ff:ff:ff:12:40:1b:ff:ff:00:00:00:00:00:00:ff:ff:ff:ff
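
The visible difference between the two hosts is that ibstat reports the port as Active on both, while ip link shows LOWER_UP on the working host and NO-CARRIER on the broken one.  A minimal sketch for checking this across many nodes follows.  The helper names link_carrier and ib_port_state are hypothetical (not part of any package), and the sketch assumes the ibstat and ip output formats shown above, with a single port per adapter:

```shell
#!/bin/sh
# Hypothetical helpers; both read command output on stdin.

# Classify the flags on the first line of `ip link show <iface>` output.
# Prints "no-carrier", "carrier", or "unknown".
link_carrier() {
    flags=$(sed -n '1s/^[^<]*<\([^>]*\)>.*$/\1/p')
    case ",$flags," in
        *,NO-CARRIER,*) echo "no-carrier" ;;
        *,LOWER_UP,*)   echo "carrier" ;;
        *)              echo "unknown" ;;
    esac
}

# Print the first logical port state found in `ibstat` output, e.g. "Active".
# The "Physical state:" line is not matched because it does not begin with "State:".
ib_port_state() {
    sed -n 's/^[[:space:]]*State:[[:space:]]*//p' | head -n 1
}

# Usage on a live host (ib0 is the interface name used in this post):
#   ibstat | ib_port_state
#   ip link show ib0 | link_carrier
```

A host where the first command prints "Active" but the second prints "no-carrier" matches the broken host above.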



=============================

Trey Dockendorf 
Systems Analyst I 
Texas A&M University 
Academy for Advanced Telecommunications and Learning Technologies 
Phone: (979)458-2396 
Email: treydock at tamu.edu 
Jabber: treydock at tamu.edu


