[Users] IPoIB on CentOS 6.5
Trey Dockendorf
treydock at tamu.edu
Wed Jul 16 10:31:26 PDT 2014
First time posting, so apologies if this is the wrong list for such a question.
I'm in the process of upgrading our cluster from CentOS 5.7 to CentOS 6.5, and I have just discovered that IPoIB is not working on CentOS 6.5. I'm using the InfiniBand Support packages provided in the stock repositories. The systems I'm working with use either Mellanox ConnectX cards (2U Twins) or Mellanox ConnectX3 cards. A handful of our storage nodes have ConnectX2 cards with a QSFP+ and 10GbE port. Our switch is a Voltaire ISR 2012 (EOL), which also runs our subnet manager. The symptom is that hosts cannot ping each other's IPoIB addresses. A host can ping its own IB interface IP, but not those of other hosts. What's odd is that the failure is not uniform: in some cases a host can ping specific hosts but not others.
Below is output from a system that's working and one that's not. On the system that is not working, I tried both the latest kernel from our internal mirror of CentOS (2.6.32-431.20.3.el6.x86_64) and the kernel running on the working host (2.6.32-431.17.1.el6.x86_64). ibping works just fine between all systems.
I have installed and started opensm on a separate host, but the Voltaire is still acting as master. I am hesitant to disable it as master because 90% of our cluster is still running EL5 and is in use by our users.
Any help is greatly appreciated. This is the first issue I've had with our InfiniBand setup, but it's also the first attempt at an upgrade since our switch was purchased many years ago (before my time here).
Thanks,
- Trey
Example of working host:
# ping -I ib0 -c 3 -q vmstore1-ib
PING vmstore1-ib.brazos.tamu.edu (192.168.211.245) from 192.168.211.103 ib0: 56(84) bytes of data.
--- vmstore1-ib.brazos.tamu.edu ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 2000ms
rtt min/avg/max/mdev = 0.103/0.105/0.109/0.008 ms
# arp -i ib0 -a
vmstore1-ib.brazos.tamu.edu (192.168.211.245) at 80:00:00:48:fe:80:00:00:00 [infiniband] on ib0
# ibstat
CA 'mlx4_0'
    CA type: MT25418
    Number of ports: 1
    Firmware version: 2.7.200
    Hardware version: a0
    Node GUID: 0x003048d786ba0000
    System image GUID: 0x003048d786ba0003
    Port 1:
        State: Active
        Physical state: LinkUp
        Rate: 20
        Base lid: 215
        LMC: 0
        SM lid: 1
        Capability mask: 0x02510868
        Port GUID: 0x003048d786ba0001
        Link layer: InfiniBand
# ip link show ib0
8: ib0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 65520 qdisc pfifo_fast state UP qlen 256
link/infiniband 80:00:00:48:fe:80:00:00:00:00:00:00:00:30:48:d7:86:ba:00:01 brd 00:ff:ff:ff:ff:12:40:1b:ff:ff:00:00:00:00:00:00:ff:ff:ff:ff
Example of broken host:
# ping -I ib0 -c 3 -q vmstore1-ib
PING vmstore1-ib.brazos.tamu.edu (192.168.211.245) from 192.168.211.104 ib0: 56(84) bytes of data.
--- vmstore1-ib.brazos.tamu.edu ping statistics ---
3 packets transmitted, 0 received, +3 errors, 100% packet loss, time 3001ms
pipe 3
# arp -i ib0 -a
vmstore1-ib.brazos.tamu.edu (192.168.211.245) at <incomplete> on ib0
# ibstat
CA 'mlx4_0'
    CA type: MT4099
    Number of ports: 1
    Firmware version: 2.30.3200
    Hardware version: 0
    Node GUID: 0xf452140300274c10
    System image GUID: 0xf452140300274c13
    Port 1:
        State: Active
        Physical state: LinkUp
        Rate: 20
        Base lid: 36
        LMC: 0
        SM lid: 1
        Capability mask: 0x02514868
        Port GUID: 0xf452140300274c11
        Link layer: InfiniBand
# ip link show ib0
4: ib0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 65520 qdisc pfifo_fast state DOWN qlen 256
link/infiniband 80:00:00:48:fe:80:00:00:00:00:00:00:f4:52:14:03:00:27:4c:11 brd 00:ff:ff:ff:ff:12:40:1b:ff:ff:00:00:00:00:00:00:ff:ff:ff:ff
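Note on the two outputs above: ibstat reports the port as Active/LinkUp on both hosts, but `ip link show ib0` reports NO-CARRIER and state DOWN only on the broken host. To make that comparison easy to repeat across many nodes, here is a minimal sketch that reads `ip link show <iface>` output on stdin and prints the carrier status. The helper name `link_carrier` is hypothetical (it is not part of iproute2), and the sketch assumes the flag layout shown in the outputs above. It only reports the symptom; it does not identify the cause.

```shell
#!/bin/sh
# Sketch: classify the carrier status of an interface from the flags field
# of `ip link show <iface>` output supplied on stdin.
# link_carrier is a hypothetical helper name, not an iproute2 command.
link_carrier() {
    # The first line looks like:
    #   4: ib0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 65520 ...
    # Extract the comma-separated flags between '<' and '>'.
    flags=$(sed -n '1s/^[^<]*<\([^>]*\)>.*$/\1/p')
    # Wrap in commas so each flag can be matched as a whole word.
    case ",$flags," in
        *,NO-CARRIER,*) echo "no-carrier" ;;
        *,LOWER_UP,*)   echo "carrier" ;;
        *)              echo "unknown" ;;
    esac
}

# Usage on a live host (interface name ib0 as in this post):
#   ip link show ib0 | link_carrier
```

Fed the first line of the broken host's output it prints "no-carrier", and fed the working host's it prints "carrier".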
=============================
Trey Dockendorf
Systems Analyst I
Texas A&M University
Academy for Advanced Telecommunications and Learning Technologies
Phone: (979)458-2396
Email: treydock at tamu.edu
Jabber: treydock at tamu.edu