[Users] High PortXmitWait on a lot of ports - degraded performance
German Anders
ganders at despegar.com
Wed Nov 25 04:29:51 PST 2015
Hi all,
I'm having some issues with my IB network. Basically I have the following
setup (PDF attached). I ran a fio test from an HP blade with QDR (bonded
ports in active/backup mode) to a storage cluster with FDR (no bonding at
all), and the best result I can get is 1.7 GB/s, which is pretty slow. A
QDR 4X link signals at 40 Gb/s, roughly 32 Gb/s (about 4 GB/s) of data
after 8b/10b encoding, so I was hoping for something between 2.5 and
3.5 GB/s on a QDR InfiniBand network. I then tried to tweak some
parameters, for example setting the scaling_governor to 'performance'
(sketched right after the sysctl list below) and enabling 'connected' mode
on the IB ports, and changed the following variables:
sysctl -w net.core.netdev_max_backlog=250000
sysctl -w net.core.rmem_max=4194304
sysctl -w net.core.wmem_max=4194304
sysctl -w net.core.rmem_default=4194304
sysctl -w net.core.wmem_default=4194304
sysctl -w net.core.optmem_max=4194304
sysctl -w net.ipv4.tcp_rmem="4096 87380 4194304"
sysctl -w net.ipv4.tcp_wmem="4096 65536 4194304"
sysctl -w net.ipv4.tcp_low_latency=1
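For reference, the governor change was applied by looping over the cpufreq
sysfs entries; this is only a sketch, and the exact sysfs paths can vary
with the kernel and cpufreq driver:

# set every core's frequency governor to 'performance'
for g in /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor; do
    echo performance > "$g"
done
# verify on one core
cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor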
The bond configuration is the following:
# cat /etc/modprobe.d/bonding.conf
alias bond-ib bonding
options bonding mode=1 miimon=100 downdelay=100 updelay=100 max_bonds=2
# cat /etc/network/interfaces
(...)
## INFINIBAND CONF
auto ib0
iface ib0 inet manual
bond-master bond-ib
auto ib1
iface ib1 inet manual
bond-master bond-ib
auto bond-ib
iface bond-ib inet static
address 172.23.18.1
netmask 255.255.240.0
slaves ib0 ib1
bond_miimon 100
bond_mode active-backup
pre-up echo connected > /sys/class/net/ib0/mode
pre-up echo connected > /sys/class/net/ib1/mode
pre-up /sbin/ifconfig ib0 mtu 65520
pre-up /sbin/ifconfig ib1 mtu 65520
pre-up modprobe bond-ib
pre-up /sbin/ifconfig bond-ib mtu 65520
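To double-check that the bond really comes up in active-backup mode with
the 65520 MTU on both slaves, I verify it after boot with the standard
bonding proc file and ip(8), nothing fancy:

# which slave is currently active, and does miimon see both links up?
cat /proc/net/bonding/bond-ib
# confirm the MTU stuck on the bond and on each slave
ip link show bond-ib
ip link show ib0
ip link show ib1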
The OS is Ubuntu 14.04.3 LTS with kernel 3.13.0-63-generic on the HP blades,
and Ubuntu 14.04.3 LTS with kernel 3.19.0-25-generic on the storage cluster.
The IB mezzanine cards on the HP blades are "InfiniBand: Mellanox
Technologies MT26428 [ConnectX VPI PCIe 2.0 5GT/s - IB QDR / 10GigE] (rev
b0)", and the adapters on the storage cluster are "Network controller:
Mellanox Technologies MT27500 Family [ConnectX-3]".
Then I ran the 'ibqueryerrors' command on one of the cluster nodes and
found the following:
$ ibqueryerrors
Errors for "e60-host01 HCA-1" ---> blade1 the one with the bonding
configuration using internally HP-IB-SW port 17 and 25
GUID 0xf452140300dd3296 port 2: [PortXmitWait == 15]
Errors for 0x2c902004b0918 "Infiniscale-IV Mellanox Technologies"
GUID 0x2c902004b0918 port ALL: [PortXmitWait == 325727936]
GUID 0x2c902004b0918 port 25: [PortXmitWait == 325727936]
Errors for 0xe41d2d030031e9c1 "MF0;GWIB01:SX6036G/U1"
GUID 0xe41d2d030031e9c1 port ALL: [PortXmitWait == 326981305]
GUID 0xe41d2d030031e9c1 port 11: [PortXmitWait == 326976642]
GUID 0xe41d2d030031e9c1 port 36: [PortXmitWait == 4663]
Errors for 0xf45214030073f500 "MF0;SWIB02:SX6018/U1"
GUID 0xf45214030073f500 port ALL: [PortXmitWait == 13979524]
GUID 0xf45214030073f500 port 8: [PortXmitWait == 3749467]
GUID 0xf45214030073f500 port 9: [PortXmitWait == 3434343]
GUID 0xf45214030073f500 port 10: [PortXmitWait == 3389114]
GUID 0xf45214030073f500 port 11: [PortXmitWait == 3406600]
Errors for 0xe41d2d030031eb41 "MF0;GWIB02:SX6036G/U1"
GUID 0xe41d2d030031eb41 port ALL: [PortXmitWait == 1352]
GUID 0xe41d2d030031eb41 port 34: [PortXmitWait == 1352]
Errors for "cibn08 HCA-1"
GUID 0xe41d2d03007b77c1 port 1: [PortXmitWait == 813152781]
GUID 0xe41d2d03007b77c2 port 2: [PortXmitWait == 3256286]
Errors for "cibn07 HCA-1"
GUID 0xe41d2d03007b67c1 port 1: [PortXmitWait == 841850209]
GUID 0xe41d2d03007b67c2 port 2: [PortXmitWait == 3211488]
Errors for "cibn05 HCA-1"
GUID 0xe41d2d0300d95191 port 1: [PortXmitWait == 840576923]
GUID 0xe41d2d0300d95192 port 2: [PortXmitWait == 2635901]
Errors for "cibn06 HCA-1"
GUID 0xe41d2d03007b77b1 port 1: [PortXmitWait == 843231930]
GUID 0xe41d2d03007b77b2 port 2: [PortXmitWait == 2869022]
Errors for 0xe41d2d0300097630 "MF0;SWIB01:SX6018/U1"
GUID 0xe41d2d0300097630 port ALL: [PortXmitWait == 470746689]
GUID 0xe41d2d0300097630 port 0: [PortXmitWait == 7]
GUID 0xe41d2d0300097630 port 2: [PortXmitWait == 8046]
GUID 0xe41d2d0300097630 port 3: [PortXmitWait == 7631]
GUID 0xe41d2d0300097630 port 8: [PortXmitWait == 219608]
GUID 0xe41d2d0300097630 port 9: [PortXmitWait == 216118]
GUID 0xe41d2d0300097630 port 10: [PortXmitWait == 198693]
GUID 0xe41d2d0300097630 port 11: [PortXmitWait == 206192]
GUID 0xe41d2d0300097630 port 18: [PortXmitWait == 469890394]
Errors for "cibm01 HCA-1"
GUID 0xe41d2d0300163651 port 1: [PortXmitWait == 6002]
## Summary: 22 nodes checked, 11 bad nodes found
## 208 ports checked, 26 ports have errors beyond threshold
##
## Suppressed:
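Since PortXmitWait is a cumulative counter, a raw number by itself doesn't
tell me much, so I was planning to snapshot ibqueryerrors before and during
a fio run and diff the two, roughly like this:

# does PortXmitWait keep climbing while fio is running?
ibqueryerrors > /tmp/xmitwait.before
sleep 60        # with the fio test running in the meantime
ibqueryerrors > /tmp/xmitwait.after
diff /tmp/xmitwait.before /tmp/xmitwait.after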
$ ibportstate -L 29 17 query
Switch PortInfo:
# Port info: Lid 29 port 17
LinkState:.......................Active
PhysLinkState:...................LinkUp
Lid:.............................75
SMLid:...........................2328
LMC:.............................0
LinkWidthSupported:..............1X or 4X
LinkWidthEnabled:................1X or 4X
LinkWidthActive:.................4X
LinkSpeedSupported:..............2.5 Gbps or 5.0 Gbps or 10.0 Gbps
LinkSpeedEnabled:................2.5 Gbps or 5.0 Gbps or 10.0 Gbps
LinkSpeedActive:.................10.0 Gbps
Peer PortInfo:
# Port info: Lid 29 DR path slid 4; dlid 65535; 0,17 port 1
LinkState:.......................Active
PhysLinkState:...................LinkUp
Lid:.............................32
SMLid:...........................2
LMC:.............................0
LinkWidthSupported:..............1X or 4X
LinkWidthEnabled:................1X or 4X
LinkWidthActive:.................4X
LinkSpeedSupported:..............2.5 Gbps or 5.0 Gbps or 10.0 Gbps
LinkSpeedEnabled:................2.5 Gbps or 5.0 Gbps or 10.0 Gbps
LinkSpeedActive:.................10.0 Gbps
Mkey:............................<not displayed>
MkeyLeasePeriod:.................0
ProtectBits:.....................0
---
$ ibportstate -L 29 25 query
Switch PortInfo:
# Port info: Lid 29 port 25
LinkState:.......................Active
PhysLinkState:...................LinkUp
Lid:.............................75
SMLid:...........................2328
LMC:.............................0
LinkWidthSupported:..............1X or 4X
LinkWidthEnabled:................1X or 4X
LinkWidthActive:.................4X
LinkSpeedSupported:..............2.5 Gbps or 5.0 Gbps or 10.0 Gbps
LinkSpeedEnabled:................2.5 Gbps or 5.0 Gbps or 10.0 Gbps
LinkSpeedActive:.................10.0 Gbps
Peer PortInfo:
# Port info: Lid 29 DR path slid 4; dlid 65535; 0,25 port 2
LinkState:.......................Active
PhysLinkState:...................LinkUp
Lid:.............................33
SMLid:...........................2
LMC:.............................0
LinkWidthSupported:..............1X or 4X
LinkWidthEnabled:................1X or 4X
LinkWidthActive:.................4X
LinkSpeedSupported:..............2.5 Gbps or 5.0 Gbps or 10.0 Gbps
LinkSpeedEnabled:................2.5 Gbps or 5.0 Gbps or 10.0 Gbps
LinkSpeedActive:.................10.0 Gbps
Mkey:............................<not displayed>
MkeyLeasePeriod:.................0
ProtectBits:.....................0
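To see whether those two uplink ports are the ones actually stalling while
fio runs, I can also sample their counters directly on the same switch I
queried above (LID 29, ports 17 and 25) with perfquery; whether
PortXmitWait shows up in the output depends on what the switch firmware
exports, so this is only a sketch:

# standard port counters for LID 29, ports 17 and 25, 30 seconds apart
perfquery 29 17; perfquery 29 25
sleep 30
perfquery 29 17; perfquery 29 25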
First I thought that maybe some cables could be in a bad state, but... all
of them? So I really don't know whether this XmitWait is actually hurting
performance or not. Any ideas or hints? Also, I have the SM configured on
SWIB01 with high priority and a second SM configured on SWIB02 with lower
priority, both in an active state. Is this OK, or is it better to have one
and only one SM active at a time in the entire IB network?
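In case it matters for the SM question, this is how I check which SM the
fabric currently treats as master:

# prints the LID, GUID, priority, state and activity count of the master SM
sminfo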
Also find below some iperf tests between blades that are in different
enclosures:
e61-host01 (server):
# iperf -s
e60-host01 (client):
# iperf -c 172.23.18.10 -P 4
------------------------------------------------------------
Client connecting to 172.23.18.10, TCP port 5001
TCP window size: 2.50 MByte (default)
------------------------------------------------------------
[ 3] local 172.23.18.1 port 52325 connected with 172.23.18.10 port 5001
[ 4] local 172.23.18.1 port 52326 connected with 172.23.18.10 port 5001
[ 5] local 172.23.18.1 port 52327 connected with 172.23.18.10 port 5001
[ 6] local 172.23.18.1 port 52328 connected with 172.23.18.10 port 5001
[ ID] Interval Transfer Bandwidth
[ 4] 0.0-10.0 sec 3.55 GBytes 3.05 Gbits/sec
[ 6] 0.0-10.0 sec 3.02 GBytes 2.60 Gbits/sec
[ 3] 0.0-10.0 sec 2.91 GBytes 2.50 Gbits/sec
[ 5] 0.0-10.0 sec 2.75 GBytes 2.36 Gbits/sec
[SUM] 0.0-10.0 sec 12.2 GBytes 10.5 Gbits/sec
---
Now, between a storage cluster node and a blade:
e60-host01 (server):
# iperf -s
cibn05 (client):
# iperf -c 172.23.18.1 -P 4
------------------------------------------------------------
Client connecting to 172.23.18.1, TCP port 5001
TCP window size: 2.50 MByte (default)
------------------------------------------------------------
[ 6] local 172.23.17.5 port 34263 connected with 172.23.18.1 port 5001
[ 4] local 172.23.17.5 port 34260 connected with 172.23.18.1 port 5001
[ 5] local 172.23.17.5 port 34262 connected with 172.23.18.1 port 5001
[ 3] local 172.23.17.5 port 34261 connected with 172.23.18.1 port 5001
[ ID] Interval Transfer Bandwidth
[ 4] 0.0- 9.0 sec 3.80 GBytes 3.63 Gbits/sec
[ 5] 0.0- 9.0 sec 3.78 GBytes 3.60 Gbits/sec
[ 3] 0.0- 9.0 sec 3.78 GBytes 3.61 Gbits/sec
[ 6] 0.0-10.0 sec 5.26 GBytes 4.52 Gbits/sec
[SUM] 0.0-10.0 sec 16.6 GBytes 14.3 Gbits/sec
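The blade-to-blade SUM (~10.5 Gbit/s) still looks low for IPoIB in
connected mode on QDR, so before blaming the fabric I want to rule out
plain TCP tuning by re-running iperf with more streams, an explicit window
and a longer interval, something like:

# on e61-host01
iperf -s -w 4M
# on e60-host01: 8 streams, 4 MB window, 30 second run
iperf -c 172.23.18.10 -P 8 -w 4M -t 30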
Thanks in advance,
Best,
German
[Attachment: Infiniband Diagram v1(1).pdf, 237019 bytes: <http://lists.openfabrics.org/pipermail/users/attachments/20151125/2b6de172/attachment.pdf>]