[Users] High PortXmitWait on a lot of ports - degraded performance

Hal Rosenstock hal.rosenstock at gmail.com
Wed Nov 25 06:54:47 PST 2015


PortXmitWait is a sign of congestion on those ports. What matters is the
rate of increase (and on which "tier" of the subnet it is occurring). Note
that IB counters are sticky rather than rolling over: PortXmitWait is a
32-bit counter that stops at its maximum value once it saturates.
infiniband-diags only gives you the raw numbers; there are proprietary
tools (e.g. UFM) that do better analysis on this.
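
If it's the rate you're after, one rough way (just a sketch -- assumes
perfquery from infiniband-diags, with a real LID and port substituted in)
is to sample the counter twice and take the difference:

    perfquery 29 25 | grep XmitWait    # first sample (LID 29 / port 25
    sleep 60                           #  taken from the output below
    perfquery 29 25 | grep XmitWait    #  purely as an example)

The delta over those 60 seconds is the rate of increase; alternatively,
perfquery -r resets the counters after each read so every read starts
from zero.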

As to multiple OpenSMs, this should be fine. One will be elected master;
the others should be in standby. Hopefully, all the SMs are identical,
including their configuration.
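
A quick way to verify is sminfo (also from infiniband-diags):

    sminfo                  # queries the current master; should end with
                            #  "... priority <P> state 3 SMINFO_MASTER"
    sminfo <lid-of-SWIB02>  # pointed at the other SM's LID, it should
                            #  report SMINFO_STANDBY

(The exact output format varies a bit between versions, but the state
field is the thing to check.)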

-- Hal

On Wed, Nov 25, 2015 at 7:29 AM, German Anders <ganders at despegar.com> wrote:

> Hi all,
>
> I'm having some issues with my IB network; basically I have the following
> setup (PDF attached). I ran a fio test from the HP blade with QDR (ports
> bonded in active/backup mode) to a storage cluster with FDR (no bonding at
> all), and the best result I can get is 1.7GB/s, which is pretty slow. I
> was hoping for something between 2.5-3.5GB/s on a QDR InfiniBand network
> (4X QDR signals at 40Gb/s, which after 8b/10b encoding leaves 32Gb/s, i.e.
> roughly 4GB/s of raw data). I then tried to tweak some parameters, for
> example setting the scaling_governor to 'performance' (command shown right
> after the sysctl list below) and enabling 'connected' mode on the IB
> ports, and changed the following variables:
>
> sysctl -w net.core.netdev_max_backlog=250000
> sysctl -w net.core.rmem_max=4194304
> sysctl -w net.core.wmem_max=4194304
> sysctl -w net.core.rmem_default=4194304
> sysctl -w net.core.wmem_default=4194304
> sysctl -w net.core.optmem_max=4194304
> sysctl -w net.ipv4.tcp_rmem="4096 87380 4194304"
> sysctl -w net.ipv4.tcp_wmem="4096 65536 4194304"
> sysctl -w net.ipv4.tcp_low_latency=1
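>
> (For reference, the governor was switched through the usual cpufreq
> sysfs interface -- something along these lines, per CPU:)
>
> for g in /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor; do
>     echo performance > "$g"
> done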
>
> The bond configuration is the following:
>
> # cat /etc/modprobe.d/bonding.conf
>
> alias bond-ib bonding
> options bonding mode=1 miimon=100 downdelay=100 updelay=100 max_bonds=2
>
>
> # cat /etc/network/interfaces
>
> (...)
>
> ## INFINIBAND CONF
> auto ib0
> iface ib0 inet manual
>         bond-master bond-ib
>
> auto ib1
> iface ib1 inet manual
>         bond-master bond-ib
>
> auto bond-ib
> iface bond-ib inet static
>     address 172.23.18.1
>     netmask 255.255.240.0
>     slaves ib0 ib1
>     bond_miimon 100
>     bond_mode active-backup
>     pre-up echo connected > /sys/class/net/ib0/mode
>     pre-up echo connected > /sys/class/net/ib1/mode
>     pre-up /sbin/ifconfig ib0 mtu 65520
>     pre-up /sbin/ifconfig ib1 mtu 65520
>     pre-up modprobe bond-ib
>     pre-up /sbin/ifconfig bond-ib mtu 65520
>
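> As a sanity check that connected mode and the 64K MTU actually took
> effect, the standard sysfs/procfs locations can be read back directly:
>
> cat /sys/class/net/ib0/mode /sys/class/net/ib1/mode   # expect "connected"
> ip link show bond-ib                                   # expect mtu 65520
> cat /proc/net/bonding/bond-ib                          # mode + active slave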
>
> The OS is Ubuntu 14.04.3 LTS with kernel 3.13.0-63-generic on the HP
> blade, and Ubuntu 14.04.3 LTS with kernel 3.19.0-25-generic on the
> storage cluster.
>
> The IB mezzanine cards on the HP blades are "InfiniBand: Mellanox
> Technologies MT26428 [ConnectX VPI PCIe 2.0 5GT/s - IB QDR / 10GigE]
> (rev b0)", and on the storage cluster the IB adapters are "Network
> controller: Mellanox Technologies MT27500 Family [ConnectX-3]".
>
> I then ran the '*ibqueryerrors*' command on one of the cluster nodes and
> found the following:
>
> $ ibqueryerrors
> Errors for "e60-host01 HCA-1"  ---> blade1 the one with the bonding
> configuration using internally HP-IB-SW port 17 and 25
>    GUID 0xf452140300dd3296 port 2: [PortXmitWait == 15]
> Errors for 0x2c902004b0918 "Infiniscale-IV Mellanox Technologies"
>    GUID 0x2c902004b0918 port ALL: [PortXmitWait == 325727936]
>    GUID 0x2c902004b0918 port 25: [PortXmitWait == 325727936]
> Errors for 0xe41d2d030031e9c1 "MF0;GWIB01:SX6036G/U1"
>    GUID 0xe41d2d030031e9c1 port ALL: [PortXmitWait == 326981305]
>    GUID 0xe41d2d030031e9c1 port 11: [PortXmitWait == 326976642]
>    GUID 0xe41d2d030031e9c1 port 36: [PortXmitWait == 4663]
> Errors for 0xf45214030073f500 "MF0;SWIB02:SX6018/U1"
>    GUID 0xf45214030073f500 port ALL: [PortXmitWait == 13979524]
>    GUID 0xf45214030073f500 port 8: [PortXmitWait == 3749467]
>    GUID 0xf45214030073f500 port 9: [PortXmitWait == 3434343]
>    GUID 0xf45214030073f500 port 10: [PortXmitWait == 3389114]
>    GUID 0xf45214030073f500 port 11: [PortXmitWait == 3406600]
> Errors for 0xe41d2d030031eb41 "MF0;GWIB02:SX6036G/U1"
>    GUID 0xe41d2d030031eb41 port ALL: [PortXmitWait == 1352]
>    GUID 0xe41d2d030031eb41 port 34: [PortXmitWait == 1352]
> Errors for "cibn08 HCA-1"
>    GUID 0xe41d2d03007b77c1 port 1: [PortXmitWait == 813152781]
>    GUID 0xe41d2d03007b77c2 port 2: [PortXmitWait == 3256286]
> Errors for "cibn07 HCA-1"
>    GUID 0xe41d2d03007b67c1 port 1: [PortXmitWait == 841850209]
>    GUID 0xe41d2d03007b67c2 port 2: [PortXmitWait == 3211488]
> Errors for "cibn05 HCA-1"
>    GUID 0xe41d2d0300d95191 port 1: [PortXmitWait == 840576923]
>    GUID 0xe41d2d0300d95192 port 2: [PortXmitWait == 2635901]
> Errors for "cibn06 HCA-1"
>    GUID 0xe41d2d03007b77b1 port 1: [PortXmitWait == 843231930]
>    GUID 0xe41d2d03007b77b2 port 2: [PortXmitWait == 2869022]
> Errors for 0xe41d2d0300097630 "MF0;SWIB01:SX6018/U1"
>    GUID 0xe41d2d0300097630 port ALL: [PortXmitWait == 470746689]
>    GUID 0xe41d2d0300097630 port 0: [PortXmitWait == 7]
>    GUID 0xe41d2d0300097630 port 2: [PortXmitWait == 8046]
>    GUID 0xe41d2d0300097630 port 3: [PortXmitWait == 7631]
>    GUID 0xe41d2d0300097630 port 8: [PortXmitWait == 219608]
>    GUID 0xe41d2d0300097630 port 9: [PortXmitWait == 216118]
>    GUID 0xe41d2d0300097630 port 10: [PortXmitWait == 198693]
>    GUID 0xe41d2d0300097630 port 11: [PortXmitWait == 206192]
>    GUID 0xe41d2d0300097630 port 18: [PortXmitWait == 469890394]
> Errors for "cibm01 HCA-1"
>    GUID 0xe41d2d0300163651 port 1: [PortXmitWait == 6002]
>
> ## Summary: 22 nodes checked, 11 bad nodes found
> ##          208 ports checked, 26 ports have errors beyond threshold
> ##
> ## Suppressed:
>
>
> $ ibportstate -L 29 17 query
> Switch PortInfo:
> # Port info: Lid 29 port 17
> LinkState:.......................Active
> PhysLinkState:...................LinkUp
> Lid:.............................75
> SMLid:...........................2328
> LMC:.............................0
> LinkWidthSupported:..............1X or 4X
> LinkWidthEnabled:................1X or 4X
> LinkWidthActive:.................4X
> LinkSpeedSupported:..............2.5 Gbps or 5.0 Gbps or 10.0 Gbps
> LinkSpeedEnabled:................2.5 Gbps or 5.0 Gbps or 10.0 Gbps
> LinkSpeedActive:.................10.0 Gbps
> Peer PortInfo:
> # Port info: Lid 29 DR path slid 4; dlid 65535; 0,17 port 1
> LinkState:.......................Active
> PhysLinkState:...................LinkUp
> Lid:.............................32
> SMLid:...........................2
> LMC:.............................0
> LinkWidthSupported:..............1X or 4X
> LinkWidthEnabled:................1X or 4X
> LinkWidthActive:.................4X
> LinkSpeedSupported:..............2.5 Gbps or 5.0 Gbps or 10.0 Gbps
> LinkSpeedEnabled:................2.5 Gbps or 5.0 Gbps or 10.0 Gbps
> LinkSpeedActive:.................10.0 Gbps
> Mkey:............................<not displayed>
> MkeyLeasePeriod:.................0
> ProtectBits:.....................0
>
> ---
>
> $ ibportstate -L 29 25 query
> Switch PortInfo:
> # Port info: Lid 29 port 25
> LinkState:.......................Active
> PhysLinkState:...................LinkUp
> Lid:.............................75
> SMLid:...........................2328
> LMC:.............................0
> LinkWidthSupported:..............1X or 4X
> LinkWidthEnabled:................1X or 4X
> LinkWidthActive:.................4X
> LinkSpeedSupported:..............2.5 Gbps or 5.0 Gbps or 10.0 Gbps
> LinkSpeedEnabled:................2.5 Gbps or 5.0 Gbps or 10.0 Gbps
> LinkSpeedActive:.................10.0 Gbps
> Peer PortInfo:
> # Port info: Lid 29 DR path slid 4; dlid 65535; 0,25 port 2
> LinkState:.......................Active
> PhysLinkState:...................LinkUp
> Lid:.............................33
> SMLid:...........................2
> LMC:.............................0
> LinkWidthSupported:..............1X or 4X
> LinkWidthEnabled:................1X or 4X
> LinkWidthActive:.................4X
> LinkSpeedSupported:..............2.5 Gbps or 5.0 Gbps or 10.0 Gbps
> LinkSpeedEnabled:................2.5 Gbps or 5.0 Gbps or 10.0 Gbps
> LinkSpeedActive:.................10.0 Gbps
> Mkey:............................<not displayed>
> MkeyLeasePeriod:.................0
> ProtectBits:.....................0
>
>
> At first I thought that maybe some cables were in a bad state, but... all
> of them? So I really don't know whether this XmitWait could be hurting
> performance or not. Any ideas or hints? Also, I have the SM configured on
> SWIB01 with high priority and a second SM configured on SWIB02 with lower
> priority, both in an active state. Is this OK, or is it better to have one
> and only one SM active at a time in the entire IB network?
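>
> (Side note on the cable question: "iblinkinfo", also from
> infiniband-diags, dumps the width and speed of every link in the fabric
> in one pass, so running
>
>     iblinkinfo
>
> and eyeballing the output for anything that trained down below 4X /
> 10.0 Gbps would be a quicker way to rule out bad cables than querying
> port by port with ibportstate as above.)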
>
> Also find below some iperf tests between blades in different enclosures:
>
> *e61-host01 (server):*
>
> # iperf -s
>
> *e60-host01 (client):*
>
> # iperf -c 172.23.18.10 -P 4
>
> ------------------------------------------------------------
> Client connecting to 172.23.18.10, TCP port 5001
> TCP window size: 2.50 MByte (default)
> ------------------------------------------------------------
> [  3] local 172.23.18.1 port 52325 connected with 172.23.18.10 port 5001
> [  4] local 172.23.18.1 port 52326 connected with 172.23.18.10 port 5001
> [  5] local 172.23.18.1 port 52327 connected with 172.23.18.10 port 5001
> [  6] local 172.23.18.1 port 52328 connected with 172.23.18.10 port 5001
> [ ID] Interval       Transfer     Bandwidth
> [  4]  0.0-10.0 sec  3.55 GBytes  3.05 Gbits/sec
> [  6]  0.0-10.0 sec  3.02 GBytes  2.60 Gbits/sec
> [  3]  0.0-10.0 sec  2.91 GBytes  2.50 Gbits/sec
> [  5]  0.0-10.0 sec  2.75 GBytes  2.36 Gbits/sec
> [SUM]  0.0-10.0 sec  12.2 GBytes  10.5 Gbits/sec
>
> ---
>
> Now, between a storage cluster node and a blade:
>
> *e60-host01 (server):*
>
> # iperf -s
>
> *cibn05 (client):*
>
> # iperf -c 172.23.18.1 -P 4
>
> ------------------------------------------------------------
> Client connecting to 172.23.18.1, TCP port 5001
> TCP window size: 2.50 MByte (default)
> ------------------------------------------------------------
> [  6] local 172.23.17.5 port 34263 connected with 172.23.18.1 port 5001
> [  4] local 172.23.17.5 port 34260 connected with 172.23.18.1 port 5001
> [  5] local 172.23.17.5 port 34262 connected with 172.23.18.1 port 5001
> [  3] local 172.23.17.5 port 34261 connected with 172.23.18.1 port 5001
> [ ID] Interval       Transfer     Bandwidth
> [  4]  0.0- 9.0 sec  3.80 GBytes  3.63 Gbits/sec
> [  5]  0.0- 9.0 sec  3.78 GBytes  3.60 Gbits/sec
> [  3]  0.0- 9.0 sec  3.78 GBytes  3.61 Gbits/sec
> [  6]  0.0-10.0 sec  5.26 GBytes  4.52 Gbits/sec
> [SUM]  0.0-10.0 sec  16.6 GBytes  14.3 Gbits/sec
>
>
> Thanks in advance,
>
> Best,
>
>
> *German*
>
> _______________________________________________
> Users mailing list
> Users at lists.openfabrics.org
> http://lists.openfabrics.org/mailman/listinfo/users
>
>