[Users] PortXmitWait?

Mehdi Denou mehdi.denou at bull.net
Fri Mar 14 15:12:32 PDT 2014


Hi Florent,

If you have a FT topology, 10 cables with down links can have a VERY big
impact on the fabric's overall performance (the fat tree is not very fault
tolerant).


On 14/03/2014 16:17, Florent Parent wrote:
>
> Hi Jeff,
>
> I'm collecting data to do analysis over time, and indeed there are no
> XmitDiscards. I will add the XmitWait counters to watch for "hot
> spots" over time.
>
> We do have links down in some CXP cables (10 ports total, all spread
> out in different cables). I will check if there is any correlation
> with the observed XmitWait counters.
>
> The PCIe gen2 bus width is x8 for the QDR chip on the blades. Gen2
> provides 4Gbps per lane, so x8 would provide 32Gbps, which matches the
> QDR data rate. Or is my math wrong?
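
(A quick sanity check of that math, as a minimal sketch: the figures below are
just the standard line rates for PCIe Gen2 and IB QDR with 8b/10b encoding,
not measurements from this fabric.)

    # PCIe Gen2: 5 GT/s per lane, 8b/10b encoding -> 4 Gbit/s of data per lane
    pcie_gen2_lane_gbps = 5.0 * 8 / 10
    pcie_x8_gbps = 8 * pcie_gen2_lane_gbps        # 32.0

    # IB QDR 4x: 10 Gbit/s signalling per lane, 8b/10b -> 8 Gbit/s of data per lane
    qdr_4x_data_gbps = 4 * (10.0 * 8 / 10)        # 32.0

    print(pcie_x8_gbps, qdr_4x_data_gbps)         # both 32.0 Gbit/s

So yes, an x8 Gen2 slot and a QDR 4x link both come out around 32 Gbit/s of
data (before PCIe protocol overhead), which also means there is essentially no
headroom on the PCIe side at peak use.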
>
> Thanks for the Oracle document pointer, I guess I should have known
> about its existence :)
>
> Thanks for the response, and good to hear from you.
> Florent
>
>
> On Fri, Mar 14, 2014 at 6:46 AM, Le Fillatre Jean-Francois
> <jf.lefillatre at univ-lille1.fr> wrote:
>
>
>     Hello Florent,
>
>     I'll go for the congestion explanation too. It's not a major issue:
>     judging from the few lines of data included in the original email,
>     there seem to be no discarded packets. The number of PortXmitWaits
>     is orders of magnitude below the number of PortXmitDatas, and the
>     numbers don't seem to be directly correlated. So it looks like
>     they're not regular events with a clear pattern.
>
>     Oracle's Infiniband network troubleshooting guide says:
>     PortXmitWait : The number of ticks during which the port selected
>     by PortSelect had data to transmit but no data was sent during the
>     entire tick either because of insufficient credits or because of
>     lack of arbitration.
>
>     The important part of that sentence is "insufficient credit". The
>     three reasons given by Hal will indeed all cause insufficient
>     credit at some point on the link between two nodes. Any of those
>     may cause the sending HBA to pause and wait until there is
>     available credit all the way, therefore increasing the
>     PortXmitWait count.
>
>     From what I remember of your site:
>     - do you still have some links down on some of your cables?
>     - some users do indeed do many-to-one MPI communications
>     - I can't remember if the QDR chip has an x8 or x16 PCIe
>     connection with the board. If it's x8 then the IB chip will be
>     able to saturate the PCIe bus, thus limiting the IB rate at peak
>     use times.
>
>     Full Oracle document there:
>      http://www.oracle.com/technetwork/database/availability/infiniband-network-troubleshooting-1863251.pdf
>
>     Thanks,
>     JF
>
>
>
>     On Thursday, March 13, 2014 12:36 CET, Hal Rosenstock
>     <hal.rosenstock at gmail.com> wrote:
>
>     > Some causes of congestion are: slow receiver, many-to-one
>     > communication, and "poor" fat tree topology.
>     >
>     > On the last item, are all links in the subnet the same speed and
>     > width? How many links are used going up the fat tree to the next rank?
>     >
>     > Are all end nodes connected to rank 2, or are any connected to a
>     > higher rank?
>     >
>     > Are there any "combined" nodes? By this I mean some device which is
>     > more than just a single switch or CA. If so, what are they and where
>     > do they live in the topology?
>     >
>     >
>     > On Wed, Mar 12, 2014 at 11:50 PM, Hal Rosenstock
>     > <hal.rosenstock at gmail.com> wrote:
>     >
>     > > Since you didn't mention PortXmitDiscards, does that mean they are
>     > > 0? Assuming so, PortXmitWait indicates there is some congestion, but
>     > > it has not risen to the level of dropping packets. It's the rate of
>     > > increase of the XmitWait counter that matters rather than the
>     > > absolute number, so if you want to chase this, the focus should be
>     > > on the most congested ports.
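
(To make the point about the rate of increase concrete, a minimal sketch:
take two readings of PortXmitWait a known interval apart and compare the
deltas across ports. The second-sample numbers here are made up purely for
illustration; the first ones are from the capture quoted further down.)

    # Two snapshots of PortXmitWait per port, taken 10 seconds apart.
    INTERVAL_S = 10.0
    first  = {"port 4": 2932676, "port 8": 29021}
    second = {"port 4": 5900000, "port 8": 31000}   # made-up later sample

    rates = {p: (second[p] - first[p]) / INTERVAL_S for p in first}
    for port, rate in sorted(rates.items(), key=lambda kv: kv[1], reverse=True):
        print(f"{port}: PortXmitWait growing by about {rate:,.0f} ticks/s")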
>     > >
>     > > Since the old tool didn't report XmitWait counters, it's hard to
>     > > know whether this is the same as before or not, unless you did this
>     > > manually.
>     > >
>     > > Was the routing previously fat tree? Are there any other fat-tree
>     > > related log messages in the OpenSM log? Is there any fat tree
>     > > configuration of compute and/or I/O nodes?
>     > >
>     > > Any idea what the traffic pattern is? Are you running MPI?
>     > >
>     > > -- Hal
>     > >
>     > >
>     > > On Wed, Mar 12, 2014 at 8:17 PM, Florent Parent
>     > > <florent.parent at calculquebec.ca> wrote:
>     > >
>     > >>
>     > >> Hello IB users,
>     > >>
>     > >> We recently migrated our opensm from 3.2.6 to 3.3.17. In this
>     > >> upgrade, we moved to CentOS 6.5 with the stock RDMA and
>     > >> infiniband-diags_1.5.12-5, running opensm 3.3.17. Routing is FatTree:
>     > >> General fabric topology info
>     > >> ============================
>     > >> - FatTree rank (roots to leaf switches): 3
>     > >> - FatTree max switch rank: 2
>     > >> - Fabric has 966 CAs, 966 CA ports (603 of them CNs), 186 switches
>     > >> - Fabric has 36 switches at rank 0 (roots)
>     > >> - Fabric has 64 switches at rank 1
>     > >> - Fabric has 86 switches at rank 2 (86 of them leafs)
>     > >>
>     > >> Now to the question: ibqueryerrors 1.5.12 is reporting high
>     > >> PortXmitWait values throughout the fabric. We did not see this
>     > >> counter before (it was not reported by the older ibqueryerrors.pl).
>     > >>
>     > >> To give an idea of the scale of the counters, here's a capture of
>     > >> ibqueryerrors --data on one specific I4 switch, 10 seconds after
>     > >> clearing the counters (-k -K):
>     > >>
>     > >> GUID 0x21283a83b30050 port 4:   PortXmitWait == 2932676  PortXmitData == 90419517 (344.923MB)  PortRcvData == 1526963011 (5.688GB)
>     > >> GUID 0x21283a83b30050 port 5:   PortXmitWait == 3110105  PortXmitData == 509580912 (1.898GB)  PortRcvData == 13622 (53.211KB)
>     > >> GUID 0x21283a83b30050 port 6:   PortXmitWait == 8696397  PortXmitData == 480870802 (1.791GB)  PortRcvData == 17067 (66.668KB)
>     > >> GUID 0x21283a83b30050 port 7:   PortXmitWait == 1129568  PortXmitData == 126483825 (482.497MB)  PortRcvData == 24973385 (95.266MB)
>     > >> GUID 0x21283a83b30050 port 8:   PortXmitWait == 29021  PortXmitData == 19444902 (74.176MB)  PortRcvData == 84447725 (322.143MB)
>     > >> GUID 0x21283a83b30050 port 9:   PortXmitWait == 4945130  PortXmitData == 161911244 (617.642MB)  PortRcvData == 27161 (106.098KB)
>     > >> GUID 0x21283a83b30050 port 10:  PortXmitWait == 16795  PortXmitData == 35572510 (135.698MB)  PortRcvData == 681174731 (2.538GB)
>     > >> ... (this goes on for every active port)
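
(Since these counters were cleared only 10 seconds earlier, the absolute
PortXmitWait values here are already a 10-second delta. A minimal sketch for
ranking ports from a capture like the one above, assuming exactly this
"GUID ... port N:  PortXmitWait == ...  PortXmitData == ..." line format; the
10-second interval is taken from the capture, and the script itself is an
illustration, not part of any existing tool.)

    import re
    import sys

    # Matches lines like:
    # GUID 0x21283a83b30050 port 6:  PortXmitWait == 8696397  PortXmitData == 480870802 (1.791GB) ...
    LINE = re.compile(
        r"GUID\s+(0x[0-9a-fA-F]+)\s+port\s+(\d+):\s+"
        r"PortXmitWait\s+==\s+(\d+)\s+PortXmitData\s+==\s+(\d+)"
    )

    SECONDS_SINCE_CLEAR = 10.0   # counters were cleared with -k -K 10 s before the capture

    ports = []
    for line in sys.stdin:       # pipe the ibqueryerrors --data capture into this script
        m = LINE.search(line)
        if m:
            guid = m.group(1)
            port = int(m.group(2))
            xmit_wait = int(m.group(3))
            ports.append((xmit_wait / SECONDS_SINCE_CLEAR, guid, port))

    # Highest XmitWait growth first: these are the ports worth chasing.
    for rate, guid, port in sorted(ports, reverse=True):
        print(f"{guid} port {port}: ~{rate:,.0f} XmitWait ticks/s")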
>     > >>
>     > >> We are not observing any failures, so I suspect that I need
>     help to
>     > >> interpret these numbers. Do I need to be worried?
>     > >>
>     > >> Cheers,
>     > >> Florent
>     > >>
>     > >>
>     > >> _______________________________________________
>     > >> Users mailing list
>     > >> Users at lists.openfabrics.org
>     > >> http://lists.openfabrics.org/cgi-bin/mailman/listinfo/users
>     > >>
>     > >>
>     > >
>
>
>
>
>
>
> _______________________________________________
> Users mailing list
> Users at lists.openfabrics.org
> http://lists.openfabrics.org/cgi-bin/mailman/listinfo/users

-- 
---
Mehdi Denou

