[Users] PortXmitWait?
Le Fillatre Jean-Francois
jf.lefillatre at univ-lille1.fr
Sun Mar 16 11:25:14 PDT 2014
On Friday, March 14, 2014 16:17 CET, Florent Parent <florent.parent at calculquebec.ca> wrote:
> Hi Jeff,
Hi Florent,
> I'm collecting data to do analysis over time, and indeed there are no
> XmitDiscards. I will add the XmitWait counters to watch for "hot spots"
> over time.
>
> We do have links down in some CXP cables (10 ports total, all spread out in
> different cables). I will check if there is any correlation with the
> observed XmitWait counters.
You might want to correlate with the jobs you're running, too. That way you'll be able to separate the job-related XmitWaits from the topology-related ones.
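Something along these lines (just a rough sketch, not a polished tool: the
regex assumes the ibqueryerrors output format from your capture quoted
below, and the 60-second interval is arbitrary) would give you timestamped
per-port XmitWait deltas to line up against the scheduler's job records:

#!/usr/bin/env python
# Rough sketch: sample `ibqueryerrors --data` periodically, extract the
# PortXmitWait value per (GUID, port), and print timestamped per-interval
# deltas that can later be lined up against job start/stop times.
import re, subprocess, time

LINE = re.compile(r'GUID (0x[0-9a-f]+) port (\d+):.*?PortXmitWait == (\d+)')
INTERVAL = 60  # seconds between samples (pick whatever resolution you need)

def sample():
    # ibqueryerrors may exit non-zero when it flags ports, so read its
    # output without check_output's return-code exception.
    proc = subprocess.Popen(['ibqueryerrors', '--data'], stdout=subprocess.PIPE)
    out = proc.communicate()[0].decode()
    return {(g, int(p)): int(w) for g, p, w in LINE.findall(out)}

prev = sample()
while True:
    time.sleep(INTERVAL)
    cur = sample()
    stamp = time.strftime('%Y-%m-%d %H:%M:%S')
    for key, value in sorted(cur.items()):
        delta = value - prev.get(key, value)
        if delta:
            print('%s %s port %d XmitWait +%d' % (stamp, key[0], key[1], delta))
    prev = cur
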
> The PCIe gen2 bus width is x8 for the QDR chip on the blades. Gen2 provides
> 4Gbps per lane, so x8 would provide 32Gbps, which matches the QDR data
> rate. Or is my math wrong?
No, your math is perfectly alright. :) What I was thinking of is that if there's even a very slight delay in getting the data off the IB chip on the destination node, it can't be made up for by a slightly faster transfer out of the card, since you're already at the limit of the PCIe connection.
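For the archive, the back-of-the-envelope version of that math (both the
PCIe Gen2 link and the QDR link use 8b/10b encoding, so the usable data
rate is 80% of the signalling rate):

# PCIe Gen2: 5 GT/s per lane, 8b/10b encoded -> 4 Gb/s of data per lane.
pcie_lane_data_gbps = 5.0 * 8 / 10
pcie_x8_data_gbps = 8 * pcie_lane_data_gbps        # 32 Gb/s across the x8 link
# QDR IB: 4 lanes at 10 Gb/s signalling, also 8b/10b encoded.
qdr_data_gbps = 4 * 10.0 * 8 / 10                  # 32 Gb/s of data
print(pcie_x8_data_gbps, qdr_data_gbps)            # 32.0 32.0

On top of that, PCIe TLP/DLLP protocol overhead shaves a few more percent
off the PCIe side, which is why an x8 Gen2 slot ends up being the
bottleneck rather than the QDR port.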
As mentioned by Peter, it might be interesting to pin the IB driver to the closest CPU core to try to limit NUMA issues.
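One way to do that, beyond pinning the MPI ranks themselves with numactl,
is to steer the HCA's interrupts onto the socket the card is attached to.
A minimal sketch follows; it assumes an mlx4 HCA (so the IRQ names in
/proc/interrupts contain "mlx4"), that the card hangs off socket 0 (cores
0-7 here, hence the 'ff' mask), and that irqbalance is stopped so it does
not rewrite the affinity afterwards. Your vendor's OFED stack may already
ship a helper script that does the same thing.

#!/usr/bin/env python
# Minimal sketch (run as root): point every mlx4 interrupt at the cores of
# the socket the HCA is attached to by writing a CPU bitmask into
# /proc/irq/<n>/smp_affinity. The 'ff' mask (cores 0-7) and the "mlx4"
# match are assumptions about this particular system; adjust to yours.
CPU_MASK = 'ff'

with open('/proc/interrupts') as f:
    for line in f:
        if 'mlx4' not in line:
            continue
        irq = line.split(':', 1)[0].strip()
        if not irq.isdigit():
            continue
        with open('/proc/irq/%s/smp_affinity' % irq, 'w') as aff:
            aff.write(CPU_MASK + '\n')
        print('IRQ %s -> cpu mask %s' % (irq, CPU_MASK))
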
Thanks!
JF
> Thanks for the Oracle document pointer, I guess I should have known about
> its existence :)
>
> Thanks for the response, and good to hear from you.
> Florent
>
>
> On Fri, Mar 14, 2014 at 6:46 AM, Le Fillatre Jean-Francois <
> jf.lefillatre at univ-lille1.fr> wrote:
>
> >
> > Hello Florent,
> >
> > I'll go for the congestion explanation too. It's not a major issue as
> > there seem to be no discarded packets from the few lines of data included
> > in the original email. The number of PortXmitWaits is orders of magnitude
> > below the number of PortXmitDatas, and the numbers don't seem to be
> > directly correlated. So it looks like they're not regular events with a
> > clear pattern.
> >
> > Oracle's Infiniband network troubleshooting guide says:
> > PortXmitWait : The number of ticks during which the port selected by
> > PortSelect had data to transmit but no data was sent during the entire tick
> > either because of insufficient credits or because of lack of arbitration.
> >
> > The important part of that sentence is "insufficient credit". The three
> > reasons given by Hal will indeed all cause insufficient credit at some
> > point along the link between two nodes. Any of those may cause the sending HCA
> > to pause and wait until there is available credit all the way, therefore
> > increasing the PortXmitWait count.
> >
> > From what I remember of your site:
> > - do you still have some links down on some of your cables?
> > - some users do indeed do many-to-one MPI communications
> > - I can't remember if the QDR chip has an x8 or x16 PCIe connection with
> > the board. If it's x8 then the IB chip will be able to saturate the PCIe
> > bus, thus limiting the IB rate at peak use times.
> >
> > Full Oracle document here:
> >
> > http://www.oracle.com/technetwork/database/availability/infiniband-network-troubleshooting-1863251.pdf
> >
> > Thanks,
> > JF
> >
> >
> >
> > On Thursday, March 13, 2014 12:36 CET, Hal Rosenstock <
> > hal.rosenstock at gmail.com> wrote:
> >
> > > Some causes of congestion are: slow receiver, many-to-one communication,
> > > and "poor" fat tree topology.
> > >
> > > On the last item, are all links in the subnet the same speed and width? How
> > > many links are used going up the fat tree to the next rank?
> > >
> > > Are all end nodes connected to rank 2, or are any connected to a higher rank?
> > >
> > > Are there any "combined" nodes? By this I mean some device which is more
> > > than just a single switch or CA. If so, what are they and where do they live
> > > in the topology?
> > >
> > >
> > > On Wed, Mar 12, 2014 at 11:50 PM, Hal Rosenstock <hal.rosenstock at gmail.com> wrote:
> > >
> > > > Since you didn't mention PortXmitDiscards, does that mean they are 0?
> > > > Assuming so, PortXmitWait indicates there is some congestion, but it has
> > > > not risen to the level of dropping packets. It's the rate of increase of
> > > > the XmitWait counter that matters rather than the absolute number, so if
> > > > you want to chase this, the focus should be on the most congested ports.
> > > >
> > > > Since the old tool didn't report XmitWait counters, it's hard to know
> > > > whether this is the same as before or not unless you did this manually.
> > > >
> > > > Was the routing previously fat tree? Are there any other fat-tree-related
> > > > log messages in the OpenSM log? Is there any fat tree configuration of
> > > > compute and/or I/O nodes?
> > > >
> > > > Any idea what the traffic pattern is? Are you running MPI?
> > > >
> > > > -- Hal
> > > >
> > > >
> > > > On Wed, Mar 12, 2014 at 8:17 PM, Florent Parent <
> > > > florent.parent at calculquebec.ca> wrote:
> > > >
> > > >>
> > > >> Hello IB users,
> > > >>
> > > >> We recently migrated our opensm from 3.2.6 to 3.3.17. In this upgrade, we
> > > >> moved to CentOS 6.5 with the stock RDMA stack and infiniband-diags_1.5.12-5,
> > > >> and are running opensm 3.3.17. Routing is FatTree:
> > > >> General fabric topology info
> > > >> ============================
> > > >> - FatTree rank (roots to leaf switches): 3
> > > >> - FatTree max switch rank: 2
> > > >> - Fabric has 966 CAs, 966 CA ports (603 of them CNs), 186 switches
> > > >> - Fabric has 36 switches at rank 0 (roots)
> > > >> - Fabric has 64 switches at rank 1
> > > >> - Fabric has 86 switches at rank 2 (86 of them leafs)
> > > >>
> > > >> Now to the question: ibqueryerrors 1.5.12 is reporting high PortXmitWait
> > > >> values throughout the fabric. We did not see this counter before (it was
> > > >> not reported by the older ibqueryerrors.pl).
> > > >>
> > > >> To give an idea of the scale of the counters, here's a capture of
> > > >> ibqueryerrors --data on one specific I4 switch, 10 seconds after clearing
> > > >> the counters (-k -K):
> > > >>
> > > >> GUID 0x21283a83b30050 port 4: PortXmitWait == 2932676 PortXmitData == 90419517 (344.923MB) PortRcvData == 1526963011 (5.688GB)
> > > >> GUID 0x21283a83b30050 port 5: PortXmitWait == 3110105 PortXmitData == 509580912 (1.898GB) PortRcvData == 13622 (53.211KB)
> > > >> GUID 0x21283a83b30050 port 6: PortXmitWait == 8696397 PortXmitData == 480870802 (1.791GB) PortRcvData == 17067 (66.668KB)
> > > >> GUID 0x21283a83b30050 port 7: PortXmitWait == 1129568 PortXmitData == 126483825 (482.497MB) PortRcvData == 24973385 (95.266MB)
> > > >> GUID 0x21283a83b30050 port 8: PortXmitWait == 29021 PortXmitData == 19444902 (74.176MB) PortRcvData == 84447725 (322.143MB)
> > > >> GUID 0x21283a83b30050 port 9: PortXmitWait == 4945130 PortXmitData == 161911244 (617.642MB) PortRcvData == 27161 (106.098KB)
> > > >> GUID 0x21283a83b30050 port 10: PortXmitWait == 16795 PortXmitData == 35572510 (135.698MB) PortRcvData == 681174731 (2.538GB)
> > > >> ... (this goes on for every active port)
> > > >>
> > > >> We are not observing any failures, so I suspect that I need help to
> > > >> interpret these numbers. Do I need to be worried?
> > > >>
> > > >> Cheers,
> > > >> Florent
> > > >>
> > > >>