[Users] PortXmitWait?

Fri Mar 14 03:46:01 PDT 2014

Hello Florent,

I'll go for the congestion explanation too. It's not a major issue, as the few lines of data included in the original email show no sign of discarded packets. The PortXmitWait counts are roughly one to three orders of magnitude below the PortXmitData counts, and the two don't seem to be directly correlated. So these don't look like regular events with a clear pattern.
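To make Hal's point about the rate of increase concrete, here is a minimal Python sketch that ranks the ports from the capture quoted below by PortXmitWait per second. The counter values are copied from Florent's output; the 10-second window is the one he mentions, and the helper name wait_stats is just for illustration. Note that the wait/data ratio divides ticks by 32-bit words, so it is only useful for comparing ports with each other, not as an absolute measure.

```python
# Counter values copied from the ibqueryerrors capture quoted in this thread
# (switch GUID 0x21283a83b30050, read about 10 seconds after a counter clear).
SAMPLE_SECONDS = 10

# port -> (PortXmitWait, PortXmitData)
counters = {
    4: (2932676, 90419517),
    5: (3110105, 509580912),
    6: (8696397, 480870802),
    7: (1129568, 126483825),
    8: (29021, 19444902),
    9: (4945130, 161911244),
    10: (16795, 35572510),
}


def wait_stats(xmit_wait, xmit_data, seconds):
    """Return (wait ticks per second, wait ticks per unit of PortXmitData)."""
    return xmit_wait / seconds, xmit_wait / xmit_data


# Sort ports by PortXmitWait, most congested first.
ranked = sorted(counters.items(), key=lambda kv: kv[1][0], reverse=True)

for port, (wait, data) in ranked:
    rate, ratio = wait_stats(wait, data, SAMPLE_SECONDS)
    print(f"port {port:2d}: {rate:10.1f} wait ticks/s   wait/data = {ratio:.5f}")
```

Run against successive samples rather than a single one, the same ranking would show which ports are getting worse over time.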

Oracle's InfiniBand network troubleshooting guide says:
PortXmitWait : The number of ticks during which the port selected by PortSelect had data to transmit but no data was sent during the entire tick either because of insufficient credits or because of lack of arbitration.

The important part of that sentence is "insufficient credits". The three causes given by Hal will indeed all lead to insufficient credits at some point on the path between two nodes. Any of them may cause the sending HCA to pause and wait until credits are available all the way along the path, which increases the PortXmitWait count.
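As a rough illustration of how a slow receiver turns into PortXmitWait ticks, here is a toy Python model. It is only a sketch under simplifying assumptions: one sender that always has data, one packet per credit, and a receiver that returns one credit every few ticks. Real InfiniBand flow control is per virtual lane and more involved; the function and parameter names are made up for this example.

```python
def simulate(ticks, initial_credits, credit_every):
    """Toy credit-based link: returns (packets sent, ticks spent waiting).

    The sender always has data queued. The receiver hands back one credit
    every `credit_every` ticks. A tick with data but no credit counts as a
    wait, which is the situation PortXmitWait is described as counting.
    """
    credits = initial_credits
    sent = 0
    xmit_wait = 0
    for tick in range(1, ticks + 1):
        if tick % credit_every == 0:
            credits += 1          # receiver freed a buffer
        if credits > 0:
            credits -= 1          # consume a credit and transmit
            sent += 1
        else:
            xmit_wait += 1        # data ready, but no credit this tick
    return sent, xmit_wait


# A receiver that keeps up versus one that drains four times slower.
for credit_every in (1, 4):
    sent, waits = simulate(ticks=100, initial_credits=4, credit_every=credit_every)
    print(f"credit every {credit_every} tick(s): sent={sent} waits={waits}")
```

In this model the wait count stays at zero while the receiver keeps up, and climbs as soon as credits come back more slowly than the sender could use them, without any packet being dropped.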

From what I remember of your site:
- do you still have some links down on some of your cables?
- some users do indeed do many-to-one MPI communications
- I can't remember if the QDR chip has an x8 or x16 PCIe connection with the board. If it's x8 then the IB chip will be able to saturate the PCIe bus, thus limiting the IB rate at peak use times.
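On the last point, a back-of-the-envelope comparison can be sketched in Python. The figures below are assumptions, not facts about your hardware: a 4x QDR link (10 Gb/s signalling per lane), a PCIe Gen2 slot (5 GT/s per lane), and 8b/10b encoding on both. PCIe packet and protocol overhead is not modelled, so the usable PCIe rate would be lower than the number printed.

```python
ENC_8B10B = 8 / 10  # 8b/10b line encoding: 8 data bits per 10 bits on the wire


def data_rate_gbps(lanes, signal_gbps_per_lane, encoding_efficiency):
    """Data rate in Gb/s after line encoding, before any protocol overhead."""
    return lanes * signal_gbps_per_lane * encoding_efficiency


# Assumed link parameters (see the note above).
ib_qdr_4x = data_rate_gbps(4, 10.0, ENC_8B10B)
pcie2_x8 = data_rate_gbps(8, 5.0, ENC_8B10B)
pcie2_x16 = data_rate_gbps(16, 5.0, ENC_8B10B)

print(f"IB QDR 4x     : {ib_qdr_4x:.1f} Gb/s")
print(f"PCIe Gen2 x8  : {pcie2_x8:.1f} Gb/s")
print(f"PCIe Gen2 x16 : {pcie2_x16:.1f} Gb/s")
```

Under these assumptions the x8 slot and the QDR link both come out at 32 Gb/s before protocol overhead, so an x8 connection leaves no headroom once PCIe overhead is taken into account, while x16 would.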

The full Oracle document is here:
 http://www.oracle.com/technetwork/database/availability/infiniband-network-troubleshooting-1863251.pdf

Thanks,
JF

On Thursday, March 13, 2014 12:36 CET, Hal Rosenstock <hal.rosenstock at gmail.com> wrote: 

> Some causes of congestion are: slow receiver, many to one communication,
> and "poor" fat tree topology.
> 
> On the last item, are all links in the subnet same speed and width ? How
> many links are used going up the fat tree to the next rank ?
> 
> Are all end nodes connected to rank 2 or are any connected to higher rank ?
> 
> Are there any "combined" nodes ? By this I mean, some device which is more
> than just single switch or CA. If so, what are they and where do they live
> in the topology ?
> 
> 
> On Wed, Mar 12, 2014 at 11:50 PM, Hal Rosenstock
> <hal.rosenstock at gmail.com> wrote:
> 
> > By the fact that you didn't mention PortXmitDiscards, does it mean that
> > these are 0 ? Assuming so, PortXmitWait is indicating there is some
> > congestion but it has not risen to the level of dropping packets. It's the
> > rate of increase of the XmitWait counter that's important rather than the
> > absolute number so if you want to chase this, the focus should be on the
> > ports most congested.
> >
> > Since the old tool didn't report XmitWait counters, it's hard to know
> > whether this is the same as before or not unless you did this manually.
> >
> > Was the routing previously fat tree ? Are there any other fat tree related
> > log messages in the OpenSM log ? Is there any fat tree configuration of
> > compute and/or I/O nodes ?
> >
> > Any idea on what is the traffic pattern ? Are you running MPI ?
> >
> > -- Hal
> >
> >
> > On Wed, Mar 12, 2014 at 8:17 PM, Florent Parent <
> > florent.parent at calculquebec.ca> wrote:
> >
> >>
> >> Hello IB users,
> >>
> >> We recently migrated our opensm from 3.2.6 to 3.3.17. In this upgrade, we
> >> moved to CentOS6.5 with the stock RDMA and infiniband-diags_1.5.12-5., and
> >> running opensm 3.3.17. Routing is FatTree:
> >> General fabric topology info
> >> ============================
> >> - FatTree rank (roots to leaf switches): 3
> >> - FatTree max switch rank: 2
> >> - Fabric has 966 CAs, 966 CA ports (603 of them CNs), 186 switches
> >> - Fabric has 36 switches at rank 0 (roots)
> >> - Fabric has 64 switches at rank 1
> >> - Fabric has 86 switches at rank 2 (86 of them leafs)
> >>
> >> Now to the question: ibqueryerrors 1.5.12 is reporting high PortXmitWait
> >> values throughout the fabric. We did not see this counter before (it was
> >> not reported by the older ibqueryerrors.pl)
> >>
> >> To give an idea of the scale of the counters, here's a capture of
> >> ibqueryerrors --data on one specific I4 switch, 10 seconds after clearing
> >> the counters (-k -K):
> >>
> >> GUID 0x21283a83b30050 port 4:  PortXmitWait == 2932676  PortXmitData ==
> >> 90419517 (344.923MB)  PortRcvData == 1526963011 (5.688GB)
> >> GUID 0x21283a83b30050 port 5:  PortXmitWait == 3110105  PortXmitData ==
> >> 509580912 (1.898GB)  PortRcvData == 13622 (53.211KB)
> >> GUID 0x21283a83b30050 port 6:  PortXmitWait == 8696397  PortXmitData ==
> >> 480870802 (1.791GB)  PortRcvData == 17067 (66.668KB)
> >> GUID 0x21283a83b30050 port 7:  PortXmitWait == 1129568  PortXmitData ==
> >> 126483825 (482.497MB)  PortRcvData == 24973385 (95.266MB)
> >> GUID 0x21283a83b30050 port 8:  PortXmitWait == 29021  PortXmitData ==
> >> 19444902 (74.176MB)  PortRcvData == 84447725 (322.143MB)
> >> GUID 0x21283a83b30050 port 9:  PortXmitWait == 4945130  PortXmitData ==
> >> 161911244 (617.642MB)  PortRcvData == 27161 (106.098KB)
> >> GUID 0x21283a83b30050 port 10:  PortXmitWait == 16795  PortXmitData ==
> >> 35572510 (135.698MB)  PortRcvData == 681174731 (2.538GB)
> >> ... (this goes on for every active ports)
> >>
> >> We are not observing any failures, so I suspect that I need help to
> >> interpret these numbers. Do I need to be worried?
> >>
> >> Cheers,
> >> Florent
> >>