Hi Florent,

If you have a FT topology, 10 cables with down links can have a VERY
big impact on the fabric's overall performance (the fat tree is not
very fault tolerant).

On 14/03/2014 16:17, Florent Parent wrote:
> Hi Jeff,
>
> I'm collecting data to do analysis over time, and indeed there are no
> XmitDiscards. I will add the XmitWait counters to watch for "hot
> spots" over time.
>
> We do have links down in some CXP cables (10 ports total, all spread
> across different cables). I will check whether there is any
> correlation with the observed XmitWait counters.
>
> The PCIe gen2 bus width is x8 for the QDR chip on the blades. Gen2
> provides 4 Gbps of usable bandwidth per lane (5 GT/s with 8b/10b
> encoding), so x8 provides 32 Gbps, which matches the QDR data rate.
> Or is my math wrong?
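
The arithmetic checks out; a back-of-envelope version (standard Gen2
and QDR figures, ignoring PCIe packet overhead, which shaves off a few
more percent):

# Bandwidth sanity check: PCIe Gen2 x8 vs. IB 4X QDR
lane_gtps = 5.0                 # PCIe Gen2: 5 GT/s per lane
encoding = 8 / 10               # 8b/10b encoding -> 80% usable
pcie_gbps = lane_gtps * encoding * 8        # x8 link -> 32.0 Gbps

qdr_signal_gbps = 40.0          # 4X QDR: 4 lanes * 10 Gbps signaling
qdr_data_gbps = qdr_signal_gbps * encoding  # 32.0 Gbps after 8b/10b

print(pcie_gbps, qdr_data_gbps)  # 32.0 32.0: the bus just keeps up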
>
> Thanks for the Oracle document pointer; I guess I should have known
> about its existence :)
>
> Thanks for the response, and good to hear from you.
> Florent
<div class="gmail_extra"><br>
<br>
<div class="gmail_quote">On Fri, Mar 14, 2014 at 6:46 AM, Le
Fillatre Jean-Francois <span dir="ltr"><<a
moz-do-not-send="true"
href="mailto:jf.lefillatre@univ-lille1.fr"
target="_blank">jf.lefillatre@univ-lille1.fr</a>></span>
wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0
.8ex;border-left:1px #ccc solid;padding-left:1ex"><br>
Hello Florent,<br>
<br>
>> I'll go with the congestion explanation too. It's not a major issue,
>> as there seem to be no discarded packets in the few lines of data
>> included in the original email. The number of PortXmitWaits is
>> orders of magnitude below the number of PortXmitDatas, and the
>> numbers don't seem to be directly correlated, so it looks like these
>> are not regular events with a clear pattern.
>>
>> Oracle's InfiniBand network troubleshooting guide says:
>> "PortXmitWait: The number of ticks during which the port selected by
>> PortSelect had data to transmit but no data was sent during the
>> entire tick, either because of insufficient credits or because of
>> lack of arbitration."
>>
>> The important part of that sentence is "insufficient credits". The
>> three causes given by Hal will indeed all lead to insufficient
>> credits at some point along the link between two nodes. Any of them
>> may cause the sending HCA to pause and wait until credits are
>> available all the way through, thereby increasing the PortXmitWait
>> count.
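
A toy model of that mechanism (not the IB spec, just the idea: a
sender blocked on credits accumulates wait ticks):

# Toy credit-flow model: the sender may only transmit while the
# receiver has advertised buffer credits; every tick spent blocked
# increments xmit_wait -- the idea behind PortXmitWait.
import random

BUFFERS = 8
credits, xmit_wait, sent = BUFFERS, 0, 0
for tick in range(1000):
    if credits > 0:
        sent += 1
        credits -= 1           # each packet consumes one credit
    else:
        xmit_wait += 1         # data pending, no credits: a wait tick
    if random.random() < 0.9:  # a slowish receiver returns credits
        credits = min(credits + 1, BUFFERS)

print(f"sent={sent} xmit_wait={xmit_wait}")  # ~10% of ticks are waits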
>>
>> From what I remember of your site:
>> - do you still have some links down on some of your cables?
>> - some users do indeed run many-to-one MPI communications
>> - I can't remember whether the QDR chip has an x8 or x16 PCIe
>>   connection to the board. If it's x8, then the IB chip can saturate
>>   the PCIe bus, limiting the IB rate at peak times.
>>
>> The full Oracle document is here:
>> http://www.oracle.com/technetwork/database/availability/infiniband-network-troubleshooting-1863251.pdf
>>
>> Thanks,
>> JF
>>
>> On Thursday, March 13, 2014 12:36 CET, Hal Rosenstock
>> <hal.rosenstock@gmail.com> wrote:
>>
>>> Some causes of congestion are: slow receivers, many-to-one
>>> communication, and a "poor" fat tree topology.
>>>
>>> On the last item: are all links in the subnet the same speed and
>>> width? How many links are used going up the fat tree to the next
>>> rank?
>>>
>>> Are all end nodes connected to rank 2, or are any connected to a
>>> higher rank?
>>>
>>> Are there any "combined" nodes? By this I mean a device which is
>>> more than just a single switch or CA. If so, what are they, and
>>> where do they live in the topology?
>>>
>>> On Wed, Mar 12, 2014 at 11:50 PM, Hal Rosenstock
>>> <hal.rosenstock@gmail.com> wrote:
>>>
>>>> By the fact that you didn't mention PortXmitDiscards, does that
>>>> mean they are 0? Assuming so, PortXmitWait indicates that there is
>>>> some congestion, but that it has not risen to the level of
>>>> dropping packets. It's the rate of increase of the XmitWait
>>>> counter that matters rather than the absolute number, so if you
>>>> want to chase this, focus on the most congested ports.
>>>>
>>>> Since the old tool didn't report XmitWait counters, it's hard to
>>>> know whether this is the same as before unless you tracked it
>>>> manually.
>>>>
>>>> Was the routing previously fat tree? Are there any other fat tree
>>>> related log messages in the OpenSM log? Is there any fat tree
>>>> configuration of compute and/or I/O nodes?
>>>>
>>>> Any idea what the traffic pattern is? Are you running MPI?
>>>>
>>>> -- Hal
>>>>
>>>> On Wed, Mar 12, 2014 at 8:17 PM, Florent Parent
>>>> <florent.parent@calculquebec.ca> wrote:
>>>>
>>>>> Hello IB users,
>>>>>
>>>>> We recently migrated our opensm from 3.2.6 to 3.3.17. In this
>>>>> upgrade we moved to CentOS 6.5 with the stock RDMA stack and
>>>>> infiniband-diags 1.5.12-5, and are running opensm 3.3.17. Routing
>>>>> is FatTree:
>>>>>
>>>>> General fabric topology info
>>>>> ============================
>>>>> - FatTree rank (roots to leaf switches): 3
>>>>> - FatTree max switch rank: 2
>>>>> - Fabric has 966 CAs, 966 CA ports (603 of them CNs), 186 switches
>>>>> - Fabric has 36 switches at rank 0 (roots)
>>>>> - Fabric has 64 switches at rank 1
>>>>> - Fabric has 86 switches at rank 2 (86 of them leafs)
>>>>>
>>>>> Now to the question: ibqueryerrors 1.5.12 is reporting high
>>>>> PortXmitWait values throughout the fabric. We did not see this
>>>>> counter before (it was not reported by the older
>>>>> ibqueryerrors.pl).
>>>>>
>>>>> To give an idea of the scale of the counters, here's a capture of
>>>>> ibqueryerrors --data on one specific I4 switch, 10 seconds after
>>>>> clearing the counters (-k -K):
>>>>>
>>>>> GUID 0x21283a83b30050 port 4: PortXmitWait == 2932676 PortXmitData == 90419517 (344.923MB) PortRcvData == 1526963011 (5.688GB)
>>>>> GUID 0x21283a83b30050 port 5: PortXmitWait == 3110105 PortXmitData == 509580912 (1.898GB) PortRcvData == 13622 (53.211KB)
>>>>> GUID 0x21283a83b30050 port 6: PortXmitWait == 8696397 PortXmitData == 480870802 (1.791GB) PortRcvData == 17067 (66.668KB)
>>>>> GUID 0x21283a83b30050 port 7: PortXmitWait == 1129568 PortXmitData == 126483825 (482.497MB) PortRcvData == 24973385 (95.266MB)
>>>>> GUID 0x21283a83b30050 port 8: PortXmitWait == 29021 PortXmitData == 19444902 (74.176MB) PortRcvData == 84447725 (322.143MB)
>>>>> GUID 0x21283a83b30050 port 9: PortXmitWait == 4945130 PortXmitData == 161911244 (617.642MB) PortRcvData == 27161 (106.098KB)
>>>>> GUID 0x21283a83b30050 port 10: PortXmitWait == 16795 PortXmitData == 35572510 (135.698MB) PortRcvData == 681174731 (2.538GB)
>>>>> ... (this goes on for every active port)
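
A minimal sketch for ranking ports by PortXmitWait from a saved
capture in the format quoted above (real ibqueryerrors output may
bracket the fields depending on the version, so adjust the regex to
taste):

# Usage: python3 rank_xmitwait.py < capture.txt
import re, sys

pat = re.compile(r"GUID (0x[0-9a-fA-F]+) port (\d+):.*?"
                 r"PortXmitWait == (\d+)")
rows = [(int(w), g, int(p))
        for g, p, w in pat.findall(sys.stdin.read())]
for wait, guid, port in sorted(rows, reverse=True)[:10]:
    print(f"{guid} port {port:3d}  PortXmitWait {wait}")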
>>>>>
>>>>> We are not observing any failures, so I suspect that I need help
>>>>> interpreting these numbers. Do I need to be worried?
>>>>>
>>>>> Cheers,
>>>>> Florent
>
> _______________________________________________
> Users mailing list
> Users@lists.openfabrics.org
> http://lists.openfabrics.org/cgi-bin/mailman/listinfo/users

--
---
Mehdi Denou