<html xmlns:v="urn:schemas-microsoft-com:vml" xmlns:o="urn:schemas-microsoft-com:office:office" xmlns:w="urn:schemas-microsoft-com:office:word" xmlns:m="http://schemas.microsoft.com/office/2004/12/omml" xmlns="http://www.w3.org/TR/REC-html40">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<meta name="Generator" content="Microsoft Word 15 (filtered medium)">
<style><!--
/* Font Definitions */
@font-face
{font-family:Wingdings;
panose-1:5 0 0 0 0 0 0 0 0 0;}
@font-face
{font-family:"Cambria Math";
panose-1:2 4 5 3 5 4 6 3 2 4;}
@font-face
{font-family:Calibri;
panose-1:2 15 5 2 2 2 4 3 2 4;}
@font-face
{font-family:Tahoma;
panose-1:2 11 6 4 3 5 4 4 2 4;}
/* Style Definitions */
p.MsoNormal, li.MsoNormal, div.MsoNormal
{margin:0in;
margin-bottom:.0001pt;
font-size:12.0pt;
font-family:"Times New Roman",serif;}
a:link, span.MsoHyperlink
{mso-style-priority:99;
color:blue;
text-decoration:underline;}
a:visited, span.MsoHyperlinkFollowed
{mso-style-priority:99;
color:purple;
text-decoration:underline;}
p.MsoAcetate, li.MsoAcetate, div.MsoAcetate
{mso-style-priority:99;
mso-style-link:"Balloon Text Char";
margin:0in;
margin-bottom:.0001pt;
font-size:8.0pt;
font-family:"Tahoma",sans-serif;}
p.MsoListParagraph, li.MsoListParagraph, div.MsoListParagraph
{mso-style-priority:34;
margin-top:0in;
margin-right:0in;
margin-bottom:0in;
margin-left:.5in;
margin-bottom:.0001pt;
font-size:12.0pt;
font-family:"Times New Roman",serif;}
span.BalloonTextChar
{mso-style-name:"Balloon Text Char";
mso-style-priority:99;
mso-style-link:"Balloon Text";
font-family:"Tahoma",sans-serif;}
span.EmailStyle20
{mso-style-type:personal;
font-family:"Calibri",sans-serif;
color:#1F497D;}
span.EmailStyle21
{mso-style-type:personal;
font-family:"Calibri",sans-serif;
color:#1F497D;}
span.EmailStyle22
{mso-style-type:personal-reply;
font-family:"Calibri",sans-serif;
color:#1F497D;}
.MsoChpDefault
{mso-style-type:export-only;
font-size:10.0pt;}
@page WordSection1
{size:8.5in 11.0in;
margin:1.0in 1.0in 1.0in 1.0in;}
div.WordSection1
{page:WordSection1;}
/* List Definitions */
@list l0
{mso-list-id:326828429;
mso-list-type:hybrid;
mso-list-template-ids:1873813820 67698703 67698713 67698715 67698703 67698713 67698715 67698703 67698713 67698715;}
@list l0:level1
{mso-level-tab-stop:none;
mso-level-number-position:left;
text-indent:-.25in;}
@list l0:level2
{mso-level-number-format:alpha-lower;
mso-level-tab-stop:none;
mso-level-number-position:left;
text-indent:-.25in;}
@list l0:level3
{mso-level-number-format:roman-lower;
mso-level-tab-stop:none;
mso-level-number-position:right;
text-indent:-9.0pt;}
@list l0:level4
{mso-level-tab-stop:none;
mso-level-number-position:left;
text-indent:-.25in;}
@list l0:level5
{mso-level-number-format:alpha-lower;
mso-level-tab-stop:none;
mso-level-number-position:left;
text-indent:-.25in;}
@list l0:level6
{mso-level-number-format:roman-lower;
mso-level-tab-stop:none;
mso-level-number-position:right;
text-indent:-9.0pt;}
@list l0:level7
{mso-level-tab-stop:none;
mso-level-number-position:left;
text-indent:-.25in;}
@list l0:level8
{mso-level-number-format:alpha-lower;
mso-level-tab-stop:none;
mso-level-number-position:left;
text-indent:-.25in;}
@list l0:level9
{mso-level-number-format:roman-lower;
mso-level-tab-stop:none;
mso-level-number-position:right;
text-indent:-9.0pt;}
@list l1
{mso-list-id:520321833;
mso-list-type:hybrid;
mso-list-template-ids:1729264034 67698703 67698691 67698693 67698689 67698691 67698693 67698689 67698691 67698693;}
@list l1:level1
{mso-level-tab-stop:none;
mso-level-number-position:left;
text-indent:-.25in;}
@list l1:level2
{mso-level-number-format:bullet;
mso-level-text:o;
mso-level-tab-stop:none;
mso-level-number-position:left;
text-indent:-.25in;
font-family:"Courier New";}
@list l1:level3
{mso-level-number-format:bullet;
mso-level-text:;
mso-level-tab-stop:none;
mso-level-number-position:left;
text-indent:-.25in;
font-family:Wingdings;}
@list l1:level4
{mso-level-number-format:bullet;
mso-level-text:;
mso-level-tab-stop:none;
mso-level-number-position:left;
text-indent:-.25in;
font-family:Symbol;}
@list l1:level5
{mso-level-number-format:bullet;
mso-level-text:o;
mso-level-tab-stop:none;
mso-level-number-position:left;
text-indent:-.25in;
font-family:"Courier New";}
@list l1:level6
{mso-level-number-format:bullet;
mso-level-text:;
mso-level-tab-stop:none;
mso-level-number-position:left;
text-indent:-.25in;
font-family:Wingdings;}
@list l1:level7
{mso-level-number-format:bullet;
mso-level-text:;
mso-level-tab-stop:none;
mso-level-number-position:left;
text-indent:-.25in;
font-family:Symbol;}
@list l1:level8
{mso-level-number-format:bullet;
mso-level-text:o;
mso-level-tab-stop:none;
mso-level-number-position:left;
text-indent:-.25in;
font-family:"Courier New";}
@list l1:level9
{mso-level-number-format:bullet;
mso-level-text:;
mso-level-tab-stop:none;
mso-level-number-position:left;
text-indent:-.25in;
font-family:Wingdings;}
ol
{margin-bottom:0in;}
ul
{margin-bottom:0in;}
--></style><!--[if gte mso 9]><xml>
<o:shapedefaults v:ext="edit" spidmax="1026" />
</xml><![endif]--><!--[if gte mso 9]><xml>
<o:shapelayout v:ext="edit">
<o:idmap v:ext="edit" data="1" />
</o:shapelayout></xml><![endif]-->
</head>
<body lang="EN-US" link="blue" vlink="purple">
<div class="WordSection1">
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1F497D">I would first check the status of irqbalance to see if it is running.<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1F497D"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1F497D">Is it is, I would look at the contents of /etc/sysconfig/irqbalance.<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1F497D"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1F497D">More recent irqbalance versions (including the one in RHEL/CENTOS 6.8) ignore affinity hints, but allow an override to obey them.<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1F497D"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1F497D">Try adding this to the config file and restarting irqbalance:<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1F497D"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1F497D">#<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1F497D"># IRQBALANCE_ARGS<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1F497D"># append any args here to the irqbalance daemon as documented in the man page<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1F497D">#<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1F497D">IRQBALANCE_ARGS=--hintpolicy=exact<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1F497D"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1F497D">Mike<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1F497D"><o:p> </o:p></span></p>
<p class="MsoNormal"><a name="_MailEndCompose"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1F497D"><o:p> </o:p></span></a></p>
<div style="border:none;border-left:solid blue 1.5pt;padding:0in 0in 0in 4.0pt">
<div>
<div style="border:none;border-top:solid #E1E1E1 1.0pt;padding:3.0pt 0in 0in 0in">
<p class="MsoNormal"><a name="_____replyseparator"></a><b><span style="font-size:11.0pt;font-family:"Calibri",sans-serif">From:</span></b><span style="font-size:11.0pt;font-family:"Calibri",sans-serif"> Rimmer, Todd
<br>
<b>Sent:</b> Thursday, August 18, 2016 7:42 AM<br>
<b>To:</b> Woodruff, Robert J <robert.j.woodruff@intel.com>; nv8840@rit.edu; interop-wg@lists.openfabrics.org; ofiwg@lists.openfabrics.org; tac@lists.openfabrics.org; ewg@lists.openfabrics.org<br>
<b>Cc:</b> Marciniszyn, Mike <mike.marciniszyn@intel.com><br>
<b>Subject:</b> RE: [tac] [ofiwg] OFED compatibility issue with Qlogic Infiniband card<o:p></o:p></span></p>
</div>
</div>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1F497D">This error means the combination of timeout/retry configured for the QP has been exceeded. It could mean:<o:p></o:p></span></p>
<p class="MsoListParagraph" style="text-indent:-.25in;mso-list:l1 level1 lfo2"><![if !supportLists]><span style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1F497D"><span style="mso-list:Ignore">1.<span style="font:7.0pt "Times New Roman"">
</span></span></span><![endif]><span style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1F497D">The destination does not exist on the IB fabric<o:p></o:p></span></p>
<p class="MsoListParagraph" style="text-indent:-.25in;mso-list:l1 level1 lfo2"><![if !supportLists]><span style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1F497D"><span style="mso-list:Ignore">2.<span style="font:7.0pt "Times New Roman"">
</span></span></span><![endif]><span style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1F497D">The destination address was not properly resolved and the wrong IB address is being used<o:p></o:p></span></p>
<p class="MsoListParagraph" style="text-indent:-.25in;mso-list:l1 level1 lfo2"><![if !supportLists]><span style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1F497D"><span style="mso-list:Ignore">3.<span style="font:7.0pt "Times New Roman"">
</span></span></span><![endif]><span style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1F497D">The fabric path between this host and the remote host is not stable or has a high symbol error rate<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1F497D"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1F497D">If this message occurs at the start of the job, it is likely #1 or #2. If it occurs later in the job after some traffic has been successfully sent between those
nodes, #3 is more likely.<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1F497D"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1F497D">Since you indicate it is affecting performance, I assume it occurs mid-job, so #3 is more likely. In which case you should use tools like ibping and ibdiagnet
to analyze the errors in the fabric or better yet you can use the Intel True Scale IB Fabric Suite (contact your HW supplier or Intel if you do not have this, it includes a rich set of fabric analysis and diagnostic tools)<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1F497D"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1F497D">Once you resolve this connectivity issue, for the best MPI performance on QLogic Infiniband cards it is recommended to use openmpi’s psm mtl as opposed to the
verbs btl.<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1F497D"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1F497D">If the problem is #3, you should work with the distributor you purchased your hardware from for further debug of the faulty component.<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1F497D"><o:p> </o:p></span></p>
<div>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Arial",sans-serif;color:#1F497D">Todd Rimmer<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New";color:#1F497D">DCSG Architecture</span><span style="color:#1F497D"><o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New";color:#1F497D">Voice: 610-312-2152 Fax: 610-312-2233<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New";color:#1F497D"><a href="mailto:Todd.Rimmer@intel.com">Todd.Rimmer@intel.com</a></span><span style="color:#1F497D"><o:p></o:p></span></p>
</div>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1F497D"><o:p> </o:p></span></p>
<div style="border:none;border-left:solid blue 1.5pt;padding:0in 0in 0in 4.0pt">
<div>
<div style="border:none;border-top:solid #B5C4DF 1.0pt;padding:3.0pt 0in 0in 0in">
<p class="MsoNormal"><b><span style="font-size:10.0pt;font-family:"Tahoma",sans-serif">From:</span></b><span style="font-size:10.0pt;font-family:"Tahoma",sans-serif"> tac [<a href="mailto:tac-bounces@lists.openfabrics.org">mailto:tac-bounces@lists.openfabrics.org</a>]
<b>On Behalf Of </b>Woodruff, Robert J<br>
<b>Sent:</b> Monday, August 15, 2016 12:28 PM<br>
<b>To:</b> <a href="mailto:nv8840@rit.edu">nv8840@rit.edu</a>; <a href="mailto:interop-wg@lists.openfabrics.org">
interop-wg@lists.openfabrics.org</a>; <a href="mailto:ofiwg@lists.openfabrics.org">
ofiwg@lists.openfabrics.org</a>; <a href="mailto:tac@lists.openfabrics.org">tac@lists.openfabrics.org</a>;
<a href="mailto:ewg@lists.openfabrics.org">ewg@lists.openfabrics.org</a><br>
<b>Cc:</b> Marciniszyn, Mike<br>
<b>Subject:</b> Re: [tac] [ofiwg] OFED compatibility issue with Qlogic Infiniband card<o:p></o:p></span></p>
</div>
</div>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1F497D">+ Mike from the Intel InfiniBand driver team.
<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1F497D"><o:p> </o:p></span></p>
<div style="border:none;border-left:solid blue 1.5pt;padding:0in 0in 0in 4.0pt">
<div>
<div style="border:none;border-top:solid #B5C4DF 1.0pt;padding:3.0pt 0in 0in 0in">
<p class="MsoNormal"><b><span style="font-size:10.0pt;font-family:"Tahoma",sans-serif">From:</span></b><span style="font-size:10.0pt;font-family:"Tahoma",sans-serif"> ofiwg [<a href="mailto:ofiwg-bounces@lists.openfabrics.org">mailto:ofiwg-bounces@lists.openfabrics.org</a>]
<b>On Behalf Of </b><a href="mailto:nv8840@rit.edu">nv8840@rit.edu</a><br>
<b>Sent:</b> Monday, August 15, 2016 7:50 AM<br>
<b>To:</b> <a href="mailto:interop-wg@lists.openfabrics.org">interop-wg@lists.openfabrics.org</a>;
<a href="mailto:ofiwg@lists.openfabrics.org">ofiwg@lists.openfabrics.org</a>; <a href="mailto:tac@lists.openfabrics.org">
tac@lists.openfabrics.org</a>; <a href="mailto:ewg@lists.openfabrics.org">ewg@lists.openfabrics.org</a><br>
<b>Subject:</b> [ofiwg] OFED compatibility issue with Qlogic Infiniband card<o:p></o:p></span></p>
</div>
</div>
<p class="MsoNormal"><o:p> </o:p></p>
<div>
<p class="MsoNormal">Hi,<o:p></o:p></p>
<div>
<p class="MsoNormal"><o:p> </o:p></p>
</div>
<div>
<p class="MsoNormal">I am trying to work with CentOS 6.8 and Qlogic Corp. IBA6110 Infiniband HCA(rev 3) while testing I receive following error message:<o:p></o:p></p>
</div>
<div>
<p class="MsoNormal"><o:p> </o:p></p>
</div>
<div>
<p class="MsoNormal"><span style="font-size:9.5pt">[[23581,1],4][btl_openib_component.c:3369:handle_wc] from n003 to: 192.168.2.5 error polling LP CQ with status RETRY EXCEEDED ERROR status number 12 for wr_id 7057fe8 opcode 2 vendor error 0 qp_idx 3</span><o:p></o:p></p>
</div>
<div>
<p class="MsoNormal"><o:p> </o:p></p>
</div>
<div>
<p class="MsoNormal"><span style="font-size:9.5pt">and increasing btl_openib_ib_timeout had no effect.</span><o:p></o:p></p>
</div>
<div>
<p class="MsoNormal"><o:p> </o:p></p>
</div>
<div>
<p class="MsoNormal"><span style="font-size:9.5pt">I am not sure what it means and how can it be resolved as it is affecting my performance. I am using infiniband support package from centos 6.8 for configuration of drivers.</span><o:p></o:p></p>
</div>
<div>
<p class="MsoNormal"><o:p> </o:p></p>
</div>
<div>
<p class="MsoNormal"><span style="font-size:9.5pt">Thanking you,</span><o:p></o:p></p>
</div>
<div>
<p class="MsoNormal"><span style="font-size:9.5pt">Regards,</span><o:p></o:p></p>
</div>
<div>
<p class="MsoNormal"><span style="font-size:9.5pt">Nikhil</span><o:p></o:p></p>
</div>
</div>
</div>
</div>
</div>
</div>
</body>
</html>