<html xmlns:v="urn:schemas-microsoft-com:vml" xmlns:o="urn:schemas-microsoft-com:office:office" xmlns:w="urn:schemas-microsoft-com:office:word" xmlns:m="http://schemas.microsoft.com/office/2004/12/omml" xmlns="http://www.w3.org/TR/REC-html40">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=us-ascii">
<meta name="Generator" content="Microsoft Word 14 (filtered medium)">
<style><!--
/* Font Definitions */
@font-face
{font-family:Calibri;
panose-1:2 15 5 2 2 2 4 3 2 4;}
@font-face
{font-family:Tahoma;
panose-1:2 11 6 4 3 5 4 4 2 4;}
/* Style Definitions */
p.MsoNormal, li.MsoNormal, div.MsoNormal
{margin:0in;
margin-bottom:.0001pt;
font-size:11.0pt;
font-family:"Calibri","sans-serif";}
a:link, span.MsoHyperlink
{mso-style-priority:99;
color:blue;
text-decoration:underline;}
a:visited, span.MsoHyperlinkFollowed
{mso-style-priority:99;
color:purple;
text-decoration:underline;}
p.MsoAcetate, li.MsoAcetate, div.MsoAcetate
{mso-style-priority:99;
mso-style-link:"Balloon Text Char";
margin:0in;
margin-bottom:.0001pt;
font-size:8.0pt;
font-family:"Tahoma","sans-serif";}
span.EmailStyle17
{mso-style-type:personal;
font-family:"Calibri","sans-serif";
color:windowtext;}
span.EmailStyle18
{mso-style-type:personal;
font-family:"Calibri","sans-serif";
color:#1F497D;}
span.EmailStyle19
{mso-style-type:personal-reply;
font-family:"Calibri","sans-serif";
color:#993366;}
span.BalloonTextChar
{mso-style-name:"Balloon Text Char";
mso-style-priority:99;
mso-style-link:"Balloon Text";
font-family:"Tahoma","sans-serif";}
.MsoChpDefault
{mso-style-type:export-only;
font-size:10.0pt;}
@page WordSection1
{size:8.5in 11.0in;
margin:1.0in 1.0in 1.0in 1.0in;}
div.WordSection1
{page:WordSection1;}
--></style><!--[if gte mso 9]><xml>
<o:shapedefaults v:ext="edit" spidmax="1026" />
</xml><![endif]--><!--[if gte mso 9]><xml>
<o:shapelayout v:ext="edit">
<o:idmap v:ext="edit" data="1" />
</o:shapelayout></xml><![endif]-->
</head>
<body lang="EN-US" link="blue" vlink="purple">
<div class="WordSection1">
<p class="MsoNormal"><span style="color:#993366">Thanks Stan.<o:p></o:p></span></p>
<p class="MsoNormal"><span style="color:#993366"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="color:#993366">I didn’t expect you to debug the issue; I only wanted to know whether you are familiar with these failures.<o:p></o:p></span></p>
<p class="MsoNormal"><span style="color:#993366"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="color:#993366">We will try to debug this issue. If we find anything, we will update the community.<o:p></o:p></span></p>
<p class="MsoNormal"><span style="color:#993366"><o:p> </o:p></span></p>
<div>
<div style="border:none;border-top:solid #B5C4DF 1.0pt;padding:3.0pt 0in 0in 0in">
<p class="MsoNormal"><b><span style="font-size:10.0pt;font-family:'Tahoma','sans-serif'">From:</span></b><span style="font-size:10.0pt;font-family:'Tahoma','sans-serif'"> Smith, Stan [mailto:stan.smith@intel.com]
<br>
<b>Sent:</b> Wednesday, March 16, 2011 6:56 PM<br>
<b>To:</b> Uri Habusha; ofw@lists.openfabrics.org; Gilad Margalit<br>
<b>Cc:</b> Tziporet Koren<br>
<b>Subject:</b> RE: OpenSm issues<o:p></o:p></span></p>
</div>
</div>
<p class="MsoNormal"><o:p> </o:p></p>
<div style="border:none;border-left:solid blue 1.5pt;padding:0in 0in 0in 4.0pt">
<div>
<div style="border:none;border-top:solid #B5C4DF 1.0pt;padding:3.0pt 0in 0in 0in">
<p class="MsoNormal"><b>From:</b> Uri Habusha [mailto:urih@mellanox.co.il] <br>
<b>Sent:</b> Wednesday, March 16, 2011 5:10 AM<br>
<b>To:</b> Smith, Stan; ofw@lists.openfabrics.org; Gilad Margalit<br>
<b>Cc:</b> Tziporet Koren<br>
<b>Subject:</b> OpenSm issues<o:p></o:p></p>
</div>
</div>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal">Hi Stan,<o:p></o:p></p>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal">Recently we went back to running the regression in debug mode. Each night we encounter many issues with OpenSM. See below three different issues related to OpenSM.<o:p></o:p></p>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal">Who is responsible for OpenSM, and what testing is done on it?<o:p></o:p></p>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal">Please advise on how to proceed with investigating and fixing these failures.<o:p></o:p></p>
<p class="MsoNormal"><span style="color:#1F497D"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="color:#1F497D">Hello,<o:p></o:p></span></p>
<p class="MsoNormal"><span style="color:#1F497D">True, I was likely the last person to touch OpenSM, although at this time I do not have any cycles to address winOFED issues.<o:p></o:p></span></p>
<p class="MsoNormal"><span style="color:#1F497D"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="color:#1F497D">Unfortunately, you are on your own for debugging.<o:p></o:p></span></p>
<p class="MsoNormal"><span style="color:#1F497D"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="color:#1F497D">Perhaps discussions with the new OFED for Linux OpenSM maintainer Alex Netes [alexne@mellanox.com] might shed some light on the failures?<o:p></o:p></span></p>
<p class="MsoNormal"><span style="color:#1F497D">Tzachi and Leo maintained OpenSM long before I became involved.<o:p></o:p></span></p>
<p class="MsoNormal"><span style="color:#1F497D"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="color:#1F497D">As always, a stack traceback without any operational/environmental context is difficult at best to make sense of.<o:p></o:p></span></p>
<p class="MsoNormal"><span style="color:#1F497D"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="color:#1F497D">W.r.t. OpenSM testing:<o:p></o:p></span></p>
<p class="MsoNormal"><span style="color:#1F497D">1) All osmtest flavors passed.<o:p></o:p></span></p>
<p class="MsoNormal"><span style="color:#1F497D">2) A single OpenSM (multiple Mellanox switches) configuring a 53-node HPC cluster.<o:p></o:p></span></p>
<p class="MsoNormal"><span style="color:#1F497D">3) Multiple Windows OpenSMs tested for master/slave and failover operation.<o:p></o:p></span></p>
<p class="MsoNormal"><span style="color:#1F497D">4) Multiple Windows and Linux OpenSMs tested for master/slave and failover operation.<o:p></o:p></span></p>
<p class="MsoNormal"><span style="color:#1F497D"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="color:#1F497D">Has Microsoft HPC validation used the current OpenSM on larger HPC clusters?<o:p></o:p></span></p>
<p class="MsoNormal"><span style="color:#1F497D"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="color:#1F497D">Stan.<o:p></o:p></span></p>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal">Thanks Uri<o:p></o:p></p>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal"><span style="color:#1F497D">0: kd> kb<o:p></o:p></span></p>
<p class="MsoNormal"><span style="color:#1F497D">RetAddr : Args to Child : Call Site<o:p></o:p></span></p>
<p class="MsoNormal"><span style="color:#1F497D">00000000`ff3f2c36 : 00000000`000a6f00 00000000`00000000 00000000`00000000 00000000`ff368e60 : ntdll!DbgBreakPoint<o:p></o:p></span></p>
<p class="MsoNormal"><span style="color:#1F497D">00000000`ff3ecfbc : 00000000`00602ba0 00000000`006fdde0 00000000`00000001 00000000`74da554c : opensm!osm_vendor_send+0x106 [s:\builds\7523\trunk\ulp\opensm\user\libvendor\osm_vendor_ibumad.c @ 1057]<o:p></o:p></span></p>
<p class="MsoNormal"><span style="color:#1F497D">00000000`ff3ed26f : 00000000`000cf7a0 00000000`006fdde0 00000000`00000001 00000000`ff367eb8 : opensm!vl15_send_mad+0x8c [s:\builds\7523\trunk\ulp\opensm\user\opensm\osm_vl15intf.c @ 81]<o:p></o:p></span></p>
<p class="MsoNormal"><span style="color:#1F497D">00000000`74db2d3a : 00000000`000cf7a0 00000000`00000000 00000000`00000000 00000000`00000000 : opensm!vl15_poller+0x16f [s:\builds\7523\trunk\ulp\opensm\user\opensm\osm_vl15intf.c @ 151]<o:p></o:p></span></p>
<p class="MsoNormal"><span style="color:#1F497D">00000000`76c2be3d : 00000000`000cf7b8 00000000`00000000 00000000`00000000 00000000`00000000 : complibd!cl_thread_callback+0x1a [s:\builds\7523\trunk\core\complib\user\cl_thread.c @ 49]<o:p></o:p></span></p>
<p class="MsoNormal"><span style="color:#1F497D">00000000`76d66611 : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : kernel32!BaseThreadInitThunk+0xd<o:p></o:p></span></p>
<p class="MsoNormal"><span style="color:#1F497D">00000000`00000000 : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : ntdll!RtlUserThreadStart+0x1d<o:p></o:p></span></p>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal"><span style="color:#1F497D">3: kd> kb<o:p></o:p></span></p>
<p class="MsoNormal"><span style="color:#1F497D">RetAddr : Args to Child : Call Site<o:p></o:p></span></p>
<p class="MsoNormal"><span style="color:#1F497D">00000000`74fd3c88 : 00000000`0016f748 00000000`00000000 00000000`00000000 00000000`00000000 : ntdll!DbgBreakPoint<o:p></o:p></span></p>
<p class="MsoNormal"><span style="color:#1F497D">00000000`ff91d1ae : 00000000`0016f748 00000000`001afe10 00000000`00000001 00000000`ff897eb8 : complibd!cl_qlist_remove_head+0x98 [s:\builds\7523\trunk\inc\complib\cl_qlist.h @ 1220]<o:p></o:p></span></p>
<p class="MsoNormal"><span style="color:#1F497D">00000000`74fe2d3a : 00000000`0016f700 00000000`00000000 00000000`00000000 00000000`00000000 : opensm!vl15_poller+0xae [s:\builds\7523\trunk\ulp\opensm\user\opensm\osm_vl15intf.c @ 138]<o:p></o:p></span></p>
<p class="MsoNormal"><span style="color:#1F497D">00000000`76e6466d : 00000000`0016f718 00000000`00000000 00000000`00000000 00000000`00000000 : complibd!cl_thread_callback+0x1a [s:\builds\7523\trunk\core\complib\user\cl_thread.c @ 49]<o:p></o:p></span></p>
<p class="MsoNormal"><span style="color:#1F497D">00000000`76f98791 : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : kernel32!BaseThreadInitThunk+0xd<o:p></o:p></span></p>
<p class="MsoNormal"><span style="color:#1F497D">00000000`00000000 : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : ntdll!RtlUserThreadStart+0x1d<o:p></o:p></span></p>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal"><b><i><span style="color:#1F497D">3: kd> kb<o:p></o:p></span></i></b></p>
<p class="MsoNormal"><b><i><span style="color:#1F497D">RetAddr : Args to Child : Call Site<o:p></o:p></span></i></b></p>
<p class="MsoNormal"><b><i><span style="color:#1F497D">00000000`779e7396 : 00000000`00000002 00000001`00000023 00000000`005bd360 00000000`00000003 : ntdll!RtlReportCriticalFailure+0x2f<o:p></o:p></span></i></b></p>
<p class="MsoNormal"><b><i><span style="color:#1F497D">00000000`779e86c2 : 00000000`00000000 000601d8`02b138dc 00000000`00000000 00000000`00000000 : ntdll!RtlpReportHeapFailure+0x26<o:p></o:p></span></i></b></p>
<p class="MsoNormal"><b><i><span style="color:#1F497D">00000000`779ea0c4 : 00000000`005b0000 00000000`00000000 00000000`005bd200 00000000`005bd360 : ntdll!RtlpHeapHandleError+0x12<o:p></o:p></span></i></b></p>
<p class="MsoNormal"><b><i><span style="color:#1F497D">00000000`7797ea00 : 00000000`005b0000 00000000`001b2d30 00000000`005bd270 00000000`0000029c : ntdll!RtlpLogHeapFailure+0xa4<o:p></o:p></span></i></b></p>
<p class="MsoNormal"><b><i><span style="color:#1F497D">00000000`779729ac : 00000000`005b0000 00000001`00000002 00000000`000000e0 00000000`000000f0 : ntdll!RtlpAllocateHeap+0x2105<o:p></o:p></span></i></b></p>
<p class="MsoNormal"><b><i><span style="color:#1F497D">000007fe`ffad1332 : 00000000`00000003 00000000`000000e0 00000000`2821b917 00000000`00000000 : ntdll!RtlAllocateHeap+0x16c<o:p></o:p></span></i></b></p>
<p class="MsoNormal"><b><i><span style="color:#1F497D">00000000`ff6514cc : 00000000`00000000 00000000`00000000 00000000`005bd370 00000000`000000b0 : msvcrt!malloc+0x70<o:p></o:p></span></i></b></p>
<p class="MsoNormal"><b><i><span style="color:#1F497D">00000000`ff6ad144 : 00000000`000ff630 00000000`001b2990 00000000`00000100 00000000`00d4f900 : opensm!osm_mad_pool_get+0x7c [s:\builds\7523\trunk\ulp\opensm\user\opensm\osm_mad_pool.c @ 86]<o:p></o:p></span></i></b></p>
<p class="MsoNormal"><b><i><span style="color:#1F497D">000007fe`f9542a1a : 00000000`001b2940 00000000`00000000 00000000`00000000 00000000`00000000 : opensm!umad_receiver+0x3b4 [s:\builds\7523\trunk\ulp\opensm\user\libvendor\osm_vendor_ibumad.c @ 314]<o:p></o:p></span></i></b></p>
<p class="MsoNormal"><b><i><span style="color:#1F497D">00000000`7771f56d : 00000000`001b2940 00000000`00000000 00000000`00000000 00000000`00000000 : complibd!cl_thread_callback+0x1a [s:\builds\7523\trunk\core\complib\user\cl_thread.c @ 49]<o:p></o:p></span></i></b></p>
<p class="MsoNormal"><b><i><span style="color:#1F497D">00000000`77953281 : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : kernel32!BaseThreadInitThunk+0xd<o:p></o:p></span></i></b></p>
<p class="MsoNormal"><b><i><span style="color:#1F497D">00000000`00000000 : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : ntdll!RtlUserThreadStart+0x1d<o:p></o:p></span></i></b></p>
<p class="MsoNormal"><span style="color:#1F497D"><o:p> </o:p></span></p>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal">Uri Habusha<o:p></o:p></p>
<p class="MsoNormal">Windows SW Development Lead<o:p></o:p></p>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal"><b>Mellanox Technologies<br>
</b>P.O. Box 586, Yokneam 20692<o:p></o:p></p>
<p class="MsoNormal">Israel<o:p></o:p></p>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal"><o:p> </o:p></p>
</div>
</div>
</body>
</html>