<p class="MsoNormal">The ibv_reg_mr() call fails on the HCA with DID=0x634A, which uses the mlx4_0 driver, when the system is under memory and CPU load. The system usually still has over 500MB of system memory available when the ibv_reg_mr() call fails.</p>
<p class="MsoNormal"> </p>
<p class="MsoNormal">If I run only the HCA with DID=0x6278, which uses the mthca0 driver, alongside the other tools that generate stress, the ibv_reg_mr() call always passes. If I run only the HCA with DID=0x634A alongside the same stress tools, the ibv_reg_mr() call always fails; it usually takes less than 30 minutes for the failure to occur.</p>
<p class="MsoNormal"> </p>
<p class="MsoNormal"> </p>
<p class="MsoNormal">At most 8 memory regions (32MB) are requested at one time with two dual-port HCA cards, and the maximum size of a memory region is 1MB.</p>
<p class="MsoNormal"> </p>
<p class="MsoNormal">For example:</p>
<p class="MsoNormal">ctx->mr = ibv_reg_mr(ctx->pd,</p>
<p class="MsoNormal">&nbsp;&nbsp;&nbsp;&nbsp;buffer, /* 4MB buffer allocated with malloc, one per process */</p>
<p class="MsoNormal">&nbsp;&nbsp;&nbsp;&nbsp;size, /* 2 bytes to 1MB */</p>
<p class="MsoNormal">&nbsp;&nbsp;&nbsp;&nbsp;IBV_ACCESS_LOCAL_WRITE);</p>
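<p class="MsoNormal">To narrow down why the registration fails, it may help to log the errno value saved right after the failing call. The following is only a minimal sketch: format_reg_failure() is a hypothetical helper that is not part of libibverbs or of my test code, and it assumes that errno is actually set when ibv_reg_mr() returns NULL, which should be verified for this OFED release.</p>

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper (not part of libibverbs): build one diagnostic line
 * for a failed memory registration. 'err' is the errno value saved
 * immediately after the failing call. Returns what snprintf() returns.
 */
static int format_reg_failure(char *out, size_t outlen,
                              const void *addr, size_t length, int err)
{
    return snprintf(out, outlen,
                    "ibv_reg_mr failed: addr=%p length=%zu errno=%d (%s)",
                    addr, length, err, strerror(err));
}

/*
 * Intended use around the call shown above (needs <infiniband/verbs.h>):
 *
 *   ctx->mr = ibv_reg_mr(ctx->pd, buffer, size, IBV_ACCESS_LOCAL_WRITE);
 *   if (!ctx->mr) {
 *       int err = errno;            // save before any other library call
 *       char msg[256];
 *       format_reg_failure(msg, sizeof msg, buffer, size, err);
 *       fprintf(stderr, "%s\n", msg);
 *   }
 */
```

<p class="MsoNormal">Logging the address and length together with errno would at least show whether the failures cluster around particular sizes in the 2 byte to 1MB range.</p>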
<p class="MsoNormal"> </p>
<p class="MsoNormal">I modified the ibv_rc_pingpong test to use a parent/child paradigm instead of the current client/server approach, to suit my environment. The code forks a parent process and a child process per port, which serves the same purpose as the current client/server approach. The code also forks a process for each HCA. Essentially, the same code is executed on each HCA; the only differences are the user libraries (libmlx4.so, libmthca.so), the kernel modules (mlx4.ko, mthca.ko) and the firmware on each HCA.</p>
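<p class="MsoNormal">Roughly, the process layout looks like the sketch below. This is a simplified illustration, not my actual test code: spawn_per_hca() and run_port_test() are hypothetical names, run_port_test() is an empty stand-in for the pingpong work, and the per-port parent role is folded into the per-HCA process to keep the example short.</p>

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical stand-in for the per-port pingpong work; 0 means success. */
static int run_port_test(int hca, int port)
{
    (void)hca;
    (void)port;
    return 0;
}

/*
 * Fork one process per HCA; each HCA process forks one child per port
 * and waits for all of them. Returns the number of HCA processes that
 * failed (or could not be forked).
 */
static int spawn_per_hca(int num_hcas, int ports_per_hca)
{
    int failures = 0;
    int status;

    assert(num_hcas > 0 && ports_per_hca > 0);
    fflush(NULL); /* do not duplicate buffered stdio output in children */

    for (int hca = 0; hca < num_hcas; hca++) {
        pid_t pid = fork();
        if (pid < 0) {
            perror("fork");
            failures++;
            continue;
        }
        if (pid == 0) {
            /* HCA process: one child per port (InfiniBand ports start at 1). */
            int bad = 0;
            for (int port = 1; port <= ports_per_hca; port++) {
                pid_t cpid = fork();
                if (cpid < 0) {
                    bad++;
                    continue;
                }
                if (cpid == 0)
                    _exit(run_port_test(hca, port) == 0 ? 0 : 1);
            }
            /* Reap every port child and count abnormal exits. */
            while (wait(&status) > 0)
                if (!WIFEXITED(status) || WEXITSTATUS(status) != 0)
                    bad++;
            _exit(bad ? 1 : 0);
        }
    }

    /* Top-level process: reap every HCA process. */
    while (wait(&status) > 0)
        if (!WIFEXITED(status) || WEXITSTATUS(status) != 0)
            failures++;
    return failures;
}
```

<p class="MsoNormal">With two dual-port cards this would be called as spawn_per_hca(2, 2), and the same binary runs against both mlx4_0 and mthca0.</p>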
<p class="MsoNormal"> </p>
<p class="MsoNormal">Since the code in the two user libraries is very similar, I suspect the issue is in the kernel code or the HCA firmware.</p>
<p class="MsoNormal"> </p>
<p class="MsoNormal">Does anyone know which kernel patch, somewhere between kernel 2.6.24 and 2.6.28, fixes this issue? Has anyone else seen this issue?</p>
<p class="MsoNormal"> </p>
<p class="MsoNormal">System Information:</p>
<p class="MsoNormal"> </p>
<p class="MsoNormal">The system has 4GB of memory.</p>
<p class="MsoNormal"> </p>
<p class="MsoNormal">uname -a</p>
<p class="MsoNormal">Linux (none) 2.6.24.02.02.08 #21 SMP Thu Feb 19 11:04:35 PST
2009 ia64 unknown</p>
<p class="MsoNormal"> </p>
<p class="MsoNormal">OFED 1.2.5</p>
<p class="MsoNormal"> </p>
<p class="MsoNormal"><span style="" lang="FR">lspci -d 15b3:</span></p>
<p class="MsoNormal"><span style="" lang="FR"> </span></p>
<p class="MsoNormal"><span style="" lang="FR">0000:10:00.0
InfiniBand: Mellanox Technologies MT25208 InfiniHost III Ex (Tavor
compatibility mode) (rev 20)</span></p>
<p class="MsoNormal">0000:c3:00.0 InfiniBand: Mellanox Technologies: Unknown
device 634a (rev a0)</p>
<p class="MsoNormal"> </p>
<p class="MsoNormal"><span style="" lang="PT-BR">lspci -d
15b3: -n</span></p>
<p class="MsoNormal"><span style="" lang="PT-BR">0000:10:00.0
0c06: 15b3:6278 (rev 20)</span></p>
<p class="MsoNormal">0000:c3:00.0 0c06: 15b3:634a (rev a0)</p>
<p class="MsoNormal"> </p>
<p class="MsoNormal"><span style="" lang="PT-BR">ibv_devinfo
-v</span></p>
<p class="MsoNormal"><span style="" lang="PT-BR">hca_id:
mlx4_0</span></p>
<p class="MsoNormal"><span style="" lang="PT-BR"><span style=""> </span></span>fw_ver:<span style=""> </span>2.5.000</p>
<p class="MsoNormal"> </p>
<p class="MsoNormal"><span style="" lang="ES">hca_id: mthca0</span></p>
<p class="MsoNormal"><span style="" lang="ES"><span style=""> </span>fw_ver:<span style=""> </span>4.8.930</span></p>