<div>I worked around the mlx4_cmd_poll() timeout in the device driver by retrying the libibverbs call ibv_reg_mr() whenever it fails. So far, only one retry has been observed. </div>
<div> </div>
<div>I did not disable the "msi_x" driver parameter. Going forward, the PCIe systems that I will use implement only MSI for interrupts.</div>
<div> </div>
<div>The retry loop looks like this: </div>
<div> </div>
<div>while (retry < MAX_RETRIES) {</div>
<div> ctx->mr = ibv_reg_mr(ctx->pd, ctx->buf, size, IBV_ACCESS_LOCAL_WRITE);</div>
<div> if (ctx->mr)</div>
<div>  break; /* registration succeeded, stop retrying */</div>
<div> retry++;</div>
<div> printf("retry[%d]\n", retry);</div>
<div> sleep(1); /* give the HCA time to recover before the next attempt */</div>
<div>}</div>
<div>if (!ctx->mr) {</div>
<div> /* every attempt failed */</div>
<div> fprintf(stderr, "Couldn't register MR\n");</div>
<div> return NULL;</div>
<div>}</div>
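<div> </div>
<div>In case it is useful to anyone else, here is a rough, self-contained sketch of the same bounded-retry pattern. The names retry_register() and flaky_register() are hypothetical and not part of libibverbs; flaky_register() only stands in for ibv_reg_mr() so that the loop can be exercised without an HCA.</div>

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical callback type: returns a non-NULL handle on success. */
typedef void *(*reg_fn)(void *arg);

/*
 * Call fn(arg) until it returns non-NULL or max_attempts calls were made,
 * sleeping delay_sec seconds between failed attempts.
 * Returns the handle, or NULL if every attempt failed.
 * If attempts_out is non-NULL, it receives the number of calls made.
 */
static void *retry_register(reg_fn fn, void *arg, int max_attempts,
                            unsigned int delay_sec, int *attempts_out)
{
    void *handle = NULL;
    int attempts = 0;

    assert(fn != NULL && max_attempts > 0);

    while (attempts < max_attempts) {
        handle = fn(arg);
        attempts++;
        if (handle)
            break;                      /* success: stop retrying */
        fprintf(stderr, "attempt %d failed\n", attempts);
        if (attempts < max_attempts && delay_sec > 0)
            sleep(delay_sec);           /* back off before the next try */
    }
    if (attempts_out)
        *attempts_out = attempts;
    return handle;
}

/*
 * Stand-in for ibv_reg_mr() (an assumption for illustration only):
 * fails *remaining_failures times, then returns a non-NULL token.
 */
static void *flaky_register(void *arg)
{
    static int token;                   /* any non-NULL address will do */
    int *remaining_failures = arg;

    if (*remaining_failures > 0) {
        (*remaining_failures)--;
        return NULL;
    }
    return &token;
}
```

<div>In real code the callback would wrap the ibv_reg_mr() call and return the resulting MR pointer; the caller then only has to check the final result for NULL.</div>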
<div> </div>
<div>Thanks,</div>
<div>Phillip<br><br></div>
<div class="gmail_quote">On Sun, Mar 8, 2009 at 12:45 AM, Nicolas Morey-Chaisemartin <span dir="ltr"><<a href="mailto:devel@morey-chaisemartin.com">devel@morey-chaisemartin.com</a>></span> wrote:<br>
<blockquote class="gmail_quote" style="PADDING-LEFT: 1ex; MARGIN: 0px 0px 0px 0.8ex; BORDER-LEFT: #ccc 1px solid">Phillip Wilson wrote:<br>> I updated the HCA "InfiniBand: Mellanox Technologies: Unknown device<br>
> 634a (rev a0)" to the latest firmware and the issue remains. "fw_ver" is<br>> now 2.6.000.<br>><br>> Any ideas on why the timeout is occurring in the function?<br>><br>><br>><br><br>I've seen this problem a couple of times.<br>
Something on your system (probably the HCA) is in a crappy state and won't answer.<br><br>I had the problem with a Topspin HCA and old OFED stacks on an IA64 system. It was caused by the driver trying to use msi_x and failing.<br>
Try forcing msi=0 msi_x=0 for the mlx4_core module in your modprobe.conf file, it may help :)<br><font color="#888888"><br><br>Nicolas<br><br></font></blockquote></div><br>