Re: [ofa-general] mlx4_core 0000:c3:00.0: SW2HW_MPT failed (-16) (dmesg)
Phillip Wilson
phillipwils at gmail.com
Tue Mar 10 09:54:48 PDT 2009
I worked around the timeout in the mlx4_cmd_poll() device driver call by
retrying the libibverbs function call ibv_reg_mr() when it fails. So far,
I have never seen it need more than one retry.
I did not disable the "msi_x" driver parameter. Going forward, the PCIe
systems that I will use implement only MSI for interrupts.
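The retry loop looks like this:
    while (retry < MAX_RETRIES) {
        ctx->mr = ibv_reg_mr(ctx->pd, ctx->buf, size, IBV_ACCESS_LOCAL_WRITE);
        if (ctx->mr)
            break;          /* registration succeeded, stop retrying */
        retry++;
        printf("retry[%d]\n", retry);
        sleep(1);           /* give the HCA time to recover */
    }
    if (!ctx->mr) {
        /* every attempt failed */
        fprintf(stderr, "Couldn't register MR\n");
        return NULL;
    }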
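The same idea can be factored into a small helper so that other verbs calls can share it. This is only a sketch under stated assumptions: retry_call(), try_fn, and flaky_stub() are hypothetical names that are not part of libibverbs, and flaky_stub() merely stands in for a wrapper around ibv_reg_mr() so the example builds without an HCA.

```c
#include <stddef.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical callback type: returns non-NULL on success, NULL on failure.
 * A real caller would wrap ibv_reg_mr() in a function of this shape. */
typedef void *(*try_fn)(void *arg);

/* Call fn(arg) up to max_tries times, sleeping delay_sec seconds between
 * failed attempts. Returns the first non-NULL result, or NULL if every
 * attempt failed. If attempts_out is non-NULL it receives the call count. */
static void *retry_call(try_fn fn, void *arg, int max_tries,
                        unsigned delay_sec, int *attempts_out)
{
    void *res = NULL;
    int attempts = 0;

    while (attempts < max_tries) {
        res = fn(arg);
        attempts++;
        if (res)
            break;                      /* success: stop retrying */
        fprintf(stderr, "attempt %d failed\n", attempts);
        if (attempts < max_tries)
            sleep(delay_sec);           /* no sleep after the last failure */
    }
    if (attempts_out)
        *attempts_out = attempts;
    return res;
}

/* Stand-in for a flaky call (assumption, for illustration only):
 * fails while *arg > 0, decrementing it, then succeeds. */
static void *flaky_stub(void *arg)
{
    int *failures_left = arg;

    if (*failures_left > 0) {
        (*failures_left)--;
        return NULL;
    }
    return arg;
}
```

Unlike the inline loop, the helper reports how many attempts were made, which makes it easy to log whether the single-retry pattern still holds.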
Thanks,
Phillip
On Sun, Mar 8, 2009 at 12:45 AM, Nicolas Morey-Chaisemartin <
devel at morey-chaisemartin.com> wrote:
> Phillip Wilson wrote:
> > I updated the HCA "InfiniBand: Mellanox Technologies: Unknown device
> > 634a (rev a0)" to the latest firmware and issue remains. "fw_ver" is
> > now 2.6.000.
> >
> > Any ideas on why the timeout is occurring in the function?
> >
> >
> >
>
> I've seen this problem a couple of times.
> Something on your system (probably the HCA) is in a crappy state and won't
> answer.
>
> I had the problem with a Topspin HCA and old OFED stacks on an IA64 system.
> It was due to the driver trying to use msi_x, which failed.
> Try forcing msi=0 msi_x=0 for the mlx4_core module in your modprobe.conf
> file; it may help :)
>
>
> Nicolas
>
>