Re: [ofa-general] Re: mlx4_core 0000:c3:00.0: SW2HW_MPT failed (-16) (dmesg)

Phillip Wilson phillipwils at gmail.com
Tue Mar 10 09:54:48 PDT 2009


I worked around the mlx4_cmd_poll() timeout in the device driver by
retrying the libibverbs call ibv_reg_mr().  So far, I have observed
only one retry.

I did not disable the "msi_x" driver parameter.  Going forward, the PCIe
systems that I will use implement only MSI for interrupts.

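For example:

/* Retry ibv_reg_mr() in case the mlx4_cmd_poll() command times out. */
while (retry < MAX_RETRIES) {
    ctx->mr = ibv_reg_mr(ctx->pd, ctx->buf, size, IBV_ACCESS_LOCAL_WRITE);
    if (ctx->mr)
        break;                  /* registration succeeded, stop retrying */
    retry++;
    printf("retry[%d]\n", retry);
    sleep(1);                   /* give the HCA time before the next attempt */
}
if (!ctx->mr) {
    printf("Couldn't register MR\n");
    return NULL;
}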
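
The same workaround could be factored into a small reusable helper. This
is only a sketch under assumptions: retry_call() and flaky_register() are
hypothetical names, not part of libibverbs, and flaky_register() merely
stands in for ibv_reg_mr() so that the example runs without an HCA.

```c
#include <assert.h>
#include <stdio.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical callback type: one attempt that returns NULL on failure. */
typedef void *(*try_fn)(void *arg);

/*
 * Hypothetical helper (not a libibverbs function): call try_once(arg) until
 * it returns non-NULL or max_tries attempts have been made, sleeping
 * delay_sec seconds between failed attempts.  Returns the first non-NULL
 * result, or NULL if every attempt failed.
 */
static void *retry_call(try_fn try_once, void *arg,
                        int max_tries, unsigned int delay_sec)
{
    assert(try_once != NULL);

    for (int attempt = 1; attempt <= max_tries; attempt++) {
        void *result = try_once(arg);
        if (result)
            return result;
        fprintf(stderr, "attempt %d of %d failed\n", attempt, max_tries);
        if (attempt < max_tries)
            sleep(delay_sec);
    }
    return NULL;
}

/*
 * Stand-in for ibv_reg_mr(): fails (returns NULL) until the counter that
 * arg points to reaches zero, then returns a dummy non-NULL pointer.
 */
static void *flaky_register(void *arg)
{
    static int dummy_region;
    int *failures_left = arg;

    if (*failures_left > 0) {
        (*failures_left)--;
        return NULL;
    }
    return &dummy_region;
}
```

In real code the callback would wrap the ibv_reg_mr() call shown above, and
the caller would still have to treat a NULL result as a hard failure.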

Thanks,
Phillip

On Sun, Mar 8, 2009 at 12:45 AM, Nicolas Morey-Chaisemartin <
devel at morey-chaisemartin.com> wrote:

> Phillip Wilson wrote:
> > I updated the HCA "InfiniBand: Mellanox Technologies: Unknown device
> > 634a (rev a0)" to the latest firmware and issue remains.  "fw_ver" is
> > now 2.6.000.
> >
> > Any ideas on why the timeout is occurring in the function?
> >
> >
> >
>
> I've seen this problem a couple of times.
> Something on your system (probably the HCA) is in a crappy state and won't
> answer.
>
> I had the problem with a Topspin HCA and old OFED stacks on an IA64 system.
> It was caused by the driver trying to use msi_x and failing.
> Try forcing msi=0 msi_x=0 for the mlx4_core module in your modprobe.conf
> file, it may help :)
>
>
> Nicolas
>
>