[ofa-general] Catastrophic error on an mthca driver

Tziporet Koren tziporet at dev.mellanox.co.il
Mon Oct 6 05:49:56 PDT 2008


Ramiro Alba Queipo wrote:
> Hi all,
>
> I recently had a problem with the server card of an InfiniBand cluster,
> which in turn brought the whole fabric down because the opensm daemon
> ran into problems. The output of dmesg showed:
>
> --------------------------------------------------------------------
> [408188.411258] ib_mthca 0000:0c:00.0: Catastrophic error detected: internal error
> [408188.411266] ib_mthca 0000:0c:00.0:   buf[00]: 000d0000
> [408188.411269] ib_mthca 0000:0c:00.0:   buf[01]: 00000000
> [408188.411271] ib_mthca 0000:0c:00.0:   buf[02]: 00000000
> [408188.411274] ib_mthca 0000:0c:00.0:   buf[03]: 00000000
> [408188.411276] ib_mthca 0000:0c:00.0:   buf[04]: 00000000
> [408188.411279] ib_mthca 0000:0c:00.0:   buf[05]: 00127e9c
> [408188.411281] ib_mthca 0000:0c:00.0:   buf[06]: ffffffff
> [408188.411283] ib_mthca 0000:0c:00.0:   buf[07]: 00000000
> [408188.411286] ib_mthca 0000:0c:00.0:   buf[08]: 00000000
> [408188.411288] ib_mthca 0000:0c:00.0:   buf[09]: 00000000
> [408188.411290] ib_mthca 0000:0c:00.0:   buf[0a]: 00000000
> [408188.411292] ib_mthca 0000:0c:00.0:   buf[0b]: 00000000
> [408188.411295] ib_mthca 0000:0c:00.0:   buf[0c]: 00000000
> [408188.411297] ib_mthca 0000:0c:00.0:   buf[0d]: 00000000
> [408188.411299] ib_mthca 0000:0c:00.0:   buf[0e]: 00000000
> [408188.411302] ib_mthca 0000:0c:00.0:   buf[0f]: 00000000
> ------------------------------------------------------------
> The problem was solved once I restarted networking. I mean:
>
>   

> Is this a hardware problem? Is there a way to check for a hardware
> problem?
>   
It could be a hardware problem. I am forwarding this mail to our support people.
You can also submit a request on our support website:
http://www.mellanox.com/support/support_signup.php
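
When filing a request it may help to attach the complete buf[] dump from
dmesg. Below is a minimal sketch in Python for pulling those lines out of
saved dmesg text, assuming the line format shown in the log above. The
function names (parse_mthca_dump, nonzero_words) are hypothetical, and the
script only extracts the words; it does not interpret their meaning.

```python
import re

# Matches lines such as:
#   [408188.411266] ib_mthca 0000:0c:00.0:   buf[00]: 000d0000
# The pattern is an assumption based on the log excerpt in this thread.
BUF_LINE = re.compile(
    r"ib_mthca\s+(?P<dev>[0-9a-fA-F:.]+):\s+"
    r"buf\[(?P<idx>[0-9a-fA-F]{2})\]:\s+(?P<val>[0-9a-fA-F]{8})"
)


def parse_mthca_dump(text):
    """Return {pci_device: {index: value}} for every buf[] line in text."""
    dumps = {}
    for line in text.splitlines():
        m = BUF_LINE.search(line)
        if m:
            index = int(m.group("idx"), 16)
            value = int(m.group("val"), 16)
            dumps.setdefault(m.group("dev"), {})[index] = value
    return dumps


def nonzero_words(dump):
    """Keep only the words of one device's dump that are not zero."""
    return {i: v for i, v in sorted(dump.items()) if v != 0}
```

Calling parse_mthca_dump() on the saved dmesg output and then
nonzero_words() on the entry for the affected PCI device gives a short
summary of the dump that can be pasted into the request.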

Tziporet




