[ofa-general] ib_mthca Catastrophic errors

Roland Dreier rdreier at cisco.com
Fri Jun 5 07:01:00 PDT 2009


 > kernel: ib_mthca 0000:06:00.0: Catastrophic error detected: unknown error
 > kernel: ib_mthca 0000:06:00.0:   buf[00]: ffffffff

Looks like an error on the PCI bus: a read that comes back as all
ones (ffffffff) usually means the device did not respond.

 > kernel: ib_mthca 0000:01:00.0: Catastrophic error detected: internal parity error
 > kernel: ib_mthca 0000:01:00.0:   buf[00]: 05000000

Probably what it says it is -- a parity error inside the HCA.

Both point to a physical problem to me -- HCA not perfectly seated in
PCI slot, power supply flaky, thermal issue, something like that.
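
For sorting through a longer log, the reading above can be sketched as a
small script. This is only an illustration, not anything the driver
provides: the function name classify, the regular expressions, and the
hint strings are all made up here, and the patterns assume the log lines
look exactly like the two quoted above. The rule that an all-ones first
word suggests a failed PCI read is a rule of thumb, not a guarantee.

```python
import re

# Patterns modeled on the quoted log lines; the exact format is an assumption.
ERR_RE = re.compile(r"ib_mthca (\S+): Catastrophic error detected: (.+)")
BUF_RE = re.compile(r"ib_mthca (\S+):\s+buf\[(\w+)\]: ([0-9a-fA-F]{8})")

# Hypothetical hint labels, chosen for this sketch only.
HINT_BUS = "all-ones read, check the PCI bus and slot seating"
HINT_HCA = "error reported by the HCA itself"


def classify(lines):
    """Map each PCI address to (error_text, hint) from mthca log lines."""
    errors = {}      # PCI address -> error text from the driver
    first_word = {}  # PCI address -> first buf[] word seen, as an int
    for line in lines:
        m = ERR_RE.search(line)
        if m:
            errors[m.group(1)] = m.group(2).strip()
            continue
        m = BUF_RE.search(line)
        if m and m.group(1) not in first_word:
            first_word[m.group(1)] = int(m.group(3), 16)

    result = {}
    for dev, err in errors.items():
        # A register read of 0xffffffff often means the read itself failed.
        if first_word.get(dev) == 0xFFFFFFFF:
            result[dev] = (err, HINT_BUS)
        else:
            result[dev] = (err, HINT_HCA)
    return result


if __name__ == "__main__":
    sample = [
        "kernel: ib_mthca 0000:06:00.0: Catastrophic error detected: unknown error",
        "kernel: ib_mthca 0000:06:00.0:   buf[00]: ffffffff",
        "kernel: ib_mthca 0000:01:00.0: Catastrophic error detected: internal parity error",
        "kernel: ib_mthca 0000:01:00.0:   buf[00]: 05000000",
    ]
    for dev, (err, hint) in classify(sample).items():
        print(dev, "|", err, "|", hint)
```

Either way the hint only says where to start looking; the hardware checks
below still apply to both cases.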

 - R.


