[openib-general] Catastrophic error detected.
Dotan Barak
dotanb at dev.mellanox.co.il
Thu Oct 19 08:26:17 PDT 2006
Hi Ira.
Ira Weiny wrote:
> I got the following error running with OFED 1.1 on a modified 2.6.9 RHEL4
> kernel. Hal mentioned that there might be a catastrophic error recovery patch
> submitted since then? I can't find a mention of that in the mailing list. If
> possible I would like to try such a patch.
>
> Thanks,
> Ira
>
> 2006-10-17 21:31:47 ib_mthca 0000:07:00.0: Catastrophic error detected: unknown error
> 2006-10-17 21:31:47 ib_mthca 0000:07:00.0: buf[00]: ffffffff
> 2006-10-17 21:31:47 ib_mthca 0000:07:00.0: buf[01]: ffffffff
> 2006-10-17 21:31:47 ib_mthca 0000:07:00.0: buf[02]: ffffffff
> 2006-10-17 21:31:47 ib_mthca 0000:07:00.0: buf[03]: ffffffff
> 2006-10-17 21:31:47 ib_mthca 0000:07:00.0: buf[04]: ffffffff
> 2006-10-17 21:31:47 ib_mthca 0000:07:00.0: buf[05]: ffffffff
> 2006-10-17 21:31:47 ib_mthca 0000:07:00.0: buf[06]: ffffffff
> 2006-10-17 21:31:47 ib_mthca 0000:07:00.0: buf[07]: ffffffff
> 2006-10-17 21:31:47 ib_mthca 0000:07:00.0: buf[08]: ffffffff
> 2006-10-17 21:31:47 ib_mthca 0000:07:00.0: buf[09]: ffffffff
> 2006-10-17 21:31:47 ib_mthca 0000:07:00.0: buf[0a]: ffffffff
> 2006-10-17 21:31:47 ib_mthca 0000:07:00.0: buf[0b]: ffffffff
> 2006-10-17 21:31:47 ib_mthca 0000:07:00.0: buf[0c]: ffffffff
> 2006-10-17 21:31:47 ib_mthca 0000:07:00.0: buf[0d]: ffffffff
> 2006-10-17 21:31:47 ib_mthca 0000:07:00.0: buf[0e]: ffffffff
> 2006-10-17 21:31:47 ib_mthca 0000:07:00.0: buf[0f]: ffffffff
>
> # rhea277 /root > /sbin/lspci -vv -s 07:00.0
> 07:00.0 InfiniBand: Mellanox Technologies MT25208 InfiniHost III Ex (rev 20)
> Subsystem: Mellanox Technologies MT25208 InfiniHost III Ex
> Control: I/O- Mem- BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
> Status: Cap+ 66Mhz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR-
> Interrupt: pin A routed to IRQ 217
> Region 0: Memory at dff00000 (64-bit, non-prefetchable) [disabled] [size=1M]
> Region 2: Memory at de800000 (64-bit, prefetchable) [disabled] [size=8M]
> Capabilities: [40] Power Management version 2
> Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
> Status: D0 PME-Enable- DSel=0 DScale=0 PME-
> Capabilities: [48] Vital Product Data
> Capabilities: [90] Message Signalled Interrupts: 64bit+ Queue=0/5 Enable-
> Address: 0000000000000000 Data: 0000
> Capabilities: [84] MSI-X: Enable- Mask- TabSize=32
> Vector table: BAR=0 offset=00082000
> PBA: BAR=0 offset=00082200
> Capabilities: [60] Express Endpoint IRQ 0
> Device: Supported: MaxPayload 128 bytes, PhantFunc 0, ExtTag-
> Device: Latency L0s <64ns, L1 unlimited
> Device: AtnBtn- AtnInd- PwrInd-
> Device: Errors: Correctable- Non-Fatal- Fatal- Unsupported-
> Device: RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop-
> Device: MaxPayload 128 bytes, MaxReadReq 512 bytes
> Link: Supported Speed 2.5Gb/s, Width x8, ASPM L0s, Port 8
> Link: Latency L0s unlimited, L1 unlimited
> Link: ASPM Disabled RCB 64 bytes CommClk- ExtSynch-
> Link: Speed 2.5Gb/s, Width x8
>
Can you please give me some more information on how you got this error?
* What were you doing when the error occurred?
* Which FW version do you have?
* What is the board_id of the HCA? (You can find this using ibv_devinfo.)
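
If it helps, the FW version and board_id can be pulled out of the ibv_devinfo output with a small script. This is only a sketch: it assumes the usual "key: value" layout with hca_id, fw_ver and board_id lines, and the helper names parse_devinfo and query_devinfo are made up for illustration.

```python
import re
import shutil
import subprocess

# Fields requested in this thread; other ibv_devinfo lines are ignored.
FIELDS = ("fw_ver", "board_id")


def parse_devinfo(text):
    """Return {hca_id: {"fw_ver": ..., "board_id": ...}} from ibv_devinfo text."""
    devices = {}
    current = None
    for line in text.splitlines():
        # Match "key: value" lines, allowing leading tabs or spaces.
        m = re.match(r"\s*(\w+):\s*(\S.*)?$", line)
        if not m:
            continue
        key = m.group(1)
        value = (m.group(2) or "").strip()
        if key == "hca_id":
            # A new device section starts here.
            current = devices.setdefault(value, {})
        elif current is not None and key in FIELDS:
            current[key] = value
    return devices


def query_devinfo():
    """Run ibv_devinfo if it is installed; return an empty dict otherwise."""
    if shutil.which("ibv_devinfo") is None:
        return {}
    result = subprocess.run(["ibv_devinfo"], capture_output=True, text=True)
    return parse_devinfo(result.stdout)


if __name__ == "__main__":
    for hca, info in query_devinfo().items():
        print(hca, info.get("fw_ver", "?"), info.get("board_id", "?"))
```

Running it on the affected node prints one line per HCA with the two values asked for above. If the device is no longer accessible after the catastrophic error, it may print nothing.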
Thanks,
Dotan