[ofa-general] ib_mthca 0000:06:00.0: Catastrophic error detected: internal error

Phillip Wilson phillipwils at gmail.com
Wed Mar 11 18:18:57 PDT 2009


I looked through the ofa-general mail archives and found this issue in the
November 6 through 10, 2008 archives. According to that thread, the issue
was fixed by upgrading the firmware. I am already running the latest posted
firmware (1.2.0), but I still hit the "ib_mthca: Catastrophic error detected:
internal error" issue. When it occurs, the cards are reset and the LIDs
(sm_lid, port_lid) are reset to 0.

#> ibv_devinfo
hca_id: mthca2
        fw_ver:                         1.2.0
        node_guid:                      0019:bbff:fff8:8184
        sys_image_guid:                 0019:bbff:fff8:8187
        vendor_id:                      0x02c9
        vendor_part_id:                 25204
        hw_ver:                         0xA0
        board_id:                       HP_0010000001
        phys_port_cnt:                  1
                port:   1
                        state:                  PORT_DOWN (1)
                        max_mtu:                2048 (4)
                        active_mtu:             512 (2)
                        sm_lid:                 1
                        port_lid:               1
                        port_lmc:               0x00

hca_id: mthca1
        fw_ver:                         1.2.0
        node_guid:                      0019:bbff:fff7:3c40
        sys_image_guid:                 0019:bbff:fff7:3c43
        vendor_id:                      0x02c9
        vendor_part_id:                 25204
        hw_ver:                         0xA0
        board_id:                       HP_0010000001
        phys_port_cnt:                  1
                port:   1
                        state:                  PORT_DOWN (1)
                        max_mtu:                2048 (4)
                        active_mtu:             512 (2)
                        sm_lid:                 0
                        port_lid:               0
                        port_lmc:               0x00

hca_id: mthca0
        fw_ver:                         1.2.0
        node_guid:                      0019:bbff:fff7:4b10
        sys_image_guid:                 0019:bbff:fff7:4b13
        vendor_id:                      0x02c9
        vendor_part_id:                 25204
        hw_ver:                         0xA0
        board_id:                       HP_0010000001
        phys_port_cnt:                  1
                port:   1
                        state:                  PORT_DOWN (1)
                        max_mtu:                2048 (4)
                        active_mtu:             512 (2)
                        sm_lid:                 0
                        port_lid:               0
                        port_lmc:               0x00
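
To compare the three HCAs at a glance, the ibv_devinfo output above can be
reduced to the fields that matter here (fw_ver, port state, sm_lid,
port_lid). Below is a minimal Python sketch, assuming the "field: value"
text layout shown above and one port per HCA (phys_port_cnt is 1 on all
three cards). `summarize_devinfo` and `format_summary` are hypothetical
helpers, not part of libibverbs or its tools.

```python
def summarize_devinfo(text):
    """Reduce `ibv_devinfo` text output to {hca_id: {field: value}}.

    Assumes one "field: value" pair per line and a single port per HCA;
    with several ports, later port entries would overwrite earlier ones.
    """
    wanted = {"fw_ver", "state", "sm_lid", "port_lid"}
    hcas = {}
    current = None
    for line in text.splitlines():
        # Split on the first colon only, so GUIDs such as
        # "0019:bbff:fff8:8184" stay intact in the value.
        key, sep, value = line.strip().partition(":")
        if not sep:
            continue  # blank line or not a "field: value" line
        key, value = key.strip(), value.strip()
        if key == "hca_id":
            current = hcas.setdefault(value, {})
        elif current is not None and key in wanted:
            current[key] = value
    return hcas


def format_summary(hcas):
    """One line per HCA, e.g. 'mthca1 fw=1.2.0 state=PORT_DOWN (1) sm_lid=0 port_lid=0'."""
    return [
        f"{hca} fw={f.get('fw_ver')} state={f.get('state')} "
        f"sm_lid={f.get('sm_lid')} port_lid={f.get('port_lid')}"
        for hca, f in sorted(hcas.items())
    ]
```

For example, passing the output of
`subprocess.run(["ibv_devinfo"], capture_output=True, text=True).stdout`
through both helpers returns one summary line per HCA, which makes it easy
to spot which cards lost their LIDs after a reset.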

#> dmesg
M95700A6) rev 2100 PHY(serdes)] (PCIX:66MHz:64-bit) 1000Base-SX Ethernet 00:1b:78:7c:11:b7
eth3: RXcsums[1] LinkChgREG[0] MIirq[0] ASF[0] WireSpeed[0] TSOcap[1]
eth3: dma_rwctrl[769f0000] dma_mask[64-bit]
Uniform Multi-Platform E-IDE driver Revision: 7.00alpha2
ide: Assuming 33MHz system bus speed for PIO modes; override with idebus=xx
Probing IDE interface ide0...
Probing IDE interface ide1...
Probing IDE interface ide2...
Probing IDE interface ide3...
Loading iSCSI transport class v2.0-724.
QLogic Fibre Channel HBA Driver
iscsi: registered transport (qla4xxx)
QLogic iSCSI HBA Driver
Emulex LightPulse Fibre Channel SCSI driver 8.2.2
Copyright(c) 2004-2007 Emulex.  All rights reserved.
Driver 'sd' needs updating - please use bus_type methods
Driver 'sr' needs updating - please use bus_type methods
Fusion MPT base driver 3.04.06
Copyright (c) 1999-2007 LSI Corporation
Fusion MPT SPI Host driver 3.04.06
Fusion MPT FC Host driver 3.04.06
Fusion MPT SAS Host driver 3.04.06
GSI 38 (level, low) -> CPU 10 (0x0a00) vector 61
ACPI: PCI Interrupt 0000:02:01.0[A] -> GSI 38 (level, low) -> IRQ 61
mptbase: ioc0: Initiating bringup
ioc0: LSISAS1068 B0: Capabilities={Initiator}
scsi0 : ioc0: LSISAS1068 B0, FwRev=01172100h, Ports=1, MaxQ=163, IRQ=61
scsi 0:0:0:0: Direct-Access     HP       DG146ABAB4       HPD5 PQ: 0 ANSI: 5
sd 0:0:0:0: [sda] 286749488 512-byte hardware sectors (146816 MB)
sd 0:0:0:0: [sda] Write Protect is off
sd 0:0:0:0: [sda] Mode Sense: db 00 10 08
sd 0:0:0:0: [sda] Write cache: disabled, read cache: enabled, supports DPO and FUA
sd 0:0:0:0: [sda] 286749488 512-byte hardware sectors (146816 MB)
sd 0:0:0:0: [sda] Write Protect is off
sd 0:0:0:0: [sda] Mode Sense: db 00 10 08
sd 0:0:0:0: [sda] Write cache: disabled, read cache: enabled, supports DPO and FUA
 sda: sda1 sda2 sda3
sd 0:0:0:0: [sda] Attached SCSI disk
sd 0:0:0:0: Attached scsi generic sg0 type 0
GSI 19 (level, low) -> CPU 11 (0x0b00) vector 62
ACPI: PCI Interrupt 0000:00:02.2[C] -> GSI 19 (level, low) -> IRQ 62
ehci_hcd 0000:00:02.2: EHCI Host Controller
ehci_hcd 0000:00:02.2: new USB bus registered, assigned bus number 1
ehci_hcd 0000:00:02.2: irq 62, io mem 0x88030000
ehci_hcd 0000:00:02.2: USB 2.0 started, EHCI 1.00, driver 10 Dec 2004
usb usb1: configuration #1 chosen from 1 choice
hub 1-0:1.0: USB hub found
hub 1-0:1.0: 5 ports detected
ohci_hcd: 2006 August 04 USB 1.1 'Open' Host Controller (OHCI) Driver
GSI 17 (level, low) -> CPU 12 (0x0c00) vector 63
ACPI: PCI Interrupt 0000:00:02.0[A] -> GSI 17 (level, low) -> IRQ 63
ohci_hcd 0000:00:02.0: OHCI Host Controller
ohci_hcd 0000:00:02.0: new USB bus registered, assigned bus number 2
ohci_hcd 0000:00:02.0: irq 63, io mem 0x88032000
usb usb2: configuration #1 chosen from 1 choice
hub 2-0:1.0: USB hub found
hub 2-0:1.0: 3 ports detected
GSI 18 (level, low) -> CPU 13 (0x0d00) vector 64
ACPI: PCI Interrupt 0000:00:02.1[B] -> GSI 18 (level, low) -> IRQ 64
ohci_hcd 0000:00:02.1: OHCI Host Controller
ohci_hcd 0000:00:02.1: new USB bus registered, assigned bus number 3
ohci_hcd 0000:00:02.1: irq 64, io mem 0x88031000
usb usb3: configuration #1 chosen from 1 choice
hub 3-0:1.0: USB hub found
hub 3-0:1.0: 2 ports detected
USB Universal Host Controller Interface driver v3.0
Initializing USB Mass Storage driver...
usb 2-1: new full speed USB device using ohci_hcd and address 2
usb 2-1: configuration #1 chosen from 1 choice
usbcore: registered new interface driver usb-storage
USB Mass Storage support registered.
mice: PS/2 mouse device common for all mice
md: raid0 personality registered for level 0
device-mapper: ioctl: 4.12.0-ioctl (2007-10-02) initialised: dm-devel at redhat.com
ib_mthca: Mellanox InfiniBand HCA driver v0.08 (February 14, 2006)
ib_mthca: Initializing 0000:04:00.0
ACPI: PCI Interrupt 0000:04:00.0[A] -> GSI 49 (level, low) -> IRQ 50
ib_mthca: Initializing 0000:06:00.0
ACPI: PCI Interrupt 0000:06:00.0[A] -> GSI 56 (level, low) -> IRQ 52
ib_mthca: Initializing 0000:08:00.0
ACPI: PCI Interrupt 0000:08:00.0[A] -> GSI 63 (level, low) -> IRQ 54
iscsi: registered transport (iser)
EFI Variables Facility v0.08 2004-May-17
input: HP Virtual Management Device as /devices/pci0000:00/0000:00:02.0/usb2/2-1/2-1:1.0/input/input0
input: USB HID v1.11 Keyboard [HP Virtual Management Device] on usb-0000:00:02.0-1
input: HP Virtual Management Device as /devices/pci0000:00/0000:00:02.0/usb2/2-1/2-1:1.1/input/input1
input: USB HID v1.01 Mouse [HP Virtual Management Device] on usb-0000:00:02.0-1
usbcore: registered new interface driver usbhid
drivers/hid/usbhid/hid-core.c: v2.6:USB HID core driver
TCP cubic registered
NET: Registered protocol family 1
NET: Registered protocol family 17
md: Autodetecting RAID arrays.
md: Scanned 0 and added 0 devices.
md: autorun ...
md: ... autorun DONE.
RAMDISK: Compressed image found at block 0
EXT2-fs warning: mounting unchecked fs, running e2fsck is recommended
VFS: Mounted root (ext2 filesystem).
Freeing unused kernel memory: 1744kB freed
Intel(R) Gigabit Ethernet Network Driver - version 1.2.44.9
Copyright (c) 2007-2008 Intel Corporation.
user_mad: process infiniband did not enable P_Key index support.
user_mad:   Documentation/infiniband/user_mad.txt has info on the new ABI.
user_mad: process infiniband did not enable P_Key index support.
user_mad:   Documentation/infiniband/user_mad.txt has info on the new ABI.
user_mad: process infiniband did not enable P_Key index support.
user_mad:   Documentation/infiniband/user_mad.txt has info on the new ABI.
user_mad: process infiniband did not enable P_Key index support.
user_mad:   Documentation/infiniband/user_mad.txt has info on the new ABI.
ib_mthca 0000:04:00.0: Catastrophic error detected: internal error
ib_mthca 0000:04:00.0:   buf[00]: 0012f6f8
ib_mthca 0000:04:00.0:   buf[01]: 00000000
ib_mthca 0000:04:00.0:   buf[02]: 00000000
ib_mthca 0000:04:00.0:   buf[03]: 00000000
ib_mthca 0000:04:00.0:   buf[04]: 00000000
ib_mthca 0000:04:00.0:   buf[05]: 0012f6dc
ib_mthca 0000:04:00.0:   buf[06]: 001868f4
ib_mthca 0000:04:00.0:   buf[07]: 00000000
ib_mthca 0000:04:00.0:   buf[08]: 00000000
ib_mthca 0000:04:00.0:   buf[09]: 00000000
ib_mthca 0000:04:00.0:   buf[0a]: 00000000
ib_mthca 0000:04:00.0:   buf[0b]: 00000000
ib_mthca 0000:04:00.0:   buf[0c]: 00000000
ib_mthca 0000:04:00.0:   buf[0d]: 00000000
ib_mthca 0000:04:00.0:   buf[0e]: 00000000
ib_mthca 0000:04:00.0:   buf[0f]: 00000000
ACPI: PCI interrupt for device 0000:04:00.0 disabled
ib_mthca: Initializing 0000:04:00.0
ACPI: PCI Interrupt 0000:04:00.0[A] -> GSI 49 (level, low) -> IRQ 50
user_mad: process infiniband did not enable P_Key index support.
user_mad:   Documentation/infiniband/user_mad.txt has info on the new ABI.
ib_mthca 0000:06:00.0: Catastrophic error detected: internal error
ib_mthca 0000:06:00.0:   buf[00]: 0012f6f8
ib_mthca 0000:06:00.0:   buf[01]: 00000000
ib_mthca 0000:06:00.0:   buf[02]: 00000000
ib_mthca 0000:06:00.0:   buf[03]: 00000000
ib_mthca 0000:06:00.0:   buf[04]: 00000000
ib_mthca 0000:06:00.0:   buf[05]: 0012f6dc
ib_mthca 0000:06:00.0:   buf[06]: 001b3658
ib_mthca 0000:06:00.0:   buf[07]: 00000000
ib_mthca 0000:06:00.0:   buf[08]: 00000000
ib_mthca 0000:06:00.0:   buf[09]: 00000000
ib_mthca 0000:06:00.0:   buf[0a]: 00000000
ib_mthca 0000:06:00.0:   buf[0b]: 00000000
ib_mthca 0000:06:00.0:   buf[0c]: 00000000
ib_mthca 0000:06:00.0:   buf[0d]: 00000000
ib_mthca 0000:06:00.0:   buf[0e]: 00000000
ib_mthca 0000:06:00.0:   buf[0f]: 00000000
ACPI: PCI interrupt for device 0000:06:00.0 disabled
ib_mthca: Initializing 0000:06:00.0
ACPI: PCI Interrupt 0000:06:00.0[A] -> GSI 56 (level, low) -> IRQ 52
user_mad: process infiniband did not enable P_Key index support.
user_mad:   Documentation/infiniband/user_mad.txt has info on the new ABI.
#>
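
The two catastrophic-error dumps above (0000:04:00.0 and 0000:06:00.0) are
identical except for buf[06] (001868f4 versus 001b3658); the third card,
0000:08:00.0, reports no error in this capture. Below is a minimal Python
sketch for pulling such dumps out of a saved dmesg log and diffing them,
assuming the "ib_mthca <pci addr>:   buf[NN]: <hex word>" line format shown
above and at most one dump per device. `parse_catas_dumps` and `diff_dumps`
are hypothetical helpers, not part of any OFED tool.

```python
import re
from collections import defaultdict

# Matches dump lines such as "ib_mthca 0000:04:00.0:   buf[06]: 001868f4"
BUF_RE = re.compile(
    r"ib_mthca ([0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]):\s+"
    r"buf\[([0-9a-f]+)\]: ([0-9a-f]+)"
)


def parse_catas_dumps(dmesg_text):
    """Map PCI address -> {buf index: 32-bit word} from a dmesg capture.

    Assumes at most one catastrophic-error dump per device in the text;
    if a device dumps twice, the later words overwrite the earlier ones.
    """
    dumps = defaultdict(dict)
    for line in dmesg_text.splitlines():
        m = BUF_RE.search(line)
        if m:
            pci, idx, word = m.groups()
            dumps[pci][int(idx, 16)] = int(word, 16)
    return dict(dumps)


def diff_dumps(a, b):
    """Return the sorted buf indices whose words differ between two dumps."""
    return sorted(i for i in set(a) | set(b) if a.get(i) != b.get(i))
```

Applied to the log above, diffing the 0000:04:00.0 dump against the
0000:06:00.0 dump should reduce to buf[06] alone.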

#> uname -a
Linux (none) 2.6.24.02.02.08 #22 SMP Thu Feb 26 13:39:02 PST 2009 ia64 unknown

#> lspci -v -d 15b3:
0000:04:00.0 InfiniBand: Mellanox Technologies MT25204 [InfiniHost III Lx HCA] (rev 20)
        Subsystem: Hewlett-Packard Company: Unknown device 170a
        Flags: bus master, fast devsel, latency 0, IRQ 50
        Memory at 00000000b0100000 (64-bit, non-prefetchable) [size=1M]
        Memory at 0000080380000000 (64-bit, prefetchable) [size=8M]
        Expansion ROM at 00000000b0000000 [disabled] [size=1M]
        Capabilities: [40] Power Management version 2
        Capabilities: [48] Vital Product Data
        Capabilities: [90] Message Signalled Interrupts: 64bit+ Queue=0/5 Enable-
        Capabilities: [84] #11 [801f]
        Capabilities: [60] #10 [0001]

0000:06:00.0 InfiniBand: Mellanox Technologies MT25204 [InfiniHost III Lx HCA] (rev 20)
        Subsystem: Hewlett-Packard Company: Unknown device 170a
        Flags: bus master, fast devsel, latency 0, IRQ 52
        Memory at 00000000c0100000 (64-bit, non-prefetchable) [size=1M]
        Memory at 0000080480000000 (64-bit, prefetchable) [size=8M]
        Expansion ROM at 00000000c0000000 [disabled] [size=1M]
        Capabilities: [40] Power Management version 2
        Capabilities: [48] Vital Product Data
        Capabilities: [90] Message Signalled Interrupts: 64bit+ Queue=0/5 Enable-
        Capabilities: [84] #11 [801f]
        Capabilities: [60] #10 [0001]

0000:08:00.0 InfiniBand: Mellanox Technologies MT25204 [InfiniHost III Lx HCA] (rev 20)
        Subsystem: Hewlett-Packard Company: Unknown device 170a
        Flags: bus master, fast devsel, latency 0, IRQ 54
        Memory at 00000000f0100000 (64-bit, non-prefetchable) [size=1M]
        Memory at 0000080780000000 (64-bit, prefetchable) [size=8M]
        Expansion ROM at 00000000f0000000 [disabled] [size=1M]
        Capabilities: [40] Power Management version 2
        Capabilities: [48] Vital Product Data
        Capabilities: [90] Message Signalled Interrupts: 64bit+ Queue=0/5 Enable-
        Capabilities: [84] #11 [801f]
        Capabilities: [60] #10 [0001]

#> cat /proc/cpuinfo
processor  : 0
vendor     : GenuineIntel
arch       : IA-64
family     : 32
model      : 1
model name : Dual-Core Intel(R) Itanium(R) Processor 9140N
revision   : 1
archrev    : 0
features   : branchlong, 16-byte atomic ops
cpu number : 0
cpu regs   : 4
cpu MHz    : 1594.670
itc MHz    : 399.165948
BogoMIPS   : 3186.68
siblings   : 4
physical id: 0
core id    : 0
thread id  : 0

processor  : 1
vendor     : GenuineIntel
arch       : IA-64
family     : 32
model      : 1
model name : Dual-Core Intel(R) Itanium(R) Processor 9140N
revision   : 1
archrev    : 0
features   : branchlong, 16-byte atomic ops
cpu number : 0
cpu regs   : 4
cpu MHz    : 1594.670
itc MHz    : 399.165948
BogoMIPS   : 3186.68
siblings   : 4
physical id: 0
core id    : 0
thread id  : 1

processor  : 2
vendor     : GenuineIntel
arch       : IA-64
family     : 32
model      : 1
model name : Dual-Core Intel(R) Itanium(R) Processor 9140N
revision   : 1
archrev    : 0
features   : branchlong, 16-byte atomic ops
cpu number : 0
cpu regs   : 4
cpu MHz    : 1594.670
itc MHz    : 399.165948
BogoMIPS   : 3186.68
siblings   : 4
physical id: 0
core id    : 1
thread id  : 0

processor  : 3
vendor     : GenuineIntel
arch       : IA-64
family     : 32
model      : 1
model name : Dual-Core Intel(R) Itanium(R) Processor 9140N
revision   : 1
archrev    : 0
features   : branchlong, 16-byte atomic ops
cpu number : 0
cpu regs   : 4
cpu MHz    : 1594.670
itc MHz    : 399.165948
BogoMIPS   : 3186.68
siblings   : 4
physical id: 0
core id    : 1
thread id  : 1

processor  : 4
vendor     : GenuineIntel
arch       : IA-64
family     : 32
model      : 1
model name : Dual-Core Intel(R) Itanium(R) Processor 9140N
revision   : 1
archrev    : 0
features   : branchlong, 16-byte atomic ops
cpu number : 0
cpu regs   : 4
cpu MHz    : 1594.670
itc MHz    : 399.165948
BogoMIPS   : 3186.68
siblings   : 4
physical id: 1
core id    : 0
thread id  : 0

processor  : 5
vendor     : GenuineIntel
arch       : IA-64
family     : 32
model      : 1
model name : Dual-Core Intel(R) Itanium(R) Processor 9140N
revision   : 1
archrev    : 0
features   : branchlong, 16-byte atomic ops
cpu number : 0
cpu regs   : 4
cpu MHz    : 1594.670
itc MHz    : 399.165948
BogoMIPS   : 3186.68
siblings   : 4
physical id: 1
core id    : 0
thread id  : 1

processor  : 6
vendor     : GenuineIntel
arch       : IA-64
family     : 32
model      : 1
model name : Dual-Core Intel(R) Itanium(R) Processor 9140N
revision   : 1
archrev    : 0
features   : branchlong, 16-byte atomic ops
cpu number : 0
cpu regs   : 4
cpu MHz    : 1594.670
itc MHz    : 399.165948
BogoMIPS   : 3186.68
siblings   : 4
physical id: 1
core id    : 1
thread id  : 0

processor  : 7
vendor     : GenuineIntel
arch       : IA-64
family     : 32
model      : 1
model name : Dual-Core Intel(R) Itanium(R) Processor 9140N
revision   : 1
archrev    : 0
features   : branchlong, 16-byte atomic ops
cpu number : 0
cpu regs   : 4
cpu MHz    : 1594.670
itc MHz    : 399.165948
BogoMIPS   : 3186.68
siblings   : 4
physical id: 1
core id    : 1
thread id  : 1

processor  : 8
vendor     : GenuineIntel
arch       : IA-64
family     : 32
model      : 1
model name : Dual-Core Intel(R) Itanium(R) Processor 9140N
revision   : 1
archrev    : 0
features   : branchlong, 16-byte atomic ops
cpu number : 0
cpu regs   : 4
cpu MHz    : 1594.670
itc MHz    : 399.165948
BogoMIPS   : 3186.68
siblings   : 4
physical id: 2
core id    : 0
thread id  : 0

processor  : 9
vendor     : GenuineIntel
arch       : IA-64
family     : 32
model      : 1
model name : Dual-Core Intel(R) Itanium(R) Processor 9140N
revision   : 1
archrev    : 0
features   : branchlong, 16-byte atomic ops
cpu number : 0
cpu regs   : 4
cpu MHz    : 1594.670
itc MHz    : 399.165948
BogoMIPS   : 3186.68
siblings   : 4
physical id: 2
core id    : 0
thread id  : 1

processor  : 10
vendor     : GenuineIntel
arch       : IA-64
family     : 32
model      : 1
model name : Dual-Core Intel(R) Itanium(R) Processor 9140N
revision   : 1
archrev    : 0
features   : branchlong, 16-byte atomic ops
cpu number : 0
cpu regs   : 4
cpu MHz    : 1594.670
itc MHz    : 399.165948
BogoMIPS   : 3186.68
siblings   : 4
physical id: 2
core id    : 1
thread id  : 0

processor  : 11
vendor     : GenuineIntel
arch       : IA-64
family     : 32
model      : 1
model name : Dual-Core Intel(R) Itanium(R) Processor 9140N
revision   : 1
archrev    : 0
features   : branchlong, 16-byte atomic ops
cpu number : 0
cpu regs   : 4
cpu MHz    : 1594.670
itc MHz    : 399.165948
BogoMIPS   : 3186.68
siblings   : 4
physical id: 2
core id    : 1
thread id  : 1

processor  : 12
vendor     : GenuineIntel
arch       : IA-64
family     : 32
model      : 1
model name : Dual-Core Intel(R) Itanium(R) Processor 9140N
revision   : 1
archrev    : 0
features   : branchlong, 16-byte atomic ops
cpu number : 0
cpu regs   : 4
cpu MHz    : 1594.670
itc MHz    : 399.165948
BogoMIPS   : 3186.68
siblings   : 4
physical id: 3
core id    : 0
thread id  : 0

processor  : 13
vendor     : GenuineIntel
arch       : IA-64
family     : 32
model      : 1
model name : Dual-Core Intel(R) Itanium(R) Processor 9140N
revision   : 1
archrev    : 0
features   : branchlong, 16-byte atomic ops
cpu number : 0
cpu regs   : 4
cpu MHz    : 1594.670
itc MHz    : 399.165948
BogoMIPS   : 3186.68
siblings   : 4
physical id: 3
core id    : 0
thread id  : 1

processor  : 14
vendor     : GenuineIntel
arch       : IA-64
family     : 32
model      : 1
model name : Dual-Core Intel(R) Itanium(R) Processor 9140N
revision   : 1
archrev    : 0
features   : branchlong, 16-byte atomic ops
cpu number : 0
cpu regs   : 4
cpu MHz    : 1594.670
itc MHz    : 399.165948
BogoMIPS   : 3186.68
siblings   : 4
physical id: 3
core id    : 1
thread id  : 0

processor  : 15
vendor     : GenuineIntel
arch       : IA-64
family     : 32
model      : 1
model name : Dual-Core Intel(R) Itanium(R) Processor 9140N
revision   : 1
archrev    : 0
features   : branchlong, 16-byte atomic ops
cpu number : 0
cpu regs   : 4
cpu MHz    : 1594.670
itc MHz    : 399.165948
BogoMIPS   : 3186.68
siblings   : 4
physical id: 3
core id    : 1
thread id  : 1

#> cat /proc/meminfo
MemTotal:     37277888 kB
MemFree:      37036048 kB
Buffers:         98304 kB
Cached:          10320 kB
SwapCached:          0 kB
Active:           8128 kB
Inactive:       101568 kB
SwapTotal:           0 kB
SwapFree:            0 kB
Dirty:               0 kB
Writeback:           0 kB
AnonPages:        1248 kB
Mapped:           1792 kB
Slab:            28400 kB
SReclaimable:     1248 kB
SUnreclaim:      27152 kB
PageTables:        320 kB
NFS_Unstable:        0 kB
Bounce:              0 kB
CommitLimit:  18638944 kB
Committed_AS:        0 kB
VmallocTotal: 137426880512 kB
VmallocUsed:       432 kB
VmallocChunk: 137426879584 kB