[ofa-general] ib_mthca Catastrophic errors
Tziporet Koren
tziporet at dev.mellanox.co.il
Sun Jun 7 03:50:18 PDT 2009
Pawel Dziekonski wrote:
> Hi,
>
> from time to time I get Catastrophic errors like below. software stack is
> kernel 2.6.18-92.1.10.el5 with Lustre client. device and OFED info is also
> below.
>
> any hints?
>
> thanks in advance, Pawel
>
>
>
> 06:00.0 InfiniBand: Mellanox Technologies MT25204 [InfiniHost III Lx HCA] (rev 20)
>
> # ibv_devices
>     device              node GUID
>     ------              ----------------
>     mthca0              0030487e07700000
> # ibv_devinfo
> hca_id: mthca0
>         fw_ver:                 1.2.0
>         node_guid:              0030:487e:0770:0000
>         sys_image_guid:         0030:487e:0770:0003
>         vendor_id:              0x02c9
>         vendor_part_id:         25204
>         hw_ver:                 0xA0
>         board_id:               SM_0000000003
>         phys_port_cnt:          1
>                 port:   1
>                         state:          PORT_ACTIVE (4)
>                         max_mtu:        2048 (4)
>                         active_mtu:     2048 (4)
>                         sm_lid:         1
>                         port_lid:       441
>                         port_lmc:       0x00
>
>
>
>
>
> kernel: ib_mthca 0000:06:00.0: Catastrophic error detected: unknown error
> kernel: ib_mthca 0000:06:00.0: buf[00]: ffffffff
> kernel: ib_mthca 0000:06:00.0: buf[01]: ffffffff
> kernel: ib_mthca 0000:06:00.0: buf[02]: ffffffff
> kernel: ib_mthca 0000:06:00.0: buf[03]: ffffffff
> kernel: ib_mthca 0000:06:00.0: buf[04]: ffffffff
> kernel: ib_mthca 0000:06:00.0: buf[05]: ffffffff
> kernel: ib_mthca 0000:06:00.0: buf[06]: ffffffff
> kernel: ib_mthca 0000:06:00.0: buf[07]: ffffffff
> kernel: ib_mthca 0000:06:00.0: buf[08]: ffffffff
> kernel: ib_mthca 0000:06:00.0: buf[09]: ffffffff
> kernel: ib_mthca 0000:06:00.0: buf[0a]: ffffffff
> kernel: ib_mthca 0000:06:00.0: buf[0b]: ffffffff
> kernel: ib_mthca 0000:06:00.0: buf[0c]: ffffffff
> kernel: ib_mthca 0000:06:00.0: buf[0d]: ffffffff
> kernel: ib_mthca 0000:06:00.0: buf[0e]: ffffffff
> kernel: ib_mthca 0000:06:00.0: buf[0f]: ffffffff
> kernel: ib_mthca 0000:06:00.0: HW2SW_MPT failed (-11)
> kernel: ib_mthca 0000:06:00.0: HW2SW_MPT failed (-11)
> kernel: ib0: ib_detach_mcast failed (result = -11)
> kernel: ib0: ipoib_mcast_detach failed (result = -11)
> kernel: ib0: ib_detach_mcast failed (result = -11)
> kernel: ib0: ipoib_mcast_detach failed (result = -11)
> kernel: ib0: Failed to modify QP to ERROR state
> kernel: ib0: timing out; 0 sends 128 receives not completed
> kernel: ib0: Failed to modify QP to RESET state
> kernel: ib_mthca 0000:06:00.0: HW2SW_MPT failed (-11)
> kernel: ib_mthca 0000:06:00.0: HW2SW_CQ failed (-11)
> kernel: ib_mthca 0000:06:00.0: HW2SW_MPT failed (-11)
> kernel: ib_mthca 0000:06:00.0: HW2SW_SRQ failed (-11)
> kernel: ib_mthca 0000:06:00.0: HW2SW_MPT failed (-11)
>
>
> kernel: ib_mthca 0000:01:00.0: Catastrophic error detected: internal parity error
> kernel: ib_mthca 0000:01:00.0: buf[00]: 05000000
> kernel: ib_mthca 0000:01:00.0: buf[01]: 00000000
> kernel: ib_mthca 0000:01:00.0: buf[02]: 00000000
> kernel: ib_mthca 0000:01:00.0: buf[03]: 00000000
> kernel: ib_mthca 0000:01:00.0: buf[04]: 00000000
> kernel: ib_mthca 0000:01:00.0: buf[05]: 00127f2c
> kernel: ib_mthca 0000:01:00.0: buf[06]: 000a0056
> kernel: ib_mthca 0000:01:00.0: buf[07]: 00000000
> kernel: ib_mthca 0000:01:00.0: buf[08]: 00000000
> kernel: ib_mthca 0000:01:00.0: buf[09]: 00000000
> kernel: ib_mthca 0000:01:00.0: buf[0a]: 00000000
> kernel: ib_mthca 0000:01:00.0: buf[0b]: 00000000
> kernel: ib_mthca 0000:01:00.0: buf[0c]: 00000000
> kernel: ib_mthca 0000:01:00.0: buf[0d]: 00000000
> kernel: ib_mthca 0000:01:00.0: buf[0e]: 00000000
> kernel: ib_mthca 0000:01:00.0: buf[0f]: 00000000
> kernel: ib0: ib_query_port failed
>
>
This is a known issue with InfiniHost III HCA firmware 1.2.0.
Please contact Mellanox support to obtain an updated firmware version.
Tziporet
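
To see which nodes carry the firmware named in the reply, one option is to scan saved ibv_devinfo output for the fw_ver and vendor_part_id values from the report. The following is only a minimal sketch: the function name find_affected_hcas and the SAMPLE text are hypothetical, and matching on exactly fw_ver 1.2.0 with part ID 25204 is an assumption taken from this thread, not a statement about which other versions or devices may be affected.

```python
import re

# Values taken from the report in this thread; treating exactly this
# pair as "affected" is an assumption, not vendor guidance.
AFFECTED_FW = "1.2.0"
AFFECTED_PART_ID = "25204"

# Matches "key: value" lines such as "fw_ver: 1.2.0".
FIELD_RE = re.compile(r"^(\w+):\s*(.+)$")


def find_affected_hcas(devinfo_text):
    """Return the hca_id of every device whose fw_ver and
    vendor_part_id match the values reported in this thread."""
    devices = []
    for raw in devinfo_text.splitlines():
        match = FIELD_RE.match(raw.strip())
        if not match:
            continue
        key, value = match.group(1), match.group(2).strip()
        if key == "hca_id":
            # A new hca_id line starts a new device record.
            devices.append({"hca_id": value})
        elif devices and key in ("fw_ver", "vendor_part_id"):
            devices[-1][key] = value
    return [
        dev["hca_id"]
        for dev in devices
        if dev.get("fw_ver") == AFFECTED_FW
        and dev.get("vendor_part_id") == AFFECTED_PART_ID
    ]


# Hypothetical sample, abbreviated from the output quoted above.
SAMPLE = """\
hca_id: mthca0
        fw_ver:                 1.2.0
        node_guid:              0030:487e:0770:0000
        vendor_part_id:         25204
        phys_port_cnt:          1
"""

print(find_affected_hcas(SAMPLE))
```

In practice the text passed to the function would be the captured output of ibv_devinfo from each node. A match only indicates that the node reports the same firmware as in this thread; which version to install is a question for Mellanox support.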