[ofa-general] ib_mthca Catastrophic errors

Tziporet Koren tziporet at dev.mellanox.co.il
Sun Jun 7 03:50:18 PDT 2009


Pawel Dziekonski wrote:
> Hi,
>
> From time to time I get catastrophic errors like the ones below. The software
> stack is kernel 2.6.18-92.1.10.el5 with a Lustre client. Device and OFED
> info are also below.
>
> Any hints?
>
> Thanks in advance, Pawel
>
>
>
> 06:00.0 InfiniBand: Mellanox Technologies MT25204 [InfiniHost III Lx HCA] (rev 20)
>
> # ibv_devices
>     device                 node GUID
>     ------              ----------------
>     mthca0              0030487e07700000
> # ibv_devinfo
> hca_id: mthca0
>         fw_ver:                         1.2.0
>         node_guid:                      0030:487e:0770:0000
>         sys_image_guid:                 0030:487e:0770:0003
>         vendor_id:                      0x02c9
>         vendor_part_id:                 25204
>         hw_ver:                         0xA0
>         board_id:                       SM_0000000003
>         phys_port_cnt:                  1
>                 port:   1
>                         state:                  PORT_ACTIVE (4)
>                         max_mtu:                2048 (4)
>                         active_mtu:             2048 (4)
>                         sm_lid:                 1
>                         port_lid:               441
>                         port_lmc:               0x00
>
> kernel: ib_mthca 0000:06:00.0: Catastrophic error detected: unknown error
> kernel: ib_mthca 0000:06:00.0:   buf[00]: ffffffff
> kernel: ib_mthca 0000:06:00.0:   buf[01]: ffffffff
> kernel: ib_mthca 0000:06:00.0:   buf[02]: ffffffff
> kernel: ib_mthca 0000:06:00.0:   buf[03]: ffffffff
> kernel: ib_mthca 0000:06:00.0:   buf[04]: ffffffff
> kernel: ib_mthca 0000:06:00.0:   buf[05]: ffffffff
> kernel: ib_mthca 0000:06:00.0:   buf[06]: ffffffff
> kernel: ib_mthca 0000:06:00.0:   buf[07]: ffffffff
> kernel: ib_mthca 0000:06:00.0:   buf[08]: ffffffff
> kernel: ib_mthca 0000:06:00.0:   buf[09]: ffffffff
> kernel: ib_mthca 0000:06:00.0:   buf[0a]: ffffffff
> kernel: ib_mthca 0000:06:00.0:   buf[0b]: ffffffff
> kernel: ib_mthca 0000:06:00.0:   buf[0c]: ffffffff
> kernel: ib_mthca 0000:06:00.0:   buf[0d]: ffffffff
> kernel: ib_mthca 0000:06:00.0:   buf[0e]: ffffffff
> kernel: ib_mthca 0000:06:00.0:   buf[0f]: ffffffff
> kernel: ib_mthca 0000:06:00.0: HW2SW_MPT failed (-11)
> kernel: ib_mthca 0000:06:00.0: HW2SW_MPT failed (-11)
> kernel: ib0: ib_detach_mcast failed (result = -11)
> kernel: ib0: ipoib_mcast_detach failed (result = -11)
> kernel: ib0: ib_detach_mcast failed (result = -11)
> kernel: ib0: ipoib_mcast_detach failed (result = -11)
> kernel: ib0: Failed to modify QP to ERROR state
> kernel: ib0: timing out; 0 sends 128 receives not completed
> kernel: ib0: Failed to modify QP to RESET state
> kernel: ib_mthca 0000:06:00.0: HW2SW_MPT failed (-11)
> kernel: ib_mthca 0000:06:00.0: HW2SW_CQ failed (-11)
> kernel: ib_mthca 0000:06:00.0: HW2SW_MPT failed (-11)
> kernel: ib_mthca 0000:06:00.0: HW2SW_SRQ failed (-11)
> kernel: ib_mthca 0000:06:00.0: HW2SW_MPT failed (-11)
>
>
> kernel: ib_mthca 0000:01:00.0: Catastrophic error detected: internal parity error
> kernel: ib_mthca 0000:01:00.0:   buf[00]: 05000000
> kernel: ib_mthca 0000:01:00.0:   buf[01]: 00000000
> kernel: ib_mthca 0000:01:00.0:   buf[02]: 00000000
> kernel: ib_mthca 0000:01:00.0:   buf[03]: 00000000
> kernel: ib_mthca 0000:01:00.0:   buf[04]: 00000000
> kernel: ib_mthca 0000:01:00.0:   buf[05]: 00127f2c
> kernel: ib_mthca 0000:01:00.0:   buf[06]: 000a0056
> kernel: ib_mthca 0000:01:00.0:   buf[07]: 00000000
> kernel: ib_mthca 0000:01:00.0:   buf[08]: 00000000
> kernel: ib_mthca 0000:01:00.0:   buf[09]: 00000000
> kernel: ib_mthca 0000:01:00.0:   buf[0a]: 00000000
> kernel: ib_mthca 0000:01:00.0:   buf[0b]: 00000000
> kernel: ib_mthca 0000:01:00.0:   buf[0c]: 00000000
> kernel: ib_mthca 0000:01:00.0:   buf[0d]: 00000000
> kernel: ib_mthca 0000:01:00.0:   buf[0e]: 00000000
> kernel: ib_mthca 0000:01:00.0:   buf[0f]: 00000000
> kernel: ib0: ib_query_port failed
>
>   

This is a known issue with InfiniHost III HCA firmware 1.2.0.
Please contact Mellanox support to get an updated firmware version.
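
To find which nodes still run the affected firmware, something like the
following could be used. It is only a rough sketch: it reads ibv_devinfo
output in the format quoted above and flags devices whose fw_ver is 1.2.0.
The function names and the SAMPLE text are made up for illustration, and
matching only vendor_part_id 25204 (the MT25204 from this report) is an
assumption; other InfiniHost III parts are not covered.

```python
import re

# Firmware version named in the reply as having a known issue.
KNOWN_BAD_FW = "1.2.0"
# Assumption: only the part ID shown in this report (MT25204) is checked.
AFFECTED_PART_ID = "25204"

# Trimmed copy of the ibv_devinfo output quoted in this thread.
SAMPLE = """
hca_id: mthca0
        fw_ver:                         1.2.0
        vendor_part_id:                 25204
        phys_port_cnt:                  1
"""


def parse_devinfo(text):
    """Return one dict of 'key: value' fields per hca_id block."""
    devices = []
    for line in text.splitlines():
        # Tolerate mail-style quoting ("> ") in pasted output.
        line = line.lstrip("> ").strip()
        match = re.match(r"^(\w+):\s+(\S.*)$", line)
        if not match:
            continue
        key, value = match.groups()
        if key == "hca_id":
            # A new device block starts here.
            devices.append({"hca_id": value})
        elif devices and key not in devices[-1]:
            # Keep the first occurrence so per-port fields do not overwrite it.
            devices[-1][key] = value
    return devices


def affected_devices(text):
    """Return the hca_id of every device matching the part ID and firmware."""
    return [
        dev["hca_id"]
        for dev in parse_devinfo(text)
        if dev.get("vendor_part_id") == AFFECTED_PART_ID
        and dev.get("fw_ver") == KNOWN_BAD_FW
    ]


print(affected_devices(SAMPLE))  # prints ['mthca0']
```

On a real node, the text passed to affected_devices() would be the captured
output of ibv_devinfo rather than SAMPLE. An empty list only means no match
was found by this check, not that the HCA is healthy.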

Tziporet



