[ofa-general] ib_mthca catastrophic error detected
Scott A. Friedman
friedman at ucla.edu
Tue Oct 28 12:11:12 PDT 2008
Hi,
This cluster has OFED 1.2.5.4 running on it. The ib_mthca kernel module
reports the following on startup:
ib_mthca: Mellanox InfiniBand HCA driver v1.0 (February 28, 2008)
The cards in all 22 of the nodes on which we have seen this error are as
follows:
hca_id: mthca0
    fw_ver:          1.2.0
    vendor_id:       0x02c9
    vendor_part_id:  25204
    hw_ver:          0xA0
    board_id:        MT_03B0140001
    phys_port_cnt:   1
It appears that when this happens the driver restarts (loads?) itself;
however, the job running at the time of the error is, of course, killed.
Scott
Tziporet Koren wrote:
>>
>> ib_mthca 0000:02:00.0: Catastrophic error detected: internal error
>>
> Can you specify:
> Which OFED version you use? (or IB from kernel.org)
> Which HCA and FW version?
>
> Tziporet
>
>
>