[ofa-general] ib_mthca catastrophic error detected

Scott A. Friedman friedman at ucla.edu
Tue Oct 28 12:11:12 PDT 2008


Hi

This cluster has OFED 1.2.5.4 running on it. The ib_mthca kernel module 
reports the following on startup:

ib_mthca: Mellanox InfiniBand HCA driver v1.0 (February 28, 2008)

The cards in all 22 nodes on which we have seen this error are as 
follows:

hca_id: mthca0
         fw_ver:                         1.2.0
         vendor_id:                      0x02c9
         vendor_part_id:                 25204
         hw_ver:                         0xA0
         board_id:                       MT_03B0140001
         phys_port_cnt:                  1

It appears that when this happens the driver restarts (loads?) itself; 
however, the job running at the time of the error is, of course, killed.
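
For finding the affected nodes in collected kernel logs, something along 
these lines may help. It is only a rough sketch: the pattern is based 
solely on the one log line quoted below, the function name 
find_catastrophic_errors is made up for illustration, and other driver 
versions may word the message differently.

```python
import re

# Pattern derived from the single log line quoted in this thread.
# Assumption: other driver versions may format the message differently.
CATAS_RE = re.compile(
    r"ib_mthca (?P<pci>[0-9a-fA-F:.]+): "
    r"Catastrophic error detected: (?P<reason>.+)"
)


def find_catastrophic_errors(lines):
    """Return a (pci_address, reason) tuple for each matching log line."""
    hits = []
    for line in lines:
        match = CATAS_RE.search(line)
        if match:
            hits.append((match.group("pci"), match.group("reason").strip()))
    return hits


# Sample input: the driver banner and the error line from this thread.
sample = [
    "ib_mthca: Mellanox InfiniBand HCA driver v1.0 (February 28, 2008)",
    "ib_mthca 0000:02:00.0: Catastrophic error detected: internal error",
]
print(find_catastrophic_errors(sample))
```

Feeding it each node's kernel log (for example the lines of a saved 
dmesg output) would show which HCAs reported the error and the reason 
string given.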

Scott

Tziporet Koren wrote:
>>
>> ib_mthca 0000:02:00.0: Catastrophic error detected: internal error
>>
> Can you specify:
> Which OFED version you use? (or IB from kernel.org)
> Which HCA and FW version?
> 
> Tziporet
> 
> 
> 


