[ofa-general] ib_mthca catastrophic error detected
    Scott A. Friedman 
    friedman at ucla.edu
       
    Tue Oct 28 12:11:12 PDT 2008
    
    
  
Hi,
This cluster has OFED 1.2.5.4 running on it. The ib_mthca kernel module 
reports the following on startup:
ib_mthca: Mellanox InfiniBand HCA driver v1.0 (February 28, 2008)
The cards in all 22 of the nodes on which we have seen this error are as 
follows:
hca_id: mthca0
         fw_ver:                         1.2.0
         vendor_id:                      0x02c9
         vendor_part_id:                 25204
         hw_ver:                         0xA0
         board_id:                       MT_03B0140001
         phys_port_cnt:                  1
When this happens, the driver appears to restart (reload?) itself; 
however, the job running at the time of the error is, of course, killed.
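In case it helps anyone collect the same data, here is a minimal sketch 
(Python) of how the affected devices could be picked out of saved kernel 
log text. The regular expression is modelled only on the single message 
quoted below, so other wordings of the error would be missed; the function 
name and the sample string are illustrative and not part of OFED.

```python
import re

# Pattern modelled on the one log line quoted in this thread (an assumption:
# other driver versions may word the message differently).
CATAS_RE = re.compile(
    r"ib_mthca (?P<pci>[0-9a-fA-F]{4}:[0-9a-fA-F]{2}:[0-9a-fA-F]{2}\.[0-7]): "
    r"Catastrophic error detected: (?P<kind>.+)$"
)


def find_catastrophic_errors(log_text):
    """Return a list of (pci_address, error_kind) for each matching log line."""
    hits = []
    for line in log_text.splitlines():
        match = CATAS_RE.search(line)
        if match:
            hits.append((match.group("pci"), match.group("kind").strip()))
    return hits


if __name__ == "__main__":
    # Sample input: the message from this thread, as it might appear in a log.
    sample = "ib_mthca 0000:02:00.0: Catastrophic error detected: internal error\n"
    for pci, kind in find_catastrophic_errors(sample):
        print(pci, kind)
```

Feeding it the saved log text from each node would give a per-node list of 
which device reported the error and what kind it was.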
Scott
Tziporet Koren wrote:
>>
>> ib_mthca 0000:02:00.0: Catastrophic error detected: internal error
>>
> Can you specify:
> Which OFED version you use? (or IB from kernel.org)
> Which HCA and FW version?
> 
> Tziporet
    
    