[ofa-general] Re: Fw: mthca issues -need help

Michael S. Tsirkin mst at dev.mellanox.co.il
Sat Apr 14 10:34:35 PDT 2007


As a start, how about upgrading to a recent FW?

Quoting Pradeep Satyanarayana <pradeep at us.ibm.com>:
Subject: Fw: mthca issues -need help

Michael,

Will you be able to help me with some of the issues listed below?

Pradeep
pradeep at us.ibm.com
----- Forwarded by Pradeep Satyanarayana/Beaverton/IBM on 04/13/2007 08:33 AM -----

Pradeep Satyanarayana/Beaverton/IBM
04/12/2007 01:58 PM

To: general at lists.openfabrics.org
cc: "Michael S. Tsirkin" <mst at dev.mellanox.co.il>
Subject: mthca issues -need help





I am running into several mthca issues, listed below, and need help with
them.


1. I am using linux-2.6.21-rc5 and I see this Oops when I modprobe
ib_mthca (on ppc64):

Apr 12 14:11:19 elm3b37 kernel: ib_mthca 0002:d9:00.0: HCA FW version 3.3.3 is old (3.4.0 is current).
Apr 12 14:11:19 elm3b37 kernel: ib_mthca 0002:d9:00.0: If you have problems, try updating your HCA FW.
Apr 12 14:11:19 elm3b37 kernel: Faulting instruction address: 0xd0000000002db0d8
Apr 12 14:11:19 elm3b37 kernel: Oops: Kernel access of bad area, sig: 11 [#2]
Apr 12 14:11:19 elm3b37 kernel: SMP NR_CPUS=128 NUMA
Apr 12 14:11:19 elm3b37 kernel: Modules linked in: ib_mthca ib_mad ib_ehca ib_core autofs4 ipv6 binfmt_misc parport_pc lp parport sg e1000 dm_snapshot dm_zero dm_mirror dm_mod ipr libata sd_mod scsi_mod firmware_class ehci_hcd ohci_hcd usbcore
Apr 12 14:11:19 elm3b37 kernel: NIP: D0000000002DB0D8 LR: D0000000002DAE0C CTR: 0000000000000400
Apr 12 14:11:19 elm3b37 kernel: REGS: c0000000e2116f60 TRAP: 0300   Not tainted  (2.6.21-rc5)
Apr 12 14:11:19 elm3b37 kernel: MSR: 8000000000009032 <EE,ME,IR,DR>  CR: 24024444  XER: 00000008
Apr 12 14:11:19 elm3b37 kernel: DAR: 0000000000002000, DSISR: 0000000042000000
Apr 12 14:11:19 elm3b37 kernel: TASK = c0000000e7de4040[3884] 'modprobe' THREAD: c0000000e2114000 CPU: 0
Apr 12 14:11:19 elm3b37 kernel: GPR00: 0000000040010001 C0000000E21171E0 D000000000308B30 0000000007FFFFFF
Apr 12 14:11:19 elm3b37 kernel: GPR04: C0000000E595FE00 0000000000000000 C0000000E2438000 0000000000000400
Apr 12 14:11:19 elm3b37 kernel: GPR08: 0000000000000000 0000000000000400 0000000000002000 0000000000000000
Apr 12 14:11:19 elm3b37 kernel: GPR12: D0000000002EAD28 C000000000535A80 AAAAAAAAAAAAAAAB D0000000005A0C10
Apr 12 14:11:19 elm3b37 kernel: GPR16: 0000000000000000 0000000000000312 0000000000000312 000000000000003F
Apr 12 14:11:19 elm3b37 kernel: GPR20: C0000000E595FE20 C0000000E4F04000 C0000000E595FE00 0000000000000000
Apr 12 14:11:19 elm3b37 kernel: GPR24: C0000000E4FAF000 0000000007FFFFFF 0000000000000000 0000000000002000
Apr 12 14:11:19 elm3b37 kernel: GPR28: C0000000E2438000 0000000000000400 D0000000003075B0 0000000000000400
Apr 12 14:11:19 elm3b37 kernel: NIP [D0000000002DB0D8] .mthca_write_mtt+0x328/0x460 [ib_mthca]
Apr 12 14:11:19 elm3b37 kernel: LR [D0000000002DAE0C] .mthca_write_mtt+0x5c/0x460 [ib_mthca]
Apr 12 14:11:19 elm3b37 kernel: Call Trace:
Apr 12 14:11:19 elm3b37 kernel: [C0000000E21171E0] [C0000000E2117300] 0xc0000000e2117300 (unreliable)
Apr 12 14:11:19 elm3b37 kernel: [C0000000E21172D0] [D0000000002DBD1C] .mthca_mr_alloc_phys+0x8c/0x140 [ib_mthca]
Apr 12 14:11:19 elm3b37 kernel: [C0000000E2117390] [D0000000002D6B6C] .mthca_create_eq+0x3ac/0x5e0 [ib_mthca]
Apr 12 14:11:19 elm3b37 kernel: [C0000000E2117490] [D0000000002D7528] .mthca_init_eq_table+0x198/0x790 [ib_mthca]
Apr 12 14:11:19 elm3b37 kernel: [C0000000E2117560] [D0000000002D0368] .__mthca_init_one+0xa38/0xd70 [ib_mthca]
Apr 12 14:11:19 elm3b37 kernel: [C0000000E2117640] [D0000000002D0714] .mthca_init_one+0x74/0xf0 [ib_mthca]
Apr 12 14:11:19 elm3b37 kernel: [C0000000E21176E0] [C0000000002487D8] .pci_device_probe+0x168/0x200
Apr 12 14:11:19 elm3b37 kernel: [C0000000E21177A0] [C0000000002C288C] .really_probe+0xbc/0x1f0
Apr 12 14:11:19 elm3b37 kernel: [C0000000E2117850] [C0000000002C2D3C] .__driver_attach+0xfc/0x140
Apr 12 14:11:19 elm3b37 kernel: [C0000000E21178E0] [C0000000002C1668] .bus_for_each_dev+0x88/0xe0
Apr 12 14:11:19 elm3b37 kernel: [C0000000E21179A0] [C0000000002C2628] .driver_attach+0x28/0x40
Apr 12 14:11:19 elm3b37 kernel: [C0000000E2117A20] [C0000000002C1C34] .bus_add_driver+0xc4/0x220
Apr 12 14:11:19 elm3b37 kernel: [C0000000E2117AC0] [C0000000002C3118] .driver_register+0x78/0xe0
Apr 12 14:11:19 elm3b37 kernel: [C0000000E2117B40] [C000000000248B70] .__pci_register_driver+0x90/0x120
Apr 12 14:11:19 elm3b37 kernel: [C0000000E2117BE0] [D0000000002EA050] .mthca_init+0x100/0x170 [ib_mthca]
Apr 12 14:11:19 elm3b37 kernel: [C0000000E2117C70] [C0000000000848FC] .sys_init_module+0x20c/0x1990
Apr 12 14:11:19 elm3b37 kernel: [C0000000E2117E30] [C00000000000862C] syscall_exit+0x0/0x40
Apr 12 14:11:19 elm3b37 kernel: Instruction dump:
Apr 12 14:11:19 elm3b37 kernel: 7d290214 7d495a14 409d0038 393fffff 39600000 79290020 39290001 7d2903a6
Apr 12 14:11:19 elm3b37 kernel: 60000000 60000000 7c1c582a 60000001 <7c0a592a> 396b0008 4200fff0 7bfb1f24
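
For anyone reading the Oops by hand: the bracketed word in the instruction
dump, <7c0a592a>, is the instruction at the NIP. A minimal sketch, assuming
the standard PowerPC X-form field layout (the helper name decode_xform and
the small XO_NAMES table are hypothetical, written only for this
illustration), splits that word into its fields:

```python
def decode_xform(word):
    """Split a 32-bit PowerPC X-form instruction word into its fields."""
    return {
        "opcode": (word >> 26) & 0x3F,  # primary opcode, bits 0-5
        "rs": (word >> 21) & 0x1F,      # source/target register
        "ra": (word >> 16) & 0x1F,      # base register
        "rb": (word >> 11) & 0x1F,      # index register
        "xo": (word >> 1) & 0x3FF,      # extended opcode
    }

# Extended opcodes under primary opcode 31 (only the two needed here).
XO_NAMES = {21: "ldx", 149: "stdx"}

fields = decode_xform(0x7C0A592A)
print(XO_NAMES.get(fields["xo"], "unknown"), fields)
```

Under that assumption the word decodes as stdx r0,r10,r11, that is, a store
to the address GPR10 + GPR11. The register dump shows GPR10 = 0x2000 and
GPR11 = 0, which is consistent with the DAR value of 0x2000.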

2. The above may or may not be a bug. As the message suggests, I wanted
to upgrade the FW. However, I found that the latest firmware is 3.5.0,
not 3.4.0 as the message seems to indicate. I want to use IPoIB CM, so
which one should I upgrade to? Presumably 3.5.0?
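
The version comparison can be done mechanically. A minimal sketch, assuming
the driver warning keeps the exact wording shown in the log above (the
function name parse_fw_message is hypothetical), extracts the installed and
"current" versions and compares them with the 3.5.0 release:

```python
import re

def parse_fw_message(line):
    """Return (installed, expected) version tuples from the mthca warning, or None."""
    m = re.search(
        r"HCA FW version (\d+)\.(\d+)\.(\d+) is old "
        r"\((\d+)\.(\d+)\.(\d+) is current\)",
        line,
    )
    if m is None:
        return None
    nums = tuple(int(g) for g in m.groups())
    return nums[:3], nums[3:]

line = "ib_mthca 0002:d9:00.0: HCA FW version 3.3.3 is old (3.4.0 is current)."
installed, expected = parse_fw_message(line)
candidate = (3, 5, 0)  # the release found on the download page

# Tuples compare element by element, so this orders versions correctly.
print("installed is older:", installed < expected)
print("candidate satisfies driver:", candidate >= expected)
```

The "current" version in the message is presumably the one the driver was
built to expect, so a newer release such as 3.5.0 would still satisfy the
check.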

3. From the following url 

http://www.mellanox.com/support/firmware_table_IH.php 

it is not clear to me as to which firmware I should download.

lspci -v shows me :

0002:d9:00.0 InfiniBand: Mellanox Technologies MT23108 InfiniHost (rev a1)
        Subsystem: Mellanox Technologies MT23108 InfiniHost


So I was planning to use fw-23108-3_5_000-MHET2X-1SC_A1.bin.zip. Is
that correct?
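
One way to sanity-check the choice is to compare the pieces of the image
name with the lspci line. This is only a sketch: the naming convention
(device ID, version, board name, hardware revision) is an assumption
inferred from this single file name, and both helper names are
hypothetical.

```python
import re

def parse_fw_image_name(name):
    """Split a firmware image name into assumed fields, or return None."""
    m = re.match(
        r"fw-(\d+)-(\d+)_(\d+)_(\d+)-(.+)_([A-Za-z]\d+)\.bin(?:\.zip)?$", name
    )
    if m is None:
        return None
    dev, major, minor, sub, board, rev = m.groups()
    return {
        "device": dev,
        "version": (int(major), int(minor), int(sub)),
        "board": board,
        "rev": rev.lower(),
    }

def parse_lspci_line(line):
    """Extract the MT device number and revision from an lspci summary line."""
    m = re.search(r"MT(\d+)\b.*\(rev ([0-9a-fA-F]+)\)", line)
    return None if m is None else {"device": m.group(1), "rev": m.group(2).lower()}

image = parse_fw_image_name("fw-23108-3_5_000-MHET2X-1SC_A1.bin.zip")
card = parse_lspci_line(
    "0002:d9:00.0 InfiniBand: Mellanox Technologies MT23108 InfiniHost (rev a1)"
)
print(image["device"] == card["device"], image["rev"] == card["rev"])
```

A match on device and revision says nothing about the board part of the
name (MHET2X-1SC), which the lspci output above does not show.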

4. When I downloaded mft-1.0.1.tar I found that ppc64 is not supported.

5. I moved my HCA to an x86_64 machine and then tried to install the mft
utilities. A previous version of the tool was installed, and I asked the
installer to uninstall it. After that I see the following:

/home/tools/mft-1.0.1 # ./install.sh

  *** Mellanox Firmware Tools (MFT) Package Installation ***
      MFT Build 20060118-1817

  Copyright (C) June  2002, Mellanox Technologies  Ltd.
  ALL  RIGHTS  RESERVED.   Use of  software subject to the
  terms and conditions detailed in the file "LICENSE.txt".

  Found a previous installation of the MFT package.
  Current installed MFT Build ID is 20060118-1817
  This installation MFT Build ID is 20060118-1817

  Remove currently installed components (run /usr/mellanox/mft/uninstall.sh) ?  :(y/n) [n] y
  Running /usr/mellanox/mft/uninstall.sh ...
  Uninstall completed successfully.


  This installation installs the MFT components into /usr
  Installing MST package under /usr/mst ...
MFT Depends on pre-installed MST. Fail to find /usr/mst/lib/libmtcr.a

I could not find libmtcr.a anywhere.
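
A minimal sketch for locating the library the installer complains about,
assuming only the file name from the error message (the helper name
find_file and the choice of /usr as the search root are hypothetical):

```python
import os

def find_file(root, name):
    """Return the sorted paths of every file called `name` under `root`."""
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        if name in filenames:
            hits.append(os.path.join(dirpath, name))
    return sorted(hits)

if __name__ == "__main__":
    # The installer expects the file at /usr/mst/lib/libmtcr.a.
    matches = find_file("/usr", "libmtcr.a")
    print(matches if matches else "libmtcr.a not found under /usr")
```

An empty result after the uninstall step would suggest that the uninstall
script removed the MST files the installer then expects to find.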

I need help with the issues listed above. Thanks!

Pradeep
pradeep at us.ibm.com

-- 
MST


