[ofa-general] Re: Fw: mthca issues -need help
Michael S. Tsirkin
mst at dev.mellanox.co.il
Sat Apr 14 10:34:35 PDT 2007
As a start, how about upgrading to a recent FW?
Quoting Pradeep Satyanarayana <pradeep at us.ibm.com>:
Subject: Fw: mthca issues -need help
Michael,
Will you be able to help me with some of the issues listed below?
Pradeep
pradeep at us.ibm.com
----- Forwarded by Pradeep Satyanarayana/Beaverton/IBM on 04/13/2007 08:33
AM -----
Pradeep Satyanarayana/Beaverton/IBM
04/12/2007 01:58 PM
To: general at lists.openfabrics.org
cc: "Michael S. Tsirkin" <mst at dev.mellanox.co.il>
Subject: mthca issues -need help
I am running into a number of mthca issues listed below and need help with
them.
1. I am using linux-2.6.21-rc5 and I see this Oops when I modprobe
ib_mthca (on ppc64)
Apr 12 14:11:19 elm3b37 kernel: ib_mthca 0002:d9:00.0: HCA FW version
3.3.3 is old (3.4.0 is current).
Apr 12 14:11:19 elm3b37 kernel: ib_mthca 0002:d9:00.0: If you have
problems, try updating your HCA FW.
Apr 12 14:11:19 elm3b37 kernel: Faulting instruction address:
0xd0000000002db0d8
Apr 12 14:11:19 elm3b37 kernel: Oops: Kernel access of bad area, sig: 11
[#2]
Apr 12 14:11:19 elm3b37 kernel: SMP NR_CPUS=128 NUMA
Apr 12 14:11:19 elm3b37 kernel: Modules linked in: ib_mthca ib_mad ib_ehca
ib_core autofs4 ipv6 binfmt_misc parport_pc lp parport sg e1000
dm_snapshot dm_zero dm_mirror dm_mod ipr libata sd_mod scsi_mod
firmware_class ehci_hcd ohci_hcd usbcore
Apr 12 14:11:19 elm3b37 kernel: NIP: D0000000002DB0D8 LR: D0000000002DAE0C
CTR: 0000000000000400
Apr 12 14:11:19 elm3b37 kernel: REGS: c0000000e2116f60 TRAP: 0300 Not
tainted (2.6.21-rc5)
Apr 12 14:11:19 elm3b37 kernel: MSR: 8000000000009032 <EE,ME,IR,DR> CR:
24024444 XER: 00000008
Apr 12 14:11:19 elm3b37 kernel: DAR: 0000000000002000, DSISR:
0000000042000000
Apr 12 14:11:19 elm3b37 kernel: TASK = c0000000e7de4040[3884] 'modprobe'
THREAD: c0000000e2114000 CPU: 0
Apr 12 14:11:19 elm3b37 kernel: GPR00: 0000000040010001 C0000000E21171E0
D000000000308B30 0000000007FFFFFF
Apr 12 14:11:19 elm3b37 kernel: GPR04: C0000000E595FE00 0000000000000000
C0000000E2438000 0000000000000400
Apr 12 14:11:19 elm3b37 kernel: GPR08: 0000000000000000 0000000000000400
0000000000002000 0000000000000000
Apr 12 14:11:19 elm3b37 kernel: GPR12: D0000000002EAD28 C000000000535A80
AAAAAAAAAAAAAAAB D0000000005A0C10
Apr 12 14:11:19 elm3b37 kernel: GPR16: 0000000000000000 0000000000000312
0000000000000312 000000000000003F
Apr 12 14:11:19 elm3b37 kernel: GPR20: C0000000E595FE20 C0000000E4F04000
C0000000E595FE00 0000000000000000
Apr 12 14:11:19 elm3b37 kernel: GPR24: C0000000E4FAF000 0000000007FFFFFF
0000000000000000 0000000000002000
Apr 12 14:11:19 elm3b37 kernel: GPR28: C0000000E2438000 0000000000000400
D0000000003075B0 0000000000000400
Apr 12 14:11:19 elm3b37 kernel: NIP [D0000000002DB0D8]
.mthca_write_mtt+0x328/0x460 [ib_mthca]
Apr 12 14:11:19 elm3b37 kernel: LR [D0000000002DAE0C]
.mthca_write_mtt+0x5c/0x460 [ib_mthca]
Apr 12 14:11:19 elm3b37 kernel: Call Trace:
Apr 12 14:11:19 elm3b37 kernel: [C0000000E21171E0] [C0000000E2117300]
0xc0000000e2117300 (unreliable)
Apr 12 14:11:19 elm3b37 kernel: [C0000000E21172D0] [D0000000002DBD1C]
.mthca_mr_alloc_phys+0x8c/0x140 [ib_mthca]
Apr 12 14:11:19 elm3b37 kernel: [C0000000E2117390] [D0000000002D6B6C]
.mthca_create_eq+0x3ac/0x5e0 [ib_mthca]
Apr 12 14:11:19 elm3b37 kernel: [C0000000E2117490] [D0000000002D7528]
.mthca_init_eq_table+0x198/0x790 [ib_mthca]
Apr 12 14:11:19 elm3b37 kernel: [C0000000E2117560] [D0000000002D0368]
.__mthca_init_one+0xa38/0xd70 [ib_mthca]
Apr 12 14:11:19 elm3b37 kernel: [C0000000E2117640] [D0000000002D0714]
.mthca_init_one+0x74/0xf0 [ib_mthca]
Apr 12 14:11:19 elm3b37 kernel: [C0000000E21176E0] [C0000000002487D8]
.pci_device_probe+0x168/0x200
Apr 12 14:11:19 elm3b37 kernel: [C0000000E21177A0] [C0000000002C288C]
.really_probe+0xbc/0x1f0
Apr 12 14:11:19 elm3b37 kernel: [C0000000E2117850] [C0000000002C2D3C]
.__driver_attach+0xfc/0x140
Apr 12 14:11:19 elm3b37 kernel: [C0000000E21178E0] [C0000000002C1668]
.bus_for_each_dev+0x88/0xe0
Apr 12 14:11:19 elm3b37 kernel: [C0000000E21179A0] [C0000000002C2628]
.driver_attach+0x28/0x40
Apr 12 14:11:19 elm3b37 kernel: [C0000000E2117A20] [C0000000002C1C34]
.bus_add_driver+0xc4/0x220
Apr 12 14:11:19 elm3b37 kernel: [C0000000E2117AC0] [C0000000002C3118]
.driver_register+0x78/0xe0
Apr 12 14:11:19 elm3b37 kernel: [C0000000E2117B40] [C000000000248B70]
.__pci_register_driver+0x90/0x120
Apr 12 14:11:19 elm3b37 kernel: [C0000000E2117BE0] [D0000000002EA050]
.mthca_init+0x100/0x170 [ib_mthca]
Apr 12 14:11:19 elm3b37 kernel: [C0000000E2117C70] [C0000000000848FC]
.sys_init_module+0x20c/0x1990
Apr 12 14:11:19 elm3b37 kernel: [C0000000E2117E30] [C00000000000862C]
syscall_exit+0x0/0x40
Apr 12 14:11:19 elm3b37 kernel: Instruction dump:
Apr 12 14:11:19 elm3b37 kernel: 7d290214 7d495a14 409d0038 393fffff
39600000 79290020 39290001 7d2903a6
Apr 12 14:11:19 elm3b37 kernel: 60000000 60000000 7c1c582a 60000001
<7c0a592a> 396b0008 4200fff0 7bfb1f24
2. The above may or may not be a bug. As the message suggests, I wanted to
upgrade the FW. However, I found that the latest firmware is 3.5.0 and not
3.4.0 as the message seems to indicate. I want to use IPoIB CM, so which
one should I upgrade to? Presumably 3.5.0?
3. From the following url
http://www.mellanox.com/support/firmware_table_IH.php
it is not clear to me as to which firmware I should download.
lspci -v shows me :
0002:d9:00.0 InfiniBand: Mellanox Technologies MT23108 InfiniHost (rev a1)
Subsystem: Mellanox Technologies MT23108 InfiniHost
So, I was planning on using fw-23108-3_5_000-MHET2X-1SC_A1.bin.zip -Is
that correct?
4. When I downloaded mft-1.0.1.tar I found that ppc64 is not supported.
5. I moved my HCA to x86_64 and then tried to install the mft utilities.
There was a previous version of the tool and I asked the installer
to uninstall it. After that I see the following:
/home/tools/mft-1.0.1 # ./install.sh
*** Mellanox Firmware Tools (MFT) Package Installation ***
MFT Build 20060118-1817
Copyright (C) June 2002, Mellanox Technologies Ltd.
ALL RIGHTS RESERVED. Use of software subject to the
terms and conditions detailed in the file "LICENSE.txt".
Found a previous installation of the MFT package.
Current installed MFT Build ID is 20060118-1817
This installation MFT Build ID is 20060118-1817
Remove currently installed components (run
/usr/mellanox/mft/uninstall.sh) ? :(y/n) [n] y
Running /usr/mellanox/mft/uninstall.sh ...
Uninstall completed successfully.
This installation installs the MFT components into /usr
Installing MST package under /usr/mst ...
MFT Depends on pre-installed MST. Fail to find /usr/mst/lib/libmtcr.a
I could not find libmtcr.a anywhere.
I need help with above listed issues. Thanks!
Pradeep
pradeep at us.ibm.com
--
MST
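
For reference, the version question in item 2 comes down to comparing three
numbers: the FW on the HCA (3.3.3), the version the driver message calls
current (3.4.0), and the latest release Pradeep found (3.5.0). A minimal
sketch of that comparison follows. The regular expression is modelled only on
the one log line quoted above, and the function names are hypothetical; this
is not part of ib_mthca or of the Mellanox tools.

```python
import re

# Pattern modelled on the single driver warning quoted in the log above.
# Assumption: other driver versions may word this message differently.
FW_LINE = re.compile(
    r"HCA FW version (\d+)\.(\d+)\.(\d+) is old "
    r"\((\d+)\.(\d+)\.(\d+) is current\)"
)


def parse_fw_warning(line):
    """Return (installed, driver_current) as integer tuples, or None."""
    match = FW_LINE.search(line)
    if match is None:
        return None
    numbers = tuple(int(group) for group in match.groups())
    return numbers[:3], numbers[3:]


def version_from_string(text):
    """Turn a dotted version such as '3.5.0' into a comparable tuple."""
    return tuple(int(part) for part in text.split("."))


# The warning from the log, joined back onto one line.
log_line = (
    "Apr 12 14:11:19 elm3b37 kernel: ib_mthca 0002:d9:00.0: "
    "HCA FW version 3.3.3 is old (3.4.0 is current)."
)

installed, driver_current = parse_fw_warning(log_line)
latest_offered = version_from_string("3.5.0")  # version named in item 2

# Tuples compare element by element, so 3.10.0 would sort after 3.9.0.
print("installed older than driver's 'current':", installed < driver_current)
print("latest release newer than driver's 'current':",
      latest_offered > driver_current)
```

Comparing integer tuples rather than raw strings avoids misordering
multi-digit components. The "current" version in the message is whatever the
driver was built to expect, so a newer release can exist.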