[ofa-general] mthca issues -need help
Pradeep Satyanarayana
pradeep at us.ibm.com
Fri Apr 13 16:01:40 PDT 2007
For some reason the patch did not apply, so I applied it by hand, and I
now see a new Oops. I will try upgrading the HCA firmware to see whether
these problems go away.
Apr 13 18:53:37 elm3b37 kernel: ib_mthca: Initializing 0002:d9:00.0
Apr 13 18:53:38 elm3b37 kernel: ib_mthca 0002:d9:00.0: HCA FW version
3.3.3 is old (3.4.0 is current).
Apr 13 18:53:38 elm3b37 kernel: ib_mthca 0002:d9:00.0: If you have
problems, try updating your HCA FW.
Apr 13 18:53:38 elm3b37 kernel: Unable to handle kernel paging request for
data at address 0x0000000c
Apr 13 18:53:38 elm3b37 kernel: Faulting instruction address:
0xc00000000040a7f0
Apr 13 18:53:38 elm3b37 kernel: Oops: Kernel access of bad area, sig: 11
[#2]
Apr 13 18:53:38 elm3b37 kernel: SMP NR_CPUS=128 NUMA
Apr 13 18:53:38 elm3b37 kernel: Modules linked in: ib_mthca ib_mad ib_core
autofs4 ipv6 binfmt_misc parport_pc lp parport e1000 sg dm_snapshot
dm_zero dm_mirror dm_mod ipr libata sd_mod scsi_mod firmware_class
ehci_hcd ohci_hcd usbcore
Apr 13 18:53:38 elm3b37 kernel: NIP: C00000000040A7F0 LR: D00000000025D544
CTR: C00000000040A7D0
Apr 13 18:53:38 elm3b37 kernel: REGS: c0000000df04b060 TRAP: 0300 Not
tainted (2.6.21-rc5)
Apr 13 18:53:38 elm3b37 kernel: MSR: 8000000000009032 <EE,ME,IR,DR> CR:
44022444 XER: 20000008
Apr 13 18:53:38 elm3b37 kernel: DAR: 000000000000000C, DSISR:
0000000040000000
Apr 13 18:53:38 elm3b37 kernel: TASK = c00000000fe88040[3878] 'modprobe'
THREAD: c0000000df048000 CPU: 0
Apr 13 18:53:38 elm3b37 kernel: GPR00: 0000000080000000 C0000000DF04B2E0
C000000000612268 000000000000000C
Apr 13 18:53:38 elm3b37 kernel: GPR04: 0000000000000004 0000000000000000
C0000000DE746A90 0000000000000048
Apr 13 18:53:38 elm3b37 kernel: GPR08: 0000000000000001 0000000000000001
C0000000E1A9D880 C00000000040A7D0
Apr 13 18:53:38 elm3b37 kernel: GPR12: D00000000026D598 C000000000535A80
AAAAAAAAAAAAAAAB D0000000004CAC80
Apr 13 18:53:39 elm3b37 kernel: GPR16: 0000000000000000 0000000000000312
0000000000000312 0000000000000000
Apr 13 18:53:39 elm3b37 kernel: GPR20: 000000000000000C D0000000004C9DB2
00000000000000FF 0000000000000001
Apr 13 18:53:39 elm3b37 kernel: GPR24: 0000000000000000 0000000000000004
0000000000000000 0000000000000000
Apr 13 18:53:39 elm3b37 kernel: GPR28: 0000000000000004 C0000000DF361000
D00000000028A5B0 000000000000000C
Apr 13 18:53:39 elm3b37 kernel: NIP [C00000000040A7F0]
._spin_lock+0x20/0x90
Apr 13 18:53:39 elm3b37 kernel: LR [D00000000025D544]
.mthca_buddy_alloc+0x34/0x220 [ib_mthca]
Apr 13 18:53:39 elm3b37 kernel: Call Trace:
Apr 13 18:53:39 elm3b37 kernel: [C0000000DF04B2E0] [C00000000064E6EC]
0xc00000000064e6ec (unreliable)
Apr 13 18:53:39 elm3b37 kernel: [C0000000DF04B360] [D00000000025D544]
.mthca_buddy_alloc+0x34/0x220 [ib_mthca]
Apr 13 18:53:39 elm3b37 kernel: [C0000000DF04B410] [D00000000025D760]
.mthca_alloc_mtt_range+0x30/0xe0 [ib_mthca]
Apr 13 18:53:39 elm3b37 kernel: [C0000000DF04B4B0] [D00000000025E5C4]
.mthca_init_mr_table+0x134/0x490 [ib_mthca]
Apr 13 18:53:39 elm3b37 kernel: [C0000000DF04B560] [D000000000253288]
.__mthca_init_one+0x958/0xd70 [ib_mthca]
Apr 13 18:53:39 elm3b37 kernel: [C0000000DF04B640] [D000000000253714]
.mthca_init_one+0x74/0xf0 [ib_mthca]
Apr 13 18:53:39 elm3b37 kernel: [C0000000DF04B6E0] [C0000000002487D8]
.pci_device_probe+0x168/0x200
Apr 13 18:53:39 elm3b37 kernel: [C0000000DF04B7A0] [C0000000002C288C]
.really_probe+0xbc/0x1f0
Apr 13 18:53:39 elm3b37 kernel: [C0000000DF04B850] [C0000000002C2D3C]
.__driver_attach+0xfc/0x140
Apr 13 18:53:39 elm3b37 kernel: [C0000000DF04B8E0] [C0000000002C1668]
.bus_for_each_dev+0x88/0xe0
Apr 13 18:53:39 elm3b37 kernel: [C0000000DF04B9A0] [C0000000002C2628]
.driver_attach+0x28/0x40
Apr 13 18:53:39 elm3b37 kernel: [C0000000DF04BA20] [C0000000002C1C34]
.bus_add_driver+0xc4/0x220
Apr 13 18:53:39 elm3b37 kernel: [C0000000DF04BAC0] [C0000000002C3118]
.driver_register+0x78/0xe0
Apr 13 18:53:39 elm3b37 kernel: [C0000000DF04BB40] [C000000000248B70]
.__pci_register_driver+0x90/0x120
Apr 13 18:53:39 elm3b37 kernel: [C0000000DF04BBE0] [D00000000026D070]
.mthca_init+0x100/0x170 [ib_mthca]
Apr 13 18:53:39 elm3b37 kernel: [C0000000DF04BC70] [C0000000000848FC]
.sys_init_module+0x20c/0x1990
Apr 13 18:53:39 elm3b37 kernel: [C0000000DF04BE30] [C00000000000862C]
syscall_exit+0x0/0x40
Apr 13 18:53:39 elm3b37 kernel: Instruction dump:
Apr 13 18:53:39 elm3b37 kernel: 4bc2cb01 60000000 4bffffe4 60000000
7c0802a6 fbe1fff0 7c7f1b78 f8010010
Apr 13 18:53:39 elm3b37 kernel: 38000000 f821ff81 980d01ca 800d0008
<7d20f828> 2c090000 40820010 7c00f92d
Pradeep
pradeep at us.ibm.com
From: Roland Dreier <rdreier at cisco.com>
Date: 04/13/2007 02:43 PM
To: Pradeep Satyanarayana/Beaverton/IBM at IBMUS
cc: general at lists.openfabrics.org, "Michael S. Tsirkin" <mst at dev.mellanox.co.il>
Subject: Re: [ofa-general] mthca issues -need help
I see...
> Region 0: Memory at 400c0800000 (64-bit, non-prefetchable) [size=1M]
> Region 2: Memory at 400c0000000 (64-bit, prefetchable) [size=8M]
> Capabilities: [40] MSI-X: Enable- Mask- TabSize=32
you are running an HCA with the 3rd BAR hidden.
Can you try the patch below and see if things work better?
diff --git a/drivers/infiniband/hw/mthca/mthca_mr.c b/drivers/infiniband/hw/mthca/mthca_mr.c
index fdb576d..818c27e 100644
--- a/drivers/infiniband/hw/mthca/mthca_mr.c
+++ b/drivers/infiniband/hw/mthca/mthca_mr.c
@@ -926,7 +926,9 @@ int mthca_init_mr_table(struct mthca_dev *dev)
 		dev->mr_table.fmr_mtt_buddy =
 			&dev->mr_table.tavor_fmr.mtt_buddy;
-	} else
+	} else if (dev->mthca_flags & MTHCA_FLAG_DDR_HIDDEN)
+		dev->mr_table.fmr_mtt_buddy = NULL;
+	else
 		dev->mr_table.fmr_mtt_buddy = &dev->mr_table.mtt_buddy;
 	/* FMR table is always the first, take reserved MTTs out of there */