[ofa-general] mthca issues -need help

Pradeep Satyanarayana pradeep at us.ibm.com
Fri Apr 13 16:01:40 PDT 2007


The patch did not apply for some reason, so I applied it by hand, and now I
see a new Oops. I will try upgrading the firmware to see whether these
problems go away.
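Some context on the firmware upgrade: the driver message "HCA FW version 3.3.3 is old (3.4.0 is current)" in the log that follows comes down to an ordering check on version numbers. The sketch below shows one such check. It assumes the three components are packed into a single 64-bit integer, with major in the high 32 bits and minor and subminor in 16 bits each, so that plain integer comparison orders versions. fw_ver() and fw_is_old() are hypothetical names, not ib_mthca functions.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: pack major.minor.subminor into one integer.
 * Assumed layout: major in bits 32-63, minor in bits 16-31,
 * subminor in bits 0-15, so integer order equals version order. */
static uint64_t fw_ver(uint32_t major, uint16_t minor, uint16_t subminor)
{
	return ((uint64_t) major << 32) | ((uint64_t) minor << 16) | subminor;
}

/* Hypothetical helper: nonzero when the running firmware is older
 * than the newest version the driver knows about. */
static int fw_is_old(uint64_t running, uint64_t latest)
{
	return running < latest;
}
```

With this packing, 3.3.3 becomes 0x300030003 and compares below 3.4.0 (0x300040000). That ordering is what produces an "is old" warning.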

Apr 13 18:53:37 elm3b37 kernel: ib_mthca: Initializing 0002:d9:00.0
Apr 13 18:53:38 elm3b37 kernel: ib_mthca 0002:d9:00.0: HCA FW version 3.3.3 is old (3.4.0 is current).
Apr 13 18:53:38 elm3b37 kernel: ib_mthca 0002:d9:00.0: If you have problems, try updating your HCA FW.
Apr 13 18:53:38 elm3b37 kernel: Unable to handle kernel paging request for data at address 0x0000000c
Apr 13 18:53:38 elm3b37 kernel: Faulting instruction address: 0xc00000000040a7f0
Apr 13 18:53:38 elm3b37 kernel: Oops: Kernel access of bad area, sig: 11 [#2]
Apr 13 18:53:38 elm3b37 kernel: SMP NR_CPUS=128 NUMA
Apr 13 18:53:38 elm3b37 kernel: Modules linked in: ib_mthca ib_mad ib_core autofs4 ipv6 binfmt_misc parport_pc lp parport e1000 sg dm_snapshot dm_zero dm_mirror dm_mod ipr libata sd_mod scsi_mod firmware_class ehci_hcd ohci_hcd usbcore
Apr 13 18:53:38 elm3b37 kernel: NIP: C00000000040A7F0 LR: D00000000025D544 CTR: C00000000040A7D0
Apr 13 18:53:38 elm3b37 kernel: REGS: c0000000df04b060 TRAP: 0300   Not tainted  (2.6.21-rc5)
Apr 13 18:53:38 elm3b37 kernel: MSR: 8000000000009032 <EE,ME,IR,DR>  CR: 44022444  XER: 20000008
Apr 13 18:53:38 elm3b37 kernel: DAR: 000000000000000C, DSISR: 0000000040000000
Apr 13 18:53:38 elm3b37 kernel: TASK = c00000000fe88040[3878] 'modprobe' THREAD: c0000000df048000 CPU: 0
Apr 13 18:53:38 elm3b37 kernel: GPR00: 0000000080000000 C0000000DF04B2E0 C000000000612268 000000000000000C
Apr 13 18:53:38 elm3b37 kernel: GPR04: 0000000000000004 0000000000000000 C0000000DE746A90 0000000000000048
Apr 13 18:53:38 elm3b37 kernel: GPR08: 0000000000000001 0000000000000001 C0000000E1A9D880 C00000000040A7D0
Apr 13 18:53:38 elm3b37 kernel: GPR12: D00000000026D598 C000000000535A80 AAAAAAAAAAAAAAAB D0000000004CAC80
Apr 13 18:53:39 elm3b37 kernel: GPR16: 0000000000000000 0000000000000312 0000000000000312 0000000000000000
Apr 13 18:53:39 elm3b37 kernel: GPR20: 000000000000000C D0000000004C9DB2 00000000000000FF 0000000000000001
Apr 13 18:53:39 elm3b37 kernel: GPR24: 0000000000000000 0000000000000004 0000000000000000 0000000000000000
Apr 13 18:53:39 elm3b37 kernel: GPR28: 0000000000000004 C0000000DF361000 D00000000028A5B0 000000000000000C
Apr 13 18:53:39 elm3b37 kernel: NIP [C00000000040A7F0] ._spin_lock+0x20/0x90
Apr 13 18:53:39 elm3b37 kernel: LR [D00000000025D544] .mthca_buddy_alloc+0x34/0x220 [ib_mthca]
Apr 13 18:53:39 elm3b37 kernel: Call Trace:
Apr 13 18:53:39 elm3b37 kernel: [C0000000DF04B2E0] [C00000000064E6EC] 0xc00000000064e6ec (unreliable)
Apr 13 18:53:39 elm3b37 kernel: [C0000000DF04B360] [D00000000025D544] .mthca_buddy_alloc+0x34/0x220 [ib_mthca]
Apr 13 18:53:39 elm3b37 kernel: [C0000000DF04B410] [D00000000025D760] .mthca_alloc_mtt_range+0x30/0xe0 [ib_mthca]
Apr 13 18:53:39 elm3b37 kernel: [C0000000DF04B4B0] [D00000000025E5C4] .mthca_init_mr_table+0x134/0x490 [ib_mthca]
Apr 13 18:53:39 elm3b37 kernel: [C0000000DF04B560] [D000000000253288] .__mthca_init_one+0x958/0xd70 [ib_mthca]
Apr 13 18:53:39 elm3b37 kernel: [C0000000DF04B640] [D000000000253714] .mthca_init_one+0x74/0xf0 [ib_mthca]
Apr 13 18:53:39 elm3b37 kernel: [C0000000DF04B6E0] [C0000000002487D8] .pci_device_probe+0x168/0x200
Apr 13 18:53:39 elm3b37 kernel: [C0000000DF04B7A0] [C0000000002C288C] .really_probe+0xbc/0x1f0
Apr 13 18:53:39 elm3b37 kernel: [C0000000DF04B850] [C0000000002C2D3C] .__driver_attach+0xfc/0x140
Apr 13 18:53:39 elm3b37 kernel: [C0000000DF04B8E0] [C0000000002C1668] .bus_for_each_dev+0x88/0xe0
Apr 13 18:53:39 elm3b37 kernel: [C0000000DF04B9A0] [C0000000002C2628] .driver_attach+0x28/0x40
Apr 13 18:53:39 elm3b37 kernel: [C0000000DF04BA20] [C0000000002C1C34] .bus_add_driver+0xc4/0x220
Apr 13 18:53:39 elm3b37 kernel: [C0000000DF04BAC0] [C0000000002C3118] .driver_register+0x78/0xe0
Apr 13 18:53:39 elm3b37 kernel: [C0000000DF04BB40] [C000000000248B70] .__pci_register_driver+0x90/0x120
Apr 13 18:53:39 elm3b37 kernel: [C0000000DF04BBE0] [D00000000026D070] .mthca_init+0x100/0x170 [ib_mthca]
Apr 13 18:53:39 elm3b37 kernel: [C0000000DF04BC70] [C0000000000848FC] .sys_init_module+0x20c/0x1990
Apr 13 18:53:39 elm3b37 kernel: [C0000000DF04BE30] [C00000000000862C] syscall_exit+0x0/0x40
Apr 13 18:53:39 elm3b37 kernel: Instruction dump:
Apr 13 18:53:39 elm3b37 kernel: 4bc2cb01 60000000 4bffffe4 60000000 7c0802a6 fbe1fff0 7c7f1b78 f8010010
Apr 13 18:53:39 elm3b37 kernel: 38000000 f821ff81 980d01ca 800d0008 <7d20f828> 2c090000 40820010 7c00f92d
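Several values in this Oops point at a NULL structure pointer rather than at corrupted memory. The faulting data address (DAR) is 0x000000000000000C, and GPR3 is also 0xC; under the 64-bit PowerPC ABI, GPR3 carries the first argument, which here is the lock address. The fault itself is in ._spin_lock, called from .mthca_buddy_alloc. Taking a spinlock that lives inside a structure reached through a NULL pointer faults at exactly that member's offset. The sketch below uses a hypothetical stand-in for the buddy allocator structure. It assumes a layout of a pointer, then an int, then a 4-byte lock; this layout is an assumption and is not taken from the driver source. Under that assumption the lock sits at offset 12 (0xC) on a 64-bit build.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the driver's buddy allocator state.
 * The field order and sizes are assumptions chosen for illustration. */
struct buddy_model {
	unsigned long **bits;      /* 8 bytes on a 64-bit build, offset 0 */
	int             max_order; /* 4 bytes, offset 8 */
	unsigned int    lock;      /* stand-in for a 4-byte spinlock_t, offset 12 */
};

/* Address touched when locking buddy->lock through a NULL buddy
 * pointer: base address 0 plus the member offset. Computed with
 * offsetof() so no NULL pointer is ever formed or dereferenced. */
static size_t fault_address_for_null_buddy(void)
{
	return offsetof(struct buddy_model, lock);
}
```

On a 64-bit build this yields 0xC, matching the DAR and GPR3 values in the log. If the real structure matches this shape, that points to mthca_buddy_alloc() receiving a NULL buddy pointer.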

Pradeep
pradeep at us.ibm.com



Roland Dreier <rdreier at cisco.com>
04/13/2007 02:43 PM

To: Pradeep Satyanarayana/Beaverton/IBM at IBMUS
cc: general at lists.openfabrics.org, "Michael S. Tsirkin" <mst at dev.mellanox.co.il>
Subject: Re: [ofa-general] mthca issues -need help

I see...

 >         Region 0: Memory at 400c0800000 (64-bit, non-prefetchable) [size=1M]
 >         Region 2: Memory at 400c0000000 (64-bit, prefetchable) [size=8M]
 >         Capabilities: [40] MSI-X: Enable- Mask- TabSize=32

you are running an HCA with the 3rd BAR hidden.
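The diagnosis follows from the quoted lspci output. Each 64-bit memory BAR occupies two BAR slots, so an HCA exposing three memory BARs would list Regions 0, 2 and 4. Only Regions 0 and 2 appear here, which means the third BAR is hidden; presumably this is the DDR BAR that the MTHCA_FLAG_DDR_HIDDEN flag in the patch refers to. The sketch below performs that check in userspace with a hypothetical helper, memory_regions_present(), which scans "Region N: Memory at ..." lines in lspci -v style output. It is illustrative only and is not how ib_mthca detects the condition.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: return a bitmask with bit N set for every
 * "Region N: Memory at ..." line found in lspci -v style output.
 * Lines that do not describe a memory region are ignored. */
static unsigned int memory_regions_present(const char *const lines[], size_t n)
{
	unsigned int mask = 0;

	for (size_t i = 0; i < n; i++) {
		const char *p = strstr(lines[i], "Region ");
		int region;
		char kind[16];

		/* Expect "Region <num>: Memory ..."; I/O regions say "I/O". */
		if (p && sscanf(p, "Region %d: %15s", &region, kind) == 2 &&
		    strcmp(kind, "Memory") == 0 && region >= 0 && region < 6)
			mask |= 1u << region;
	}
	return mask;
}
```

Fed the three quoted lines, this returns a mask with bits 0 and 2 set and bit 4 clear, which is the "3rd BAR hidden" case.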

Can you try the patch below and see if things work better?

diff --git a/drivers/infiniband/hw/mthca/mthca_mr.c b/drivers/infiniband/hw/mthca/mthca_mr.c
index fdb576d..818c27e 100644
--- a/drivers/infiniband/hw/mthca/mthca_mr.c
+++ b/drivers/infiniband/hw/mthca/mthca_mr.c
@@ -926,7 +926,9 @@ int mthca_init_mr_table(struct mthca_dev *dev)
 
                                 dev->mr_table.fmr_mtt_buddy =
                                         &dev->mr_table.tavor_fmr.mtt_buddy;
-                } else
+                } else if (dev->mthca_flags & MTHCA_FLAG_DDR_HIDDEN)
+                                dev->mr_table.fmr_mtt_buddy = NULL;
+                else
                                 dev->mr_table.fmr_mtt_buddy = &dev->mr_table.mtt_buddy;
 
                 /* FMR table is always the first, take reserved MTTs out of there */
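Read together with the new Oops, this hunk suggests why the hand-applied patch crashes. The context comment says reserved MTTs are taken out of the FMR table, which means out of dev->mr_table.fmr_mtt_buddy. The call trace shows mthca_init_mr_table -> mthca_alloc_mtt_range -> mthca_buddy_alloc -> _spin_lock faulting at 0xC. With the patch, fmr_mtt_buddy is NULL on DDR-hidden HCAs, so that reservation would lock a member of a NULL structure. The sketch below is a simplified userspace model of that flow, not driver code. All names are hypothetical stand-ins, and a trivial bump allocator replaces the real buddy system. It shows one way a consumer of fmr_mtt_buddy could handle the NULL case, here by falling back to the regular allocator.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the driver structures. */
struct buddy_model {
	int max_order;   /* largest order this allocator can hand out */
	int next_free;   /* bump pointer instead of a real buddy bitmap */
};

struct mr_table_model {
	struct buddy_model  mtt_buddy;      /* regular MTT allocator */
	struct buddy_model *fmr_mtt_buddy;  /* NULL when the DDR BAR is hidden */
};

/* Models the range allocation: it dereferences buddy unconditionally,
 * as the real code does when it takes buddy->lock. */
static int alloc_mtt_range(struct buddy_model *buddy, int order)
{
	int seg;

	if (order > buddy->max_order)
		return -1;
	seg = buddy->next_free;
	buddy->next_free += 1 << order;
	return seg;
}

/* Guarded "take reserved MTTs" step: use the FMR allocator if there
 * is one, otherwise fall back to the regular MTT allocator instead of
 * passing a NULL pointer down. */
static int reserve_mtts(struct mr_table_model *t, int order)
{
	struct buddy_model *buddy =
		t->fmr_mtt_buddy ? t->fmr_mtt_buddy : &t->mtt_buddy;

	return alloc_mtt_range(buddy, order);
}
```

The unguarded equivalent, alloc_mtt_range(t->fmr_mtt_buddy, order) with a NULL pointer, is the userspace analogue of the Oops above.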




