[openib-general] Re: ehca testing

Troy Benjegerdes hozer at hozed.org
Thu Oct 27 09:36:42 PDT 2005


On Thu, Oct 20, 2005 at 03:32:13PM -0700, Roland Dreier wrote:
>     Troy> There is some sort of strange initialization error going on here..
> 
> Yes, very strange.  Can you add
> 
> 	printk(KERN_ERR "hca->node_type = %d\n", hca->node_type);
> 
> to the beginning of ipoib_add_port(), and
> 
> 	printk(KERN_ERR "dev->ib_dev.node_type = %d\n", dev->ib_dev.node_type);
> 
> right before the call to ib_register_device() in
> mthca_register_device() and send the output that you get when hotplug
> loads ib_mthca vs. when you load ib_mthca by hand?

When loaded at boot:

[586811.915831] ib_mthca: Mellanox InfiniBand HCA driver v0.06 (June 23, 2005)
[586811.915849] ib_mthca: Initializing 0000:d9:00.0
[586811.916634] PCI: Enabling device: (0000:d9:00.0), cmd 142
[586818.501595] openafs: module license 'http://www.openafs.org/dl/license10.html' taints kernel.
[586818.504651] Found system call table at 0xc000000000013e68 (scan: close+ioctl)
[586818.520240] Starting AFS cache scan...Memory cache: Allocating 12500 dcacheentries...found 0 non-empty cache files (0%).
[586875.848354] afs: Lost contact with volume location server 147.155.137.10 incell scl.ameslab.gov
[586875.848374] afs: Lost contact with volume location server 147.155.137.10 incell scl.ameslab.gov
[587154.758768] hca->node_type = 236
[587154.760578] hca->node_type = 236
[587154.761511] hca->node_type = 236
[587154.761572] mthca0: ib_query_pkey port 3 failed (ret = -22)
[587154.761584] hca->node_type = 236
[587154.761633] mthca0: ib_query_pkey port 4 failed (ret = -22)
[587154.761644] hca->node_type = 236
[587154.762506] hca->node_type = 236
[587154.763422] hca->node_type = 236
[587154.763480] mthca0: ib_query_pkey port 7 failed (ret = -22)
[587154.763491] hca->node_type = 236
[587154.763542] mthca0: ib_query_pkey port 8 failed (ret = -22)
[587154.763553] hca->node_type = 236
[587154.765698] hca->node_type = 236
[587154.767136] hca->node_type = 236
[587154.767312] mthca0: ib_query_pkey port 11 failed (ret = -22)
[587154.767324] hca->node_type = 236
[587154.767455] mthca0: ib_query_pkey port 12 failed (ret = -22)
[587154.767471] hca->node_type = 236
[587154.769140] hca->node_type = 236
[587154.772116] hca->node_type = 236
[587154.772180] mthca0: ib_query_pkey port 15 failed (ret = -22)
[587154.772192] hca->node_type = 236
[587154.772243] mthca0: ib_query_pkey port 16 failed (ret = -22)
[587154.772255] hca->node_type = 236
[587154.773401] hca->node_type = 236
[587154.776817] hca->node_type = 236
[587154.776974] mthca0: ib_query_pkey port 19 failed (ret = -22)
[587154.776986] hca->node_type = 236
[587154.778179] mthca0: ib_query_pkey port 20 failed (ret = -22)
[587154.778198] hca->node_type = 236
[587154.780159] hca->node_type = 236
[587154.785406] hca->node_type = 236
[587154.785512] mthca0: ib_query_pkey port 23 failed (ret = -22)
[587154.785523] hca->node_type = 236
[587154.785582] mthca0: ib_query_pkey port 24 failed (ret = -22)
[587154.785599] hca->node_type = 236
[587154.789427] hca->node_type = 236
[587154.794314] hca->node_type = 236
[587154.794458] mthca0: ib_query_pkey port 27 failed (ret = -22)
[587154.794474] hca->node_type = 236
[587154.794634] mthca0: ib_query_pkey port 28 failed (ret = -22)
[587154.794646] hca->node_type = 236
[587154.797133] hca->node_type = 236
[587154.803507] hca->node_type = 236
[587154.803597] mthca0: ib_query_pkey port 31 failed (ret = -22)
[587154.803608] hca->node_type = 236
[587154.803667] mthca0: ib_query_pkey port 32 failed (ret = -22)
[587154.803679] hca->node_type = 236
[587154.820947] hca->node_type = 236
[587154.829795] hca->node_type = 236
[587154.831921] mthca0: ib_query_pkey port 35 failed (ret = -22)
[587154.831934] hca->node_type = 236
[587154.834932] mthca0: ib_query_pkey port 36 failed (ret = -22)
[587154.834946] hca->node_type = 236
[587154.844314] hca->node_type = 236
[587154.853591] hca->node_type = 236
[587154.853680] mthca0: ib_query_pkey port 39 failed (ret = -22)
[587154.853692] hca->node_type = 236
[587154.853745] mthca0: ib_query_pkey port 40 failed (ret = -22)
[587154.853761] hca->node_type = 236
[587154.869483] hca->node_type = 236
[587154.874749] hca->node_type = 236
[587154.874952] mthca0: ib_query_pkey port 43 failed (ret = -22)
[587154.874969] hca->node_type = 236
[587154.875609] mthca0: ib_query_pkey port 44 failed (ret = -22)
[587154.875624] hca->node_type = 236
[587154.894612] hca->node_type = 236
[587154.908058] hca->node_type = 236
[587154.909244] mthca0: ib_query_pkey port 47 failed (ret = -22)
[587154.909261] hca->node_type = 236
[587154.909323] mthca0: ib_query_pkey port 48 failed (ret = -22)
[587154.909334] hca->node_type = 236
[587154.918749] hca->node_type = 236
[587154.939629] hca->node_type = 236
[587154.939729] mthca0: ib_query_pkey port 51 failed (ret = -22)
[587154.939745] hca->node_type = 236
[587154.939866] mthca0: ib_query_pkey port 52 failed (ret = -22)
[587154.939883] hca->node_type = 236
[587154.957219] hca->node_type = 236
[587154.971523] hca->node_type = 236
[587154.971643] mthca0: ib_query_pkey port 55 failed (ret = -22)
[587154.971664] hca->node_type = 236
[587154.972717] mthca0: ib_query_pkey port 56 failed (ret = -22)
[587154.972733] hca->node_type = 236
[587154.984707] hca->node_type = 236
[587154.999129] hca->node_type = 236
[587154.999963] mthca0: ib_query_pkey port 59 failed (ret = -22)
[587154.999976] hca->node_type = 236
[587155.000264] mthca0: ib_query_pkey port 60 failed (ret = -22)
[587155.000282] hca->node_type = 236
[587155.012766] hca->node_type = 236
[587155.041105] hca->node_type = 236
[587155.041178] mthca0: ib_query_pkey port 63 failed (ret = -22)
[587155.041189] hca->node_type = 236
[587155.041319] mthca0: ib_query_pkey port 64 failed (ret = -22)
[587155.041332] hca->node_type = 236
[587155.066730] hca->node_type = 236
[587155.077348] hca->node_type = 236
[587155.077576] mthca0: ib_query_pkey port 67 failed (ret = -22)
[587155.077593] hca->node_type = 236
[587155.077883] mthca0: ib_query_pkey port 68 failed (ret = -22)
[587155.077896] hca->node_type = 236
[587155.097490] hca->node_type = 236
[587155.117809] hca->node_type = 236
[587155.117946] mthca0: ib_query_pkey port 71 failed (ret = -22)
[587155.117962] hca->node_type = 236
[587155.118016] mthca0: ib_query_pkey port 72 failed (ret = -22)
[587155.118031] hca->node_type = 236
[587155.138066] hca->node_type = 236
[587155.170056] hca->node_type = 236
[587155.170137] mthca0: ib_query_pkey port 75 failed (ret = -22)
[587155.170153] hca->node_type = 236
[587155.170213] mthca0: ib_query_pkey port 76 failed (ret = -22)
[587155.170225] hca->node_type = 236
[587155.205813] hca->node_type = 236
[587155.238014] hca->node_type = 236
[587155.238154] mthca0: ib_query_pkey port 79 failed (ret = -22)
[587155.238168] hca->node_type = 236
[587155.238242] mthca0: ib_query_pkey port 80 failed (ret = -22)
[587155.238256] hca->node_type = 236
[587155.266483] hca->node_type = 236
[587155.381938] hca->node_type = 236
[587155.382011] mthca0: ib_query_pkey port 83 failed (ret = -22)
[587155.382027] hca->node_type = 236
[587155.382113] mthca0: ib_query_pkey port 84 failed (ret = -22)
[587155.382125] hca->node_type = 236
[587155.418259] hca->node_type = 236
[587155.457782] hca->node_type = 236
[587155.457870] mthca0: ib_query_pkey port 87 failed (ret = -22)
[587155.457886] hca->node_type = 236
[587155.457953] mthca0: ib_query_pkey port 88 failed (ret = -22)
[587155.457966] hca->node_type = 236
[587155.477128] hca->node_type = 236
[587155.501172] hca->node_type = 236
[587155.501235] mthca0: ib_query_pkey port 91 failed (ret = -22)
[587155.501245] hca->node_type = 236
[587155.501312] mthca0: ib_query_pkey port 92 failed (ret = -22)
[587155.501323] hca->node_type = 236
[587155.580150] hca->node_type = 236
[587155.611763] hca->node_type = 236
[587155.611842] mthca0: ib_query_pkey port 95 failed (ret = -22)
[587155.611855] hca->node_type = 236
[587155.611913] mthca0: ib_query_pkey port 96 failed (ret = -22)
[587155.611929] hca->node_type = 236
[587155.663057] hca->node_type = 236
[587155.692342] hca->node_type = 236
[587155.692482] mthca0: ib_query_pkey port 99 failed (ret = -22)
[587155.692494] hca->node_type = 236
[587155.692554] mthca0: ib_query_pkey port 100 failed (ret = -22)
[587155.692572] hca->node_type = 236
[587155.759843] hca->node_type = 236
[587155.808226] hca->node_type = 236
[587155.808297] mthca0: ib_query_pkey port 103 failed (ret = -22)
[587155.808317] hca->node_type = 236
[587155.808370] mthca0: ib_query_pkey port 104 failed (ret = -22)
[587155.808383] hca->node_type = 236
[587155.847076] hca->node_type = 236
[587155.870709] hca->node_type = 236
[587155.870781] mthca0: ib_query_pkey port 107 failed (ret = -22)
[587155.870797] hca->node_type = 236
[587155.870857] mthca0: ib_query_pkey port 108 faile6
[587155.986258] mthca0: ib_query_pkey port 111 failed (ret = -22)
[587155.986269] hca->node_type = 236
[587155.986338] mthca0: ib_query_pkey port 112 failed (ret = -22)
[587155.986353] hca->node_type = 236
[587156.020368] hca->node_type = 236
[587156.068549] hca->node_type = 236
[587156.068626] mthca0: ib_query_pkey port 115 failed (ret = -22)
[587156.068643] hca->node_type = 236
[587156.068700] mthca0: ib_query_pkey port 116 failed (ret = -22)
[587156.068719] hca->node_type = 236
p5l1:~#
p5l1:~#
p5l1:~#
p5l1:~# # reload......
p5l1:~#
p5l1:~# rmmod ib_ipoib
p5l1:~# rmmod ib_mad
ERROR: Module ib_mad is in use by ib_sa,ib_mthca
p5l1:~# rmmod ib_sa
p5l1:~# rmmod ib_mthca
p5l1:~# rmmod ib_mad
p5l1:~# rmmod ib_core
p5l1:~#
p5l1:~# modprobe ib_mthca
p5l1:~# modprobe     
<kernel panics here>.


[587324.500037] ib_mthca: Mellanox InfiniBand HCA driver v0.06 (June 23, 2005)
[587324.500056] ib_mthca: Initializing 0000:d9:00.0
[587325.778913] dev->ib_dev.node_type = 1
[587330.812591] Oops: Kernel access of bad area, sig: 7 [#1]
[587330.812605] SMP NR_CPUS=8 NUMA PSERIES LPAR
[587330.812618] Modules linked in: ib_mthca ib_mad ib_core openafs
[587330.812637] NIP: D0000000098BF558 XER: 2000000B LR: C000000000057B2C CTR: D0000000098BF4F0
[587330.812653] REGS: c0000001e3fb3490 TRAP: 0300   Tainted: P       (2.6.13.3-power5)
[587330.812669] MSR: 8000000000009032 EE: 1 PR: 0 FP: 0 ME: 1 IR/DR: 11 CR: 28000084
[587330.812682] DAR: d000010082187a04 DSISR: 0000000040000000
[587330.812694] TASK: c0000003dbf4d640[0] 'swapper' THREAD: c0000001e3fb0000 CPU: 5
[587330.812708] GPR00: 0000000000000010 C0000001E3FB3710 D0000000098D64C0 D000010082187A04
[587330.812729] GPR04: 0000000000000008 000000010003727D 0000000000000000 00000000000007D0
[587330.812748] GPR08: C0000001E3E08910 0000000000000000 C0000001E3FB3840 D000010082187A04
[587330.812770] GPR12: 0000000048000082 C0000000004BEC00 0000000000000000 000000000FA8536C
[587330.812790] GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
[587330.812809] GPR20: 0000000000000000 C0000000005F7ED8 C0000000005F7F40 C000000000606500
[587330.812830] GPR24: C0000001EAE84498 C0000001E3FB3840 C0000001E3FB0000 C0000001E3E08000
[587330.812852] GPR28: 0000000000000100 C0000001E3E08000 D0000000098D4E40 0000000000000000
[587330.812875] NIP [d0000000098bf558] .poll_catas+0x68/0x2f0 [ib_mthca]
[587330.812914] LR [c000000000057b2c] .run_timer_softirq+0x15c/0x260
[587330.812932] Call Trace:
[587330.812940] [c0000001e3fb3710] [c0000001e3fb37d0] 0xc0000001e3fb37d0 (unreliable)
[587330.812959] [c0000001e3fb37d0] [c000000000057b2c] .run_timer_softirq+0x15c/0x260
[587330.812979] [c0000001e3fb3890] [c000000000051e68] .__do_softirq+0xe8/0x1c0
[587330.812997] [c0000001e3fb3950] [c000000000051fc4] .do_softirq+0x84/0x90
[587330.813016] [c0000001e3fb39d0] [c0000000000108f0] .timer_interrupt+0xd0/0x410
[587330.813036] [c0000001e3fb3ad0] [c00000000000a2b4] decrementer_common+0xb4/0x100
[587330.813052] --- Exception: 901 at .pseries_dedicated_idle+0x108/0x280
[587330.813071]     LR = .pseries_dedicated_idle+0x1e0/0x280
[587330.813083] [c0000001e3fb3e90] [c00000000000f460] .cpu_idle+0x40/0x60
[587330.813101] [c0000001e3fb3f00] [c000000000032fa0] .start_secondary+0x120/0x150
[587330.813120] [c0000001e3fb3f90] [c00000000000ba7c] .enable_64b_mode+0x0/0x28
[587330.813136] Instruction dump:
[587330.813144] 3be00000 48000020 2fab0000 381f0001 7c1f07b4 409e0058 801d0908 7f9f0040
[587330.813169] 409c00c8 e97d08f8 7be91764 7c6b4a14 <7c001c2c> 0c000000 4c00012c 780b0020
[587330.813193]  <0>Kernel panic - not syncing: Fatal exception in interrupt
[587330.813208]