[openib-general] Re: [PATCHv3] kDAPL: remove use of HANDLE's (vs. r2564)
James Lentini
jlentini at netapp.com
Wed Jun 8 14:30:01 PDT 2005
I'm seeing a new problem, an oops, which is included below. I don't
think this patch caused it, so I'll go ahead and commit your patch
with my modifications.
By my calculations, the crash is on line 450 of mthca_cq.c. That line is:
entry->wr_id = (*cur_qp)->wrid[wqe_index];
and the resulting instruction that faults is:
b98: 8b 54 d8 04 mov 0x4(%eax,%ebx,8),%edx
Does anyone know which part of the C statement that corresponds to?
Unable to handle kernel paging request at virtual address 00002014
printing eip:
e0a65008
*pde = 1814d067
Oops: 0000 [#1]
Modules linked in: kdapltest ib_dat_provider dat ib_cm ib_at ib_ipoib
ib_sa md5 ipv6 parport_pc lp parport autofs4 nfs lockd sunrpc
i2c_piix4 i2c_core ib_mthca ib_mad ib_core e100 mii floppy sg aic7xxx
sd_mod scsi_mod
CPU: 0
EIP: 0060:[<e0a65008>] Not tainted VLI
EFLAGS: 00010046 (2.6.11-openib)
EIP is at mthca_poll_cq+0x368/0x760 [ib_mthca]
eax: 00000000 ebx: 00000402 ecx: 00000000 edx: dd6ea000
esi: cdbf7000 edi: cdbf7058 ebp: c0469cc8 esp: c0469c5c
ds: 007b es: 007b ss: 0068
Process swapper (pid: 0, threadinfo=c0469000 task=c03bfc20)
Stack: 00000000 c046cf1f 00000030 00000000 00000000 00000000 00000000 00000000
       cdbf5040 00000000 00000000 00000000 c0469d04 00000000 00000086 dd8af000
       c0469d04 00000001 d1a8d9c0 00000000 00000046 00000001 00000000 cdbf7000
Call Trace:
[<c01034ba>] show_stack+0x7a/0x90
[<c0103639>] show_registers+0x149/0x1c0
[<c0103886>] die+0x126/0x2a0
[<c0110b7e>] do_page_fault+0x45e/0x644
[<c0103003>] error_code+0x2b/0x30
[<e1ac530b>] dapl_ib_completion_poll+0x3c/0xd1 [ib_dat_provider]
[<e1aceed5>] dapl_evd_cq_poll_to_event+0x17/0x3b [ib_dat_provider]
[<e1ad03ea>] dapl_evd_dequeue+0x277/0x34b [ib_dat_provider]
[<e1ace0a4>] dapl_evd_upcall_trigger+0x34/0x66 [ib_dat_provider]
[<e1acfbf6>] dapl_evd_dto_callback+0xd4/0xea [ib_dat_provider]
[<e0a644b3>] mthca_cq_event+0x33/0x80 [ib_mthca]
[<e0a629f4>] mthca_eq_int+0x3a4/0x580 [ib_mthca]
[<e0a62c51>] mthca_tavor_interrupt+0x81/0x350 [ib_mthca]
[<c01426d5>] handle_IRQ_event+0x35/0x70
[<c0142818>] __do_IRQ+0x108/0x340
[<c0104b76>] do_IRQ+0x96/0xa0
[<c0102fca>] common_interrupt+0x1a/0x20
[<c01426d5>] handle_IRQ_event+0x35/0x70
[<c0142818>] __do_IRQ+0x108/0x340
[<c0104b3a>] do_IRQ+0x5a/0xa0
=======================
[<c0102fca>] common_interrupt+0x1a/0x20
[<c0100627>] cpu_idle+0x57/0x60
[<c0100249>] rest_init+0x19/0x20
[<c043b8ca>] start_kernel+0x17a/0x1f0
[<c010019f>] 0xc010019f
Code: 00 00 00 8b 55 b4 0f b6 52 1d 81 e2 80 00 00 00 e9 d3 fd ff ff
8b 45 b4 8d 7e 58 8b 4f 34 8b 58 18 8b 86 e0 00 00 00 0f cb d3 eb <8b>
54 d8 04 8b 04 d8 e9 36 fe ff ff 8b 55 b4 8b 4d c4 8b 42 14
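One way to approach the question about the failing instruction is to compute its effective address from the register dump. The operand 0x4(%eax,%ebx,8) means base + index * scale + displacement. The sketch below plugs in eax and ebx from the oops; the helper names are made up for illustration. If the compiler kept the wrid pointer in %eax and wqe_index in %ebx (an assumption, since nothing in the oops confirms the register assignment), the 8-byte stride and 4-byte offset would be consistent with loading the upper 32 bits of a 64-bit wrid[wqe_index] through a NULL wrid pointer.

```c
#include <stdint.h>

/*
 * Effective address of an x86 memory operand written in AT&T syntax
 * as disp(base,index,scale): base + index * scale + disp.
 * (Hypothetical helper, not part of the mthca driver.)
 */
static uint32_t effective_address(uint32_t base, uint32_t index,
                                  uint32_t scale, uint32_t disp)
{
    return base + index * scale + disp;
}

/*
 * Address touched by "mov 0x4(%eax,%ebx,8),%edx" using the register
 * values copied from the oops above. (Hypothetical helper.)
 */
static uint32_t oops_fault_address(void)
{
    uint32_t eax = 0x00000000; /* base register from the dump */
    uint32_t ebx = 0x00000402; /* index register from the dump */

    return effective_address(eax, ebx, 8, 0x4);
}
```

Calling oops_fault_address() gives 0x2014, which matches the "virtual address 00002014" reported at the top of the oops, so the fault appears to come from the array load itself rather than from the store to entry->wr_id.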
On Wed, 8 Jun 2005, Tom Duffy wrote:
> On Wed, 2005-06-08 at 13:44 -0700, Tom Duffy wrote:
>> On Wed, 2005-06-08 at 16:36 -0400, James Lentini wrote:
>>> Do you see any additional stability problems after applying this? I'm
>>> updating my OpenIB tree to see if that is my problem.
>>
>> This just in!
>
> I think your patch is fine, at least it doesn't introduce any new bugs,
> because after a reboot of both machines, restarting the SM, the quit
> test and the transaction test work fine.
>
> -tduffy
>