[openib-general] Re: slab error while removing ib_mad
Or Gerlitz
ogerlitz at voltaire.com
Sun May 14 03:45:54 PDT 2006
On Sun, 23 Apr 2006, Or Gerlitz wrote:
> I am getting the below trace on 2.6.17-rc2 / AMD x86_64 / PCIX HCA
> with both the IB sources that come with the kernel and svn trunk 6520.
>
> This happens if i just modprobe -r ib_mthca after fresh reboot, can
> anyone reproduce it on her/his system as well? The module does get
> modprobed out.
>
> Or.
>
> $ modprobe -r ib_mthca
>
> $ dmesg
>
> slab error in kmem_cache_destroy(): cache `ib_mad': Can't free all objects
>
> Call Trace: <ffffffff8106decf>{kmem_cache_destroy+150}
> <ffffffff88094615>{:ib_mad:ib_mad_cleanup_module+25}
> <ffffffff810447ef>{sys_delete_module+415}
> <ffffffff8112f0cf>{__up_write+20}
> <ffffffff8105ce8b>{sys_munmap+91}
> <ffffffff81009656>{system_call+126}
>
> ib_mad: Failed to destroy ib_mad cache
Roland,
I think you were on vacation when I posted this. There were two responses
saying they were not able to reproduce it, but no one tried 2.6.17-X.
The current status is that I can still reproduce it, now with
2.6.17-rc4-git2 and the IB code that comes with the kernel.
However, I am ***not*** able to reproduce the failure I had with the
trivial standalone module which I posted in
http://openib.org/pipermail/openib-general/2006-April/020582.html
So it seems there might be more than one issue here, or that the various
problems I had on 2.6.17-rc1/2/3 were related but not totally solved by
2.6.17-rc4-git2.
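For anyone trying to reproduce this, one way to check for outstanding objects before unloading is to look at the ib_mad row of /proc/slabinfo. The sketch below is only an illustration: the helper name active_objects and the sample row (including its numbers) are hypothetical, and it assumes the slabinfo 2.x column layout, where the first two numeric columns are <active_objs> and <num_objs>.

```python
def active_objects(slabinfo_text, cache_name):
    """Return (active_objs, num_objs) for cache_name, or None if the cache is absent.

    Assumes the slabinfo 2.x layout: name <active_objs> <num_objs> ...
    """
    for line in slabinfo_text.splitlines():
        # Skip the version banner and the column-header comment line.
        if line.startswith("#") or line.startswith("slabinfo"):
            continue
        fields = line.split()
        if len(fields) >= 3 and fields[0] == cache_name:
            return int(fields[1]), int(fields[2])
    return None


# Hypothetical sample row; the numbers are made up for illustration only.
SAMPLE = """slabinfo - version: 2.1
# name <active_objs> <num_objs> <objsize> <objperslab> <pagesperslab> : tunables <limit> <batchcount> <sharedfactor> : slabdata <active_slabs> <num_slabs> <sharedavail>
ib_mad 2 16 448 8 1 : tunables 54 27 8 : slabdata 2 2 0
"""

if __name__ == "__main__":
    print(active_objects(SAMPLE, "ib_mad"))
```

On a live system the text would come from open("/proc/slabinfo").read(), taken before the modprobe -r. A nonzero active count is only a hint, since objects sitting in per-CPU caches may also be counted as active.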
Following this failure the system gets into an unstable state, e.g. if I try
to cat /proc/slabinfo I get the oops below.
Or.
Unable to handle kernel paging request at ffffffff88092be5 RIP:
<ffffffff81131580>{strnlen+12}
PGD 1003027 PUD 1005027 PMD 37f32067 PTE 0
Oops: 0000 [1] SMP
CPU 1
Modules linked in: usbserial nfsd exportfs parport_pc lp parport edd joydev sg st sr_mod button battery ac ipv6 nfs lockd sunrpc sata_sil libata ohci_hcd hw_random i2c_amd8111 i2c_core e100 mii tg3 sd_mod scsi_mod dm_mod
Pid: 16415, comm: cat Not tainted 2.6.17-rc4-git2 #1
RIP: 0010:[<ffffffff81131580>] <ffffffff81131580>{strnlen+12}
RSP: 0018:ffff810019645d10 EFLAGS: 00010297
RAX: ffffffff88092be5 RBX: ffff810019645d68 RCX: 000000000000000a
RDX: ffff810019645d98 RSI: fffffffffffffffe RDI: ffffffff88092be5
RBP: ffffffff88092be5 R08: 00000000ffffffff R09: 00000000000001d0
R10: 0000000000000000 R11: 0000000000000002 R12: ffff81003c964185
R13: 0000000000000011 R14: 0000000000000010 R15: ffff81003c964fff
FS: 00002b8716e9b6e0(0000) GS:ffff81003f7a1768(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: ffffffff88092be5 CR3: 0000000034424000 CR4: 00000000000006e0
Process cat (pid: 16415, threadinfo ffff810019644000, task ffff81001f778880)
Stack: ffffffff81132586 0000000000000e7b ffff81003c964185 ffffffff81295c05
ffff8100322d4930 ffff81001582a240 0000000000000000 0000000000000000
ffff8100322d4930 0000000000000000
Call Trace: <ffffffff81132586>{vsnprintf+765} <ffffffff8109009f>{seq_printf+165}
<ffffffff81053226>{__alloc_pages+101} <ffffffff8105ab9d>{__handle_mm_fault+2357}
<ffffffff8106bfc9>{s_start+21} <ffffffff8104020d>{debug_mutex_add_waiter+144}
<ffffffff81264093>{__mutex_lock_slowpath+768} <ffffffff8106bed4>{s_show+616}
<ffffffff810903d3>{seq_read+264} <ffffffff81072707>{vfs_read+209}
<ffffffff81072864>{sys_read+69} <ffffffff81009676>{system_call+126}
Code: 80 3f 00 74 11 48 ff ce 48 ff c0 48 83 fe ff 74 05 80 38 00
RIP <ffffffff81131580>{strnlen+12} RSP <ffff810019645d10>
CR2: ffffffff88092be5
<3>BUG: sleeping function called from invalid context at include/linux/rwsem.h:43
in_atomic():0, irqs_disabled():1
Call Trace: <ffffffff81022e39>{__might_sleep+196} <ffffffff8103724f>{blocking_notifier_call_chain+31}
<ffffffff8102d3fb>{do_exit+34} <ffffffff8118606d>{do_unblank_screen+123}
<ffffffff81266ba3>{do_page_fault+1864} <ffffffff810a99ec>{proc_alloc_inode+18}
<ffffffff810a9a1a>{proc_alloc_inode+64} <ffffffff8108972c>{alloc_inode+272}
<ffffffff8103c75d>{bit_waitqueue+53} <ffffffff8108726e>{d_rehash+113}
<ffffffff8100a5b5>{error_exit+0} <ffffffff81131580>{strnlen+12}
<ffffffff81132586>{vsnprintf+765} <ffffffff8109009f>{seq_printf+165}
<ffffffff81053226>{__alloc_pages+101} <ffffffff8105ab9d>{__handle_mm_fault+2357}
<ffffffff8106bfc9>{s_start+21} <ffffffff8104020d>{debug_mutex_add_waiter+144}
<ffffffff81264093>{__mutex_lock_slowpath+768} <ffffffff8106bed4>{s_show+616}
<ffffffff810903d3>{seq_read+264} <ffffffff81072707>{vfs_read+209}
<ffffffff81072864>{sys_read+69} <ffffffff81009676>{system_call+126}
Losing some ticks... checking if CPU frequency changed.
BUG: cat/16415, active lock [ffff8100322d4960(ffff8100322d4930-ffff8100322d4a30)] freed!
Call Trace: <ffffffff8103f95c>{mutex_debug_check_no_locks_freed+273}
<ffffffff8106c576>{kfree+115} <ffffffff8108fdf6>{seq_release+24}
<ffffffff81072da2>{__fput+181} <ffffffff81070356>{filp_close+91}
<ffffffff8102bf47>{put_files_struct+105} <ffffffff8102d665>{do_exit+652}
<ffffffff8118606d>{do_unblank_screen+123} <ffffffff81266ba3>{do_page_fault+1864}
<ffffffff810a99ec>{proc_alloc_inode+18} <ffffffff810a9a1a>{proc_alloc_inode+64}
<ffffffff8108972c>{alloc_inode+272} <ffffffff8103c75d>{bit_waitqueue+53}
<ffffffff8108726e>{d_rehash+113} <ffffffff8100a5b5>{error_exit+0}
<ffffffff81131580>{strnlen+12} <ffffffff81132586>{vsnprintf+765}
<ffffffff8109009f>{seq_printf+165} <ffffffff81053226>{__alloc_pages+101}
<ffffffff8105ab9d>{__handle_mm_fault+2357} <ffffffff8106bfc9>{s_start+21}
<ffffffff8104020d>{debug_mutex_add_waiter+144} <ffffffff81264093>{__mutex_lock_slowpath+768}
<ffffffff8106bed4>{s_show+616} <ffffffff810903d3>{seq_read+264}
<ffffffff81072707>{vfs_read+209} <ffffffff81072864>{sys_read+69}
<ffffffff81009676>{system_call+126}
[ffff8100322d4960] {seq_open}
.. held by: cat:16415 [ffff81001f778880, 116]
... acquired at: seq_read+0x36/0x2a4
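A possible reading of the oops above, not confirmed in this thread: the faulting address in CR2 (ffffffff88092be5) lies in the same ffffffff88... range as the ib_mad symbol in the quoted trace, the "Modules linked in" list no longer shows any ib_* module, and the fault is in strnlen called from vsnprintf under s_show. That would be consistent with the undestroyed cache still being listed while its name string lived in the unloaded module. The sketch below only does the address arithmetic; the helper names are hypothetical, and since the two traces come from different kernels the module load address may have differed, so the distance is suggestive at best.

```python
import re

# Matches entries such as <ffffffff88094615>{:ib_mad:ib_mad_cleanup_module+25}
ENTRY = re.compile(r"<([0-9a-f]{16})>\{(?::(\w+):)?([\w.]+)\+(\d+)\}")


def parse_trace(text):
    """Return a list of (address, module_or_None, symbol, offset) tuples."""
    return [(int(addr, 16), mod or None, sym, int(off))
            for addr, mod, sym, off in ENTRY.findall(text)]


def closest_module_entry(entries, addr):
    """Return (module, symbol, distance) for the module entry nearest to addr."""
    mods = [e for e in entries if e[1] is not None]
    if not mods:
        return None
    best = min(mods, key=lambda e: abs(e[0] - addr))
    return best[1], best[2], abs(best[0] - addr)


# First three entries of the quoted 2.6.17-rc2 trace, copied from this message.
QUOTED_TRACE = """
<ffffffff8106decf>{kmem_cache_destroy+150}
<ffffffff88094615>{:ib_mad:ib_mad_cleanup_module+25}
<ffffffff810447ef>{sys_delete_module+415}
"""

CR2 = 0xffffffff88092be5  # faulting address from the oops

if __name__ == "__main__":
    print(closest_module_entry(parse_trace(QUOTED_TRACE), CR2))
```

With these inputs the nearest module entry is ib_mad_cleanup_module, 6704 bytes away from the faulting address.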