[openib-general] Re: slab error while removing ib_mad

Sun May 14 03:45:54 PDT 2006

On Sun, 23 Apr 2006, Or Gerlitz wrote:
> I am getting the below trace on 2.6.17-rc2 / AMD x86_64 / PCIX HCA 
> with both the IB sources that come with the kernel and svn trunk 6520.
> 
> This happens if I just run modprobe -r ib_mthca after a fresh reboot. Can
> anyone reproduce it on their system as well? The module does get
> unloaded.
> 
> Or.
> 
> $ modprobe -r ib_mthca
> 
> $ dmesg
> 
> slab error in kmem_cache_destroy(): cache `ib_mad': Can't free all objects
> 
> Call Trace: <ffffffff8106decf>{kmem_cache_destroy+150}
> 	<ffffffff88094615>{:ib_mad:ib_mad_cleanup_module+25}
> 	<ffffffff810447ef>{sys_delete_module+415} 
> 	<ffffffff8112f0cf>{__up_write+20}
> 	<ffffffff8105ce8b>{sys_munmap+91} 
> 	<ffffffff81009656>{system_call+126}
> 
> ib_mad: Failed to destroy ib_mad cache

Roland,

I think you were on vacation when I posted this. There were two responses
saying they were not able to reproduce it, but no one had tried 2.6.17-X.

The current status is that I can still reproduce it, now with 
2.6.17-rc4-git2 and the IB code that comes with the kernel. 

However, I am ***not*** able to reproduce the failure I had with the
trivial standalone module that I posted at
http://openib.org/pipermail/openib-general/2006-April/020582.html

So it seems there might be more than one issue here, or that the various
problems I had on 2.6.17-rc1/2/3 were related but are not totally solved in
2.6.17-rc4-git2.

Following this failure the system gets into an unstable state, e.g. if I try
to cat /proc/slabinfo I get the oops below.

Or.

Unable to handle kernel paging request at ffffffff88092be5 RIP: 
<ffffffff81131580>{strnlen+12}
PGD 1003027 PUD 1005027 PMD 37f32067 PTE 0
Oops: 0000 [1] SMP 
CPU 1 
Modules linked in: usbserial nfsd exportfs parport_pc lp parport edd joydev sg st sr_mod button battery ac ipv6 nfs lockd sunrpc sata_sil libata ohci_hcd hw_random i2c_amd8111 i2c_core e100 mii tg3 sd_mod scsi_mod dm_mod
Pid: 16415, comm: cat Not tainted 2.6.17-rc4-git2 #1
RIP: 0010:[<ffffffff81131580>] <ffffffff81131580>{strnlen+12}
RSP: 0018:ffff810019645d10  EFLAGS: 00010297
RAX: ffffffff88092be5 RBX: ffff810019645d68 RCX: 000000000000000a
RDX: ffff810019645d98 RSI: fffffffffffffffe RDI: ffffffff88092be5
RBP: ffffffff88092be5 R08: 00000000ffffffff R09: 00000000000001d0
R10: 0000000000000000 R11: 0000000000000002 R12: ffff81003c964185
R13: 0000000000000011 R14: 0000000000000010 R15: ffff81003c964fff
FS:  00002b8716e9b6e0(0000) GS:ffff81003f7a1768(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: ffffffff88092be5 CR3: 0000000034424000 CR4: 00000000000006e0
Process cat (pid: 16415, threadinfo ffff810019644000, task ffff81001f778880)
Stack: ffffffff81132586 0000000000000e7b ffff81003c964185 ffffffff81295c05 
       ffff8100322d4930 ffff81001582a240 0000000000000000 0000000000000000 
       ffff8100322d4930 0000000000000000 
Call Trace: <ffffffff81132586>{vsnprintf+765} <ffffffff8109009f>{seq_printf+165}
       <ffffffff81053226>{__alloc_pages+101} <ffffffff8105ab9d>{__handle_mm_fault+2357}
       <ffffffff8106bfc9>{s_start+21} <ffffffff8104020d>{debug_mutex_add_waiter+144}
       <ffffffff81264093>{__mutex_lock_slowpath+768} <ffffffff8106bed4>{s_show+616}
       <ffffffff810903d3>{seq_read+264} <ffffffff81072707>{vfs_read+209}
       <ffffffff81072864>{sys_read+69} <ffffffff81009676>{system_call+126}

Code: 80 3f 00 74 11 48 ff ce 48 ff c0 48 83 fe ff 74 05 80 38 00 
RIP <ffffffff81131580>{strnlen+12} RSP <ffff810019645d10>
CR2: ffffffff88092be5
 <3>BUG: sleeping function called from invalid context at include/linux/rwsem.h:43
in_atomic():0, irqs_disabled():1

Call Trace: <ffffffff81022e39>{__might_sleep+196} <ffffffff8103724f>{blocking_notifier_call_chain+31}
       <ffffffff8102d3fb>{do_exit+34} <ffffffff8118606d>{do_unblank_screen+123}
       <ffffffff81266ba3>{do_page_fault+1864} <ffffffff810a99ec>{proc_alloc_inode+18}
       <ffffffff810a9a1a>{proc_alloc_inode+64} <ffffffff8108972c>{alloc_inode+272}
       <ffffffff8103c75d>{bit_waitqueue+53} <ffffffff8108726e>{d_rehash+113}
       <ffffffff8100a5b5>{error_exit+0} <ffffffff81131580>{strnlen+12}
       <ffffffff81132586>{vsnprintf+765} <ffffffff8109009f>{seq_printf+165}
       <ffffffff81053226>{__alloc_pages+101} <ffffffff8105ab9d>{__handle_mm_fault+2357}
       <ffffffff8106bfc9>{s_start+21} <ffffffff8104020d>{debug_mutex_add_waiter+144}
       <ffffffff81264093>{__mutex_lock_slowpath+768} <ffffffff8106bed4>{s_show+616}
       <ffffffff810903d3>{seq_read+264} <ffffffff81072707>{vfs_read+209}
       <ffffffff81072864>{sys_read+69} <ffffffff81009676>{system_call+126}
Losing some ticks... checking if CPU frequency changed.
BUG: cat/16415, active lock [ffff8100322d4960(ffff8100322d4930-ffff8100322d4a30)] freed!

Call Trace: <ffffffff8103f95c>{mutex_debug_check_no_locks_freed+273}
       <ffffffff8106c576>{kfree+115} <ffffffff8108fdf6>{seq_release+24}
       <ffffffff81072da2>{__fput+181} <ffffffff81070356>{filp_close+91}
       <ffffffff8102bf47>{put_files_struct+105} <ffffffff8102d665>{do_exit+652}
       <ffffffff8118606d>{do_unblank_screen+123} <ffffffff81266ba3>{do_page_fault+1864}
       <ffffffff810a99ec>{proc_alloc_inode+18} <ffffffff810a9a1a>{proc_alloc_inode+64}
       <ffffffff8108972c>{alloc_inode+272} <ffffffff8103c75d>{bit_waitqueue+53}
       <ffffffff8108726e>{d_rehash+113} <ffffffff8100a5b5>{error_exit+0}
       <ffffffff81131580>{strnlen+12} <ffffffff81132586>{vsnprintf+765}
       <ffffffff8109009f>{seq_printf+165} <ffffffff81053226>{__alloc_pages+101}
       <ffffffff8105ab9d>{__handle_mm_fault+2357} <ffffffff8106bfc9>{s_start+21}
       <ffffffff8104020d>{debug_mutex_add_waiter+144} <ffffffff81264093>{__mutex_lock_slowpath+768}
       <ffffffff8106bed4>{s_show+616} <ffffffff810903d3>{seq_read+264}
       <ffffffff81072707>{vfs_read+209} <ffffffff81072864>{sys_read+69}
       <ffffffff81009676>{system_call+126}
 [ffff8100322d4960] {seq_open}
.. held by:               cat:16415 [ffff81001f778880, 116]
... acquired at:               seq_read+0x36/0x2a4
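
A possible reading of the traces above, not confirmed anywhere in this thread: the faulting address in CR2 lies a few KB below the ib_mad_cleanup_module address from the first trace, and ib_mad no longer appears in "Modules linked in". That would be consistent with s_show() trying to print a cache name string that lived in the unloaded module. The sketch below only does the address arithmetic. The MODULES_VADDR/MODULES_END values are an assumption for x86_64 kernels of this era, and in_module_space is a hypothetical helper, not a kernel function.

```python
# Assumed x86_64 module mapping range for 2.6.17-era kernels.
# These constants are NOT taken from the report; adjust for your kernel.
MODULES_VADDR = 0xffffffff88000000
MODULES_END = 0xfffffffffff00000


def in_module_space(addr: int) -> bool:
    """Hypothetical helper: True if addr falls in the assumed module range."""
    return MODULES_VADDR <= addr < MODULES_END


# Addresses copied from the traces in this report.
cr2 = 0xffffffff88092be5             # faulting address (CR2) in the strnlen oops
ib_mad_cleanup = 0xffffffff88094615  # ib_mad_cleanup_module+25, first trace
strnlen_rip = 0xffffffff81131580     # strnlen+12, core kernel text

print("CR2 in module space:    ", in_module_space(cr2))
print("strnlen in module space:", in_module_space(strnlen_rip))
print("ib_mad_cleanup - CR2:   ", hex(ib_mad_cleanup - cr2))
```

If the assumed range is right, the first check is true and the second is false, so the bad pointer would be into (former) module memory rather than core kernel data.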