[openib-general] slab error in kmem_cache_destroy(): cache `ib_mad': Can't free all objects
Michael S. Tsirkin
mst at mellanox.co.il
Mon May 2 01:40:13 PDT 2005
Hi!
I have this script to unload all modules:
killall opensm
sleep 3
killall -9 opensm
modprobe -r ib_ipoib
modprobe -r ib_umad
modprobe -r ib_mthca
So I may be unloading the modules while opensm is still exiting.
Every now and then I see the crash below, which seems to indicate
a race condition or a leak somewhere around ib_mad or ib_umad.
My guess is that a MAD may still be outstanding.
Ideas, anyone?
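To check whether the failure depends on the timing of the script itself, one variation is to wait until opensm has really exited before unloading anything. This is only a minimal sketch: it assumes pgrep (from procps) is installed, wait_for_exit is a made-up helper name, and the default 10-second limit is arbitrary. If the slab error still shows up with this in place, the leak is not simply a matter of the daemon being caught mid-exit.

```shell
#!/bin/sh
# Hypothetical helper: wait until no process named exactly $1 exists,
# or until roughly $2 seconds (default 10) have passed.
# Returns 0 once the process is gone, 1 on timeout.
wait_for_exit() {
    name=$1
    timeout=${2:-10}
    while pgrep -x "$name" >/dev/null 2>&1; do
        [ "$timeout" -le 0 ] && return 1
        sleep 1
        timeout=$((timeout - 1))
    done
    return 0
}
```

In the script above, "wait_for_exit opensm 10 || killall -9 opensm" could stand in for the fixed "sleep 3" followed by the unconditional kill.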
Also, looking at the MAD agent logic, it seems a bit odd that
./core/agent.c calls kmem_cache_free in agent_send_handler,
while all the allocations are in mad.c. What prevents an agent from
deregistering while a send is outstanding? What would free the mad_priv then?
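Whether objects are really left behind in the cache can be checked from user space: /proc/slabinfo lists each slab cache by name, with the number of active objects in the second column. The following is a minimal sketch, not a definitive diagnostic. slab_active is a made-up helper name, and its optional second argument (a file to read instead of /proc/slabinfo) is there only so it can be tried on a saved copy. The count is meaningful only once every user of the cache is gone, for example when removing the modules one at a time with rmmod and checking just before ib_mad itself is removed. A non-zero count at that point would suggest objects that were never freed.

```shell
#!/bin/sh
# Hypothetical helper: print the active-object count of slab cache $1.
# Reads /proc/slabinfo by default; $2 may name a saved copy instead.
# Prints nothing if the cache is not listed.
slab_active() {
    awk -v cache="$1" '$1 == cache { print $2 }' "${2:-/proc/slabinfo}"
}
```

For example, "slab_active ib_mad" prints the current count for the cache named in the error message. Reading /proc/slabinfo may require root.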
Log dump below.
Thanks,
MST
This is with 2.6.11 + rev 2235 (latest bits as of now), x86_64 (Intel Nocona).
May 2 10:36:12 swlab156 kernel: slab error in kmem_cache_destroy(): cache `ib_mad': Can't free all objects
May 2 10:36:12 swlab156 kernel:
May 2 10:36:12 swlab156 kernel: Call Trace:<ffffffff801592af>{kmem_cache_destroy+184} <ffffffff88010714>{:ib_mad:ib_mad_cleanup_module+28}
May 2 10:36:12 swlab156 kernel: <ffffffff8014c044>{sys_delete_module+487} <ffffffff8022991c>{__up_write+28}
May 2 10:36:12 swlab156 kernel: <ffffffff80162952>{sys_munmap+74} <ffffffff8010e0d2>{system_call+126}
May 2 10:36:12 swlab156 kernel:
May 2 10:36:12 swlab156 kernel: ib_mad: Failed to destroy ib_mad cache
Any attempt to load ib_mad after that fails:
May 2 10:36:25 swlab156 kernel: kmem_cache_create: duplicate cache ib_mad
May 2 10:36:25 swlab156 kernel: ----------- [cut here ] --------- [please bite here ] ---------
May 2 10:36:25 swlab156 kernel: Kernel BUG at slab:1472
May 2 10:36:25 swlab156 kernel: invalid operand: 0000 [1] SMP
May 2 10:36:25 swlab156 kernel: CPU 1
May 2 10:36:25 swlab156 kernel: Modules linked in: ib_mad ib_core
May 2 10:36:25 swlab156 kernel: Pid: 14102, comm: modprobe Not tainted 2.6.11-openib
May 2 10:36:25 swlab156 kernel: RIP: 0010:[kmem_cache_create+1384/1539] <ffffffff801598b6>{kmem_cache_create+1384}
May 2 10:36:25 swlab156 kernel: RIP: 0010:[<ffffffff801598b6>] <ffffffff801598b6>{kmem_cache_create+1384}
May 2 10:36:25 swlab156 kernel: RSP: 0018:ffff81015d8c7ee8 EFLAGS: 00010202
May 2 10:36:25 swlab156 kernel: RAX: 000000000000002a RBX: ffff81015fd69670 RCX: ffffffff804572a8
May 2 10:36:25 swlab156 kernel: RDX: ffffffff804572a8 RSI: 0000000000000296 RDI: ffffffff8055f0c0
May 2 10:36:25 swlab156 kernel: RBP: ffff81015fd69480 R08: ffff81015e0976c0 R09: 0000000000000000
May 2 10:36:25 swlab156 kernel: R10: 0000000000000000 R11: 0000000000000080 R12: ffffffff8055f0c0
May 2 10:36:25 swlab156 kernel: R13: 0000000000002000 R14: ffff810000000000 R15: 0000000000000080
May 2 10:36:25 swlab156 kernel: FS: 00002aaaaade26e0(0000) GS:ffffffff80583180(0000) knlGS:0000000000000000
May 2 10:36:25 swlab156 kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
May 2 10:36:25 swlab156 kernel: CR2: 00002aaaaaacc000 CR3: 000000013f67a000 CR4: 00000000000006e0
May 2 10:36:25 swlab156 kernel: Process modprobe (pid: 14102, threadinfo ffff81015d8c6000, task ffff81015dcc57f0)
May 2 10:36:25 swlab156 kernel: Stack: ffffffffffffff80 0000000000000000 0000000000000000 ffffffff88010951
May 2 10:36:25 swlab156 kernel: 0000000000000180 ffffffff8045a000 ffffffff88013000 ffffffff80459fc0
May 2 10:36:25 swlab156 kernel: ffffffff80459fc0 00007ffffffff408
May 2 10:36:25 swlab156 kernel: Call Trace:<ffffffff88015033>{:ib_mad:ib_mad_init_module+51} <ffffffff8014ba19>{sys_init_module+298}
May 2 10:36:25 swlab156 kernel: <ffffffff8010e0d2>{system_call+126}
May 2 10:36:25 swlab156 kernel:
May 2 10:36:25 swlab156 kernel: Code: 0f 0b e5 be 3e 80 ff ff ff ff c0 05 48 8b 1b 48 8b 03 0f 18
May 2 10:36:25 swlab156 kernel: RIP <ffffffff801598b6>{kmem_cache_create+1384} RSP <ffff81015d8c7ee8>
--
MST - Michael S. Tsirkin