[openib-general] slab error in kmem_cache_destroy(): cache `ib_mad': Can't free all objects

Michael S. Tsirkin mst at mellanox.co.il
Mon May 2 05:27:34 PDT 2005


Quoting r. Hal Rosenstock <halr at voltaire.com>:
> Subject: Re: [openib-general] slab error in kmem_cache_destroy(): cache `ib_mad': Can't free all objects
> 
> On Mon, 2005-05-02 at 04:40, Michael S. Tsirkin wrote:
> > Hi!
> > I have this script to unload all modules:
> > 
> > killall opensm
> > sleep 3
> > killall -9 opensm
> > modprobe -r ib_ipoib
> > modprobe -r ib_umad
> > modprobe -r ib_mthca
> > 
> > So, I try to unload the modules while opensm may still be dying.
> > Every now and then I see this crash (below), which seems to indicate
> > a race condition or leak somewhere around ib_mad or ib_umad.
> > My guess is a mad may still be outstanding.
> > 
> > Ideas, anyone?
> > 
> > Further, I'm looking at the mad agent logic and it seems a bit weird that
> > ./core/agent.c does kmem_cache_free in agent_send_handler,
> > while all allocs are in mad.c.
> 
> This has been brought up before on the list. I think much of this could
> (and should) be rewritten now using Sean's helper functions.
> 
> > What prevents an agent from deregistering while a send is outstanding? 
> > What'll free the mad_priv then?
> 
> Nothing. I think there is some missing code in ib_agent_port_close to
> handle this scenario.
> 
> However, unless that MAD from the SM were directed locally (and was
> pending), that would not cause the problem where the ib_mad cache could
> not be destroyed. I will see if I can recreate this and work up a patch
> for this.

Interestingly, the error sometimes occurs even if I wait several seconds
after killing opensm.
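
To rule out the case where opensm is simply still exiting when the modules
are removed, the unload script could poll for the process instead of sleeping
for a fixed time. What follows is only a minimal sketch, assuming pgrep is
available on the test machine; wait_for_exit is a hypothetical helper name and
is not part of any OpenIB tool. It would not address a real leak in ib_mad, it
only removes the timing guess from the script.

```shell
# Hypothetical helper (not from the original script): poll until no process
# with the given exact name remains, or give up after a timeout in seconds.
wait_for_exit() {
    name=$1
    timeout=${2:-10}
    elapsed=0
    # pgrep -x matches the process name exactly; it exits 0 while a match exists.
    while pgrep -x "$name" >/dev/null 2>&1; do
        if [ "$elapsed" -ge "$timeout" ]; then
            return 1    # still running after the timeout
        fi
        sleep 1
        elapsed=$((elapsed + 1))
    done
    return 0            # no matching process left
}
```

In the unload script this might be used as
"killall opensm; wait_for_exit opensm 10 || killall -9 opensm"
before the modprobe -r lines. If the slab error still shows up after
wait_for_exit returns 0, the outstanding MAD is probably not coming from a
dying opensm.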

> > log dump below.
> > 
> > Thanks,
> > 
> > MST
> > 
> > This is with 2.6.11 + rev 2235 (latest bits as of now), x86_64 (Intel Nocona).
> > 
> > 
> > May  2 10:36:12 swlab156 kernel: slab error in kmem_cache_destroy(): cache `ib_mad': Can't free all objects
> > May  2 10:36:12 swlab156 kernel: 
> > May  2 10:36:12 swlab156 kernel: Call Trace:<ffffffff801592af>{kmem_cache_destroy+184} <ffffffff88010714>{:ib_mad:ib_mad_cleanup_module+28} 
> > May  2 10:36:12 swlab156 kernel:        <ffffffff8014c044>{sys_delete_module+487} <ffffffff8022991c>{__up_write+28} 
> > May  2 10:36:12 swlab156 kernel:        <ffffffff80162952>{sys_munmap+74} <ffffffff8010e0d2>{system_call+126} 
> > May  2 10:36:12 swlab156 kernel:        
> > May  2 10:36:12 swlab156 kernel: ib_mad: Failed to destroy ib_mad cache
> 
> Has this been occurring for a while or is this new (with the recent
> changes to mad handling)?
> 

I think I remember seeing this a week and a half ago.

-- 
MST - Michael S. Tsirkin


