[openib-general] slab error in kmem_cache_destroy(): cache `ib_mad': Can't free all objects
Michael S. Tsirkin
mst at mellanox.co.il
Mon May 2 05:27:34 PDT 2005
Quoting r. Hal Rosenstock <halr at voltaire.com>:
> Subject: Re: [openib-general] slab error in kmem_cache_destroy(): cache `ib_mad': Can't free all objects
>
> On Mon, 2005-05-02 at 04:40, Michael S. Tsirkin wrote:
> > Hi!
> > I have this script to unload all modules:
> >
> > killall opensm
> > sleep 3
> > killall -9 opensm
> > modprobe -r ib_ipoib
> > modprobe -r ib_umad
> > modprobe -r ib_mthca
> >
> > So, I try to unload the modules while opensm may still be dying.
> > Every now and then I see this crash (below), which seems to indicate
> > a race condition or leak somewhere around ib_mad or ib_umad.
> > My guess is a mad may still be outstanding.
> >
> > Ideas, anyone?
> >
> > Further, I'm looking at the mad agent logic and it seems a bit weird that
> > ./core/agent.c does kmem_cache_free in agent_send_handler,
> > while all allocs are in mad.c.
>
> This has been brought up before on the list. I think much of this could
> (and should) be rewritten now using Sean's helper functions.
>
> > What prevents an agent from deregistering while a send is outstanding?
> > What'll free the mad_priv then?
>
> Nothing. I think there is some missing code in ib_agent_port_close to
> handle this scenario.
>
> However, unless that MAD from the SM were directed locally (and was
> pending), that would not cause the problem where the ib_mad cache could
> not be destroyed. I will see if I can recreate this and work up a patch
> for this.
Interestingly, the error sometimes occurs even if I wait several seconds
after killing opensm.
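
To rule out the "opensm is still dying" case entirely, the unload script
could poll until the process is really gone instead of using a fixed sleep.
A minimal sketch, assuming pgrep is available; the function names wait_gone
and unload_ib and the 10-second timeouts are illustrative and not part of
the script quoted above:

```shell
#!/bin/sh
# Poll until no process named $1 remains, or give up after about $2 seconds.
# Returns 0 once the process is gone, 1 on timeout.
wait_gone() {
    name=$1
    tries=${2:-10}
    while pgrep -x "$name" >/dev/null 2>&1; do
        [ "$tries" -le 0 ] && return 1
        sleep 1
        tries=$((tries - 1))
    done
    return 0
}

# Unload the modules in the same order as the original script, but only
# after opensm has actually exited. Stops at the first modprobe failure.
unload_ib() {
    killall opensm 2>/dev/null
    if ! wait_gone opensm 10; then
        # Still alive after the polite signal: escalate, then wait again.
        killall -9 opensm 2>/dev/null
        wait_gone opensm 10 || return 1
    fi
    modprobe -r ib_ipoib &&
    modprobe -r ib_umad &&
    modprobe -r ib_mthca
}
```

Calling unload_ib as root would then replace the fixed "sleep 3". If the
slab error still shows up with this in place, a half-dead opensm is
unlikely to be the whole story.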
> > log dump below.
> >
> > Thanks,
> >
> > MST
> >
> > This is with 2.6.11 + rev 2235 (latest bits as of now), x86_64 (Intel Nocona).
> >
> >
> > May 2 10:36:12 swlab156 kernel: slab error in kmem_cache_destroy(): cache `ib_mad': Can't free all objects
> > May 2 10:36:12 swlab156 kernel:
> > May 2 10:36:12 swlab156 kernel: Call Trace:<ffffffff801592af>{kmem_cache_destroy+184} <ffffffff88010714>{:ib_mad:ib_mad_cleanup_module+28}
> > May 2 10:36:12 swlab156 kernel: <ffffffff8014c044>{sys_delete_module+487} <ffffffff8022991c>{__up_write+28}
> > May 2 10:36:12 swlab156 kernel: <ffffffff80162952>{sys_munmap+74} <ffffffff8010e0d2>{system_call+126}
> > May 2 10:36:12 swlab156 kernel:
> > May 2 10:36:12 swlab156 kernel: ib_mad: Failed to destroy ib_mad cache
>
> Has this been occurring for a while or is this new (with the recent
> changes to mad handling) ?
>
I think I remember seeing this a week and a half ago.
--
MST - Michael S. Tsirkin