[ofa-general] Re: lockdep question (was Re: IPoIB caused a kernel: BUG: softlockup detected on CPU#0!)

Michael S. Tsirkin mst at mellanox.co.il
Mon Mar 12 07:20:13 PDT 2007


> Quoting Ingo Molnar <mingo at elte.hu>:
> Subject: Re: lockdep question (was Re: IPoIB caused a kernel: BUG: softlockup detected on CPU#0!)
> 
> 
> * Michael S. Tsirkin <mst at mellanox.co.il> wrote:
> 
> > > could you turn on CONFIG_SLAB_DEBUG as well?
> > > 
> > > that should catch certain types of use-after-free accesses, and 
> > > lockdep will also warn if a still locked object is freed.
> > 
> > Hmm, no, this does not look like use-after-free. I enabled 
> > CONFIG_SLAB_DEBUG, and I still see the same message, so the memory was 
> > not overwritten by slab debugger.
> 
> that's still not conclusive - the memory might not have been allocated 
> by slab again to detect it. Your magic-number check definitely shows 
> some sort of corruption going on, right?

Not necessarily in such a direct way.

I currently think we are somehow getting neighbours where
neigh->dev points to the loopback device. The device type there is
772 (ARPHRD_LOOPBACK), so this seems to make sense.
I printed out the device name and, sure enough, it is "lo".
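
The check described above can be sketched roughly as follows. This is a
userspace illustration only, not the actual IPoIB code: the mock_* structs
are simplified stand-ins for struct net_device and struct neighbour, and the
helper name neigh_dev_is_infiniband is hypothetical. The two ARPHRD_* values
are the ones defined in <linux/if_arp.h>.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hardware type values as defined in <linux/if_arp.h>. */
#define ARPHRD_INFINIBAND 32
#define ARPHRD_LOOPBACK   772

/* Simplified stand-in for struct net_device (assumption: only the
 * fields this check needs are modelled). */
struct mock_net_device {
	const char *name;
	unsigned short type;
};

/* Simplified stand-in for struct neighbour. */
struct mock_neighbour {
	struct mock_net_device *dev;
};

/* Hypothetical helper: returns true only if the neighbour is attached
 * to an InfiniBand device, i.e. the assumption the destructor relies
 * on actually holds for this neighbour. */
static bool neigh_dev_is_infiniband(const struct mock_neighbour *n)
{
	return n != NULL && n->dev != NULL &&
	       n->dev->type == ARPHRD_INFINIBAND;
}
```

A destructor could call such a check first and log n->dev->name when it
fails, which is how a neighbour on "lo" would show up.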

Is it true that sticking the following

static int ipoib_neigh_setup_dev(struct net_device *dev,
				 struct neigh_parms *parms)
{
	parms->neigh_destructor = ipoib_neigh_destructor;

	return 0;
}

in dev->neigh_setup, as IPoIB does, guarantees that neighbour->dev points to
that same device for every neighbour that ipoib_neigh_destructor is called on?

That's the assumption IPoIB makes, and it seems broken in this instance.

How could that be?

-- 
MST


