[ofa-general] Re: lockdep question (was Re: IPoIB caused a kernel: BUG: soft lockup detected on CPU#0!)
Peter Zijlstra
a.p.zijlstra at chello.nl
Sun Mar 11 08:25:19 PDT 2007
On Sun, 2007-03-11 at 15:50 +0200, Michael S. Tsirkin wrote:
> > Quoting Roland Dreier <roland.list at gmail.com>:
> > Subject: Re: IPoIB caused a kernel: BUG: soft lockup detected on CPU#0!
> >
> > >Feb 27 17:47:52 sw169 kernel: [<ffffffff8053aaf1>] _spin_lock_irqsave+0x15/0x24
> > >Feb 27 17:47:52 sw169 kernel: [<ffffffff88067a23>] :ib_ipoib:ipoib_neigh_destructor+0xc2/0x139
> >
> > It looks like this is deadlocking trying to take priv->lock in ipoib_neigh_destructor().
> > One idea I just had would be to build a kernel with CONFIG_PROVE_LOCKING
> > turned on, and then rerun this test. There's a good chance that this would
> > diagnose the deadlock. (I don't have good access to my test machines right now, or
> > else I would do it myself)
>
> OK, I did that. But I get
> [13440.761857] INFO: trying to register non-static key.
> [13440.766903] the code is fine but needs lockdep annotation.
> [13440.772455] turning off the locking correctness validator.
> and I am not sure what triggers this, or how to fix it to have the
> validator actually do its job.
It usually indicates that a spinlock is not properly initialized, for
example __SPIN_LOCK_UNLOCKED() used in a non-static context. Use
spin_lock_init() in those cases.
However, looking at the code, ipoib_neigh_destructor() only uses
&priv->lock, and that seems to get properly initialized in ipoib_setup()
using spin_lock_init().
So either there are other sites that instantiate those objects and
forget about the lock init, or the object is corrupted (use after free?).
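To illustrate the point about initialization, here is a minimal userspace sketch in C. It is not kernel code: every model_* name is hypothetical, and the validator check is simplified to a NULL test. As far as I understand it, real lockdep falls back to the lock's own address when no key is set and then complains if that address is not in static storage. The sketch only shows why an init macro that plants a per-call-site static key makes a heap-allocated lock classifiable, while a lock that was never initialized is not.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/*
 * Userspace model only: every model_* name is hypothetical and merely
 * mirrors the shape of the kernel's lockdep annotations.
 */
struct model_lock_class_key { char unused; };

struct model_spinlock {
	int locked;
	const struct model_lock_class_key *key;	/* NULL: no class key was ever set */
	const char *name;
};

static void model_lock_set_key(struct model_spinlock *lock, const char *name,
			       const struct model_lock_class_key *key)
{
	assert(lock != NULL && key != NULL);
	lock->locked = 0;
	lock->name = name;
	lock->key = key;
}

/*
 * One static key per call site, so the key lives in static storage
 * even when the lock itself sits in heap memory.
 */
#define model_spin_lock_init(lock)				\
	do {							\
		static struct model_lock_class_key model_key;	\
		model_lock_set_key((lock), #lock, &model_key);	\
	} while (0)

/* Returns 1 if the modelled validator can classify the lock, 0 if it would give up. */
static int model_validator_can_track(const struct model_spinlock *lock)
{
	return lock->key != NULL;
}

/* Stand-in for a driver's private data that embeds a lock. */
struct model_priv {
	struct model_spinlock lock;
};

/* Returns 0 when the initialized lock is trackable and the forgotten one is not. */
static int model_demo(void)
{
	struct model_priv *forgot = calloc(1, sizeof(*forgot));
	struct model_priv *ok = calloc(1, sizeof(*ok));
	int rc = 1;

	if (forgot && ok) {
		model_spin_lock_init(&ok->lock);
		/* forgot->lock is deliberately left zeroed, as if the init call was missed */
		rc = !(model_validator_can_track(&ok->lock) &&
		       !model_validator_can_track(&forgot->lock));
	}
	free(forgot);
	free(ok);
	return rc;
}
```

Calling model_demo() from a small main() is enough to try it. The zeroed lock stands in for any allocation site that skips the init call, which is the first of the two possibilities mentioned above.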
> Ingo, what key does the message refer to?
>
> The stack dump seems to point to drivers/infiniband/ulp/ipoib/ipoib_main.c line
> 829.
>
> Full message below:
>
> [13440.761857] INFO: trying to register non-static key.
> [13440.766903] the code is fine but needs lockdep annotation.
> [13440.772455] turning off the locking correctness validator.
> [13440.778008] [<c023c082>] __lock_acquire+0xae4/0xbb9
> [13440.783078] [<c023c43d>] lock_acquire+0x56/0x71
> [13440.787784] [<f899bff2>] ipoib_neigh_destructor+0xd0/0x132 [ib_ipoib]
> [13440.794412] [<c051ad41>] _spin_lock_irqsave+0x32/0x41
> [13440.799649] [<f899bff2>] ipoib_neigh_destructor+0xd0/0x132 [ib_ipoib]
> [13440.806275] [<f899bff2>] ipoib_neigh_destructor+0xd0/0x132 [ib_ipoib]
> [13440.812897] [<c04a1c1b>] dst_run_gc+0xc/0x118
> [13440.817439] [<c022af6e>] run_timer_softirq+0x37/0x16b
> [13440.822673] [<c04a1c0f>] dst_run_gc+0x0/0x118
> [13440.827221] [<c04a3eab>] neigh_destroy+0xbe/0x104
> [13440.832114] [<c04a1bb1>] dst_destroy+0x4d/0xab
> [13440.836751] [<c04a1c64>] dst_run_gc+0x55/0x118
> [13440.841384] [<c022b03f>] run_timer_softirq+0x108/0x16b
> [13440.846711] [<c0227634>] __do_softirq+0x5a/0xd5
> [13440.851427] [<c023b435>] trace_hardirqs_on+0x106/0x141
> [13440.856754] [<c0227643>] __do_softirq+0x69/0xd5
> [13440.861470] [<c02276e6>] do_softirq+0x37/0x4d
> [13440.866016] [<c02167b0>] smp_apic_timer_interrupt+0x6b/0x77
> [13440.871774] [<c02029ef>] default_idle+0x3b/0x54
> [13440.876491] [<c02029ef>] default_idle+0x3b/0x54
> [13440.881211] [<c0204c33>] apic_timer_interrupt+0x33/0x38
> [13440.886624] [<c02029ef>] default_idle+0x3b/0x54
> [13440.891342] [<c02029f1>] default_idle+0x3d/0x54
> [13440.896061] [<c0202aaa>] cpu_idle+0xa2/0xbb
> [13440.900436] =======================
> [13768.711447] BUG: spinlock lockup on CPU#1, swapper/0, c0687880
> [13768.717353] [<c031f919>] _raw_spin_lock+0xda/0xfd
> [13768.722247] [<c051ad48>] _spin_lock_irqsave+0x39/0x41
> [13768.727486] [<f899bff2>] ipoib_neigh_destructor+0xd0/0x132 [ib_ipoib]
> [13768.734110] [<f899bff2>] ipoib_neigh_destructor+0xd0/0x132 [ib_ipoib]
> [13768.740735] [<c04a1c1b>] dst_run_gc+0xc/0x118
> [13768.745276] [<c022af6e>] run_timer_softirq+0x37/0x16b
> [13768.750517] [<c04a1c0f>] dst_run_gc+0x0/0x118
> [13768.755061] [<c04a3eab>] neigh_destroy+0xbe/0x104
> [13768.759955] [<c04a1bb1>] dst_destroy+0x4d/0xab
> [13768.764586] [<c04a1c64>] dst_run_gc+0x55/0x118
> [13768.769218] [<c022b03f>] run_timer_softirq+0x108/0x16b
> [13768.774542] [<c0227634>] __do_softirq+0x5a/0xd5
> [13768.779261] [<c023b435>] trace_hardirqs_on+0x106/0x141
> [13768.784588] [<c0227643>] __do_softirq+0x69/0xd5
> [13768.789308] [<c02276e6>] do_softirq+0x37/0x4d
> [13768.793851] [<c02167b0>] smp_apic_timer_interrupt+0x6b/0x77
> [13768.799609] [<c02029ef>] default_idle+0x3b/0x54
> [13768.804326] [<c02029ef>] default_idle+0x3b/0x54
> [13768.809054] [<c0204c33>] apic_timer_interrupt+0x33/0x38
> [13768.814471] [<c02029ef>] default_idle+0x3b/0x54
> [13768.819187] [<c02029f1>] default_idle+0x3d/0x54
> [13768.823903] [<c0202aaa>] cpu_idle+0xa2/0xbb
> [13768.828279] =======================
>
>
> --
> MST