[ofa-general] Re: 2.6.30.1: possible irq lock inversion dependency detected

Bart Van Assche bart.vanassche at gmail.com
Fri Aug 7 02:58:11 PDT 2009


On Thu, Aug 6, 2009 at 7:56 PM, Roland Dreier<rdreier at cisco.com> wrote:
>
>  > After having applied this patch it took somewhat longer before a
>  > locking inversion report was generated, but unfortunately there still
>  > was a locking inversion report generated (see also
>  > http://bugzilla.kernel.org/show_bug.cgi?id=13757 for the details):
>
> ummm, yikes...
>
> can you apply the hack patch I sent originally to take priv->lock from
> an interrupt ASAP and try that along with the fix patch to drop
> priv->lock before calling ipoib_send()?  That might make the lockdep
> trace understandable.

The lockdep report I obtained this morning with a 2.6.30.4 kernel and
the two patches applied has been attached to the kernel bugzilla
entry. This lockdep report was generated while testing the SRPT target
software. I have double checked that the SRPT target implementation
does not hold any spinlocks or mutexes while calling functions in the
IB core. This means that the SRPT target code cannot have caused any
of the reported lock cycles.

By the way, I noticed that while many subsystems in the Linux kernel
use event queues to report information to higher software layers, the
IB core makes extensive use of callback functions. The combination
of nested locking and callback functions can easily lead to lock
inversion. This effect is well known in the operating system world --
see e.g. the talk by John Ousterhout about multithreaded versus
event-driven software (http://home.pacbell.net/ouster/threads.pdf,
1996).
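
To make the point above concrete, here is a minimal user-space sketch in
C. It is not IB core code: core_lock, core_query(), core_raise_callback(),
core_raise_queued(), core_dispatch() and upper_handler() are all
hypothetical names made up for this illustration, and a POSIX
error-checking mutex stands in for a kernel spinlock so that the problem
shows up as an error code instead of a hang. It only shows the simplest
case, a callback that re-enters the layer that invoked it while that
layer still holds its lock.

```c
#ifndef _XOPEN_SOURCE
#define _XOPEN_SOURCE 700   /* for PTHREAD_MUTEX_ERRORCHECK */
#endif
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical lower layer, protected by a single error-checking mutex. */
static pthread_mutex_t core_lock;
static int core_state;

typedef void (*event_cb)(int event);

#define QUEUE_LEN 8
static int queue[QUEUE_LEN];
static size_t queue_count;

/* Result of the most recent core_query() made by the upper layer. */
static int last_rc = -1;

static void core_init(void)
{
    pthread_mutexattr_t attr;

    pthread_mutexattr_init(&attr);
    pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);
    pthread_mutex_init(&core_lock, &attr);
    pthread_mutexattr_destroy(&attr);
    core_state = 0;
    queue_count = 0;
    last_rc = -1;
}

/* Entry point for upper layers; takes core_lock. Returns 0 or an errno. */
static int core_query(int *out)
{
    int rc = pthread_mutex_lock(&core_lock);

    if (rc != 0)
        return rc;          /* EDEADLK: the caller already holds core_lock */
    *out = core_state;
    pthread_mutex_unlock(&core_lock);
    return 0;
}

/* Upper-layer handler that calls back into the lower layer. */
static void upper_handler(int event)
{
    int value;

    (void)event;
    last_rc = core_query(&value);
}

/* Callback style: the handler runs while core_lock is still held. */
static void core_raise_callback(int event, event_cb cb)
{
    pthread_mutex_lock(&core_lock);
    core_state = event;
    cb(event);
    pthread_mutex_unlock(&core_lock);
}

/* Event-queue style: only record the event while holding core_lock. */
static int core_raise_queued(int event)
{
    int rc = 0;

    pthread_mutex_lock(&core_lock);
    core_state = event;
    if (queue_count < QUEUE_LEN)
        queue[queue_count++] = event;
    else
        rc = -1;            /* queue full, event dropped */
    pthread_mutex_unlock(&core_lock);
    return rc;
}

/* Copy pending events under the lock, deliver them with no lock held. */
static void core_dispatch(event_cb cb)
{
    int pending[QUEUE_LEN];
    size_t n, i;

    pthread_mutex_lock(&core_lock);
    n = queue_count;
    assert(n <= QUEUE_LEN);
    for (i = 0; i < n; i++)
        pending[i] = queue[i];
    queue_count = 0;
    pthread_mutex_unlock(&core_lock);

    for (i = 0; i < n; i++)
        cb(pending[i]);
}
```

After core_init(), calling core_raise_callback(1, upper_handler) leaves
last_rc set to EDEADLK, because the handler tries to take a lock its
caller still holds. Calling core_raise_queued(2) followed by
core_dispatch(upper_handler) leaves last_rc at 0, since the handler only
runs once core_lock has been released. With a real spinlock the first
case would hang instead of returning an error.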

=========================================================
[ INFO: possible irq lock inversion dependency detected ]
2.6.30.4-scst-debug #2
---------------------------------------------------------
[ ... ]
stack backtrace:
Pid: 26040, comm: cc1 Not tainted 2.6.30.4-scst-debug #2
Call Trace:
 <IRQ>  [<ffffffff80272bec>] print_irq_inversion_bug+0x14c/0x1c0
 [<ffffffff80272cdd>] check_usage_forwards+0x7d/0xc0
 [<ffffffff80271faf>] mark_lock+0x20f/0x6a0
 [<ffffffff80272c60>] ? check_usage_forwards+0x0/0xc0
 [<ffffffff802743e4>] __lock_acquire+0xce4/0x1c80
 [<ffffffff802713bd>] ? trace_hardirqs_off+0xd/0x10
 [<ffffffff80249305>] ? release_console_sem+0x1e5/0x230
 [<ffffffff80249919>] ? vprintk+0x2e9/0x480
 [<ffffffff80275488>] lock_acquire+0x108/0x150
 [<ffffffffa043f5a2>] ? ib_cm_notify+0x102/0x2c0 [ib_cm]
 [<ffffffff80515371>] _spin_lock_irqsave+0x41/0x60
 [<ffffffffa043f5a2>] ? ib_cm_notify+0x102/0x2c0 [ib_cm]
 [<ffffffffa043f5a2>] ib_cm_notify+0x102/0x2c0 [ib_cm]
 [<ffffffffa06a6e1e>] srpt_qp_event+0x4e/0x140 [ib_srpt]
 [<ffffffffa02656aa>] mlx4_ib_qp_event+0x7a/0xf0 [mlx4_ib]
 [<ffffffffa04c5e0f>] mlx4_qp_event+0x6f/0xe0 [mlx4_core]
 [<ffffffffa04bd659>] mlx4_eq_int+0x289/0x2e0 [mlx4_core]
 [<ffffffffa04bd79a>] mlx4_msi_x_interrupt+0x6a/0x90 [mlx4_core]
 [<ffffffff8028bf35>] handle_IRQ_event+0x95/0x200
 [<ffffffff8028e3d8>] handle_edge_irq+0xc8/0x170
 [<ffffffff8020eeef>] handle_irq+0x1f/0x30
 [<ffffffff8020e5fe>] do_IRQ+0x6e/0xf0
 [<ffffffff8020c913>] ret_from_intr+0x0/0xf
 <EOI> <6>

Bart.


