[ofa-general] Re: [NEW PATCH] IB/mad: Fix possible lock-lock-timer deadlock

Bart Van Assche bart.vanassche at gmail.com
Tue Sep 8 10:01:42 PDT 2009


On Tue, Sep 8, 2009 at 8:25 AM, Bart Van Assche
<bart.vanassche at gmail.com> wrote:
> On Tue, Sep 8, 2009 at 6:21 AM, Roland Dreier <rdreier at cisco.com> wrote:
> >  > With 2.6.31-rc9 + patch 4e49627b9bc29a14b393c480e8c979e3bc922ef7 + the
> >  > patch you posted at the start of this thread the following lockdep
> >  > complaint was triggered on the SRP initiator system during SRP login:
> >  >
> >  > ======================================================
> >  > [ INFO: HARDIRQ-safe -> HARDIRQ-unsafe lock order detected ]
> >  > 2.6.31-rc9 #2
> >  > ------------------------------------------------------
> >  > ibsrpdm/4290 [HC0[0]:SC0[0]:HE0:SE1] is trying to acquire:
> >  >  (&(&rmpp_recv->cleanup_work)->timer){+.-...}, at:
> >  > [<ffffffff802559f0>] del_timer_sync+0x0/0xa0
> >  >
> >  > and this task is already holding:
> >  >  (&mad_agent_priv->lock){..-...}, at: [<ffffffffa03c6de8>]
> >  > ib_cancel_rmpp_recvs+0x28/0x118 [ib_mad]
> >  > which would create a new lock dependency:
> >  >  (&mad_agent_priv->lock){..-...} -> (&(&rmpp_recv->cleanup_work)->timer){+.-...}
> >
> > And this report doesn't happen with the older patch?  (Did you do the
> > same testing with the older patch that triggered this?)
> >
> > Because this looks like a *different* incarnation of the same
> > lock->lock->delayed work/timer that we're trying to fix here -- the
> > delayed work is now rmpp_recv->cleanup_work in this case instead of
> > mad_agent_priv->timed_work as it was before.
>
> The above issue does not occur with the for-next branch of the
> infiniband git tree, but does occur with 2.6.31-rc9 + aforementioned
> patches.
>
> As far as I can see commit 721d67cdca5b7642b380ca0584de8dceecf6102f
> (http://git.kernel.org/?p=linux/kernel/git/roland/infiniband.git;a=commitdiff;h=721d67cdca5b7642b380ca0584de8dceecf6102f)
> is not yet included in 2.6.31-rc9. Could this be related to the above
> issue?
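
The lock -> delayed work/timer dependency described in the quoted report can be illustrated with a small userspace analogue. This is a hypothetical sketch, not the ib_mad code. It assumes that a pthread mutex stands in for mad_agent_priv->lock and that pthread_join() stands in for a synchronous cancel such as del_timer_sync(), which has to wait for a handler that may itself need the lock. The names agent_lock, cleanup_worker, cancel_cleanup and run_demo are invented for illustration.

```c
#include <pthread.h>

/* Hypothetical stand-in for mad_agent_priv->lock (illustration only). */
static pthread_mutex_t agent_lock = PTHREAD_MUTEX_INITIALIZER;
static int cleaned_up = 0;

/* Hypothetical stand-in for a deferred cleanup handler (compare
 * rmpp_recv->cleanup_work): it needs agent_lock to do its job. */
static void *cleanup_worker(void *arg)
{
    (void)arg;
    pthread_mutex_lock(&agent_lock);
    cleaned_up++;
    pthread_mutex_unlock(&agent_lock);
    return NULL;
}

/* Safe ordering: do the list bookkeeping under the lock, drop the
 * lock, and only then wait for the handler. Calling pthread_join()
 * while still holding agent_lock could deadlock, because the worker
 * may be blocked waiting for that same lock. */
static int cancel_cleanup(pthread_t worker)
{
    pthread_mutex_lock(&agent_lock);
    /* ... unlink the pending item from shared lists here ... */
    pthread_mutex_unlock(&agent_lock);

    return pthread_join(worker, NULL);
}

/* Starts one worker, cancels it safely, and returns how many times
 * the cleanup ran (1 on success), or -1 on a pthread error. */
int run_demo(void)
{
    pthread_t worker;

    if (pthread_create(&worker, NULL, cleanup_worker, NULL) != 0)
        return -1;
    if (cancel_cleanup(worker) != 0)
        return -1;
    return cleaned_up;
}
```

The design choice here is the usual one for this class of lockdep report: never wait synchronously for a timer or work handler while holding a lock that the handler may take. Whether the ib_mad patches under discussion follow exactly this shape is not stated in the thread.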

Update: patch 721d67cdca5b7642b380ca0584de8dceecf6102f does not apply
cleanly to 2.6.31-rc9, so I have been using a slightly modified
version of this patch
(http://bugzilla.kernel.org/attachment.cgi?id=22624).

I have retested the 2.6.31-rc9 kernel with the following patches applied to it:
* patch 4e49627b9bc29a14b393c480e8c979e3bc922ef7
* http://bugzilla.kernel.org/attachment.cgi?id=22624
* the patch posted at the start of this thread.

With this combination I did not observe any lockdep complaints.

Bart.
