[openib-general] ipoib lockdep warning

Zach Brown zach.brown at oracle.com
Tue Jul 11 16:27:10 PDT 2006


Roland Dreier wrote:
> No time to really look at this in detail, but I think the issue is a
> slightly bogus conversion to netif_tx_lock().  Can you try this patch
> and see if things are better?

> -	local_irq_save(flags);
>  	netif_tx_lock(dev);
> -	spin_lock(&priv->lock);
> +	spin_lock_irqsave(&priv->lock, flags);

Hmm, won't that hold dev->_xmit_lock with interrupts enabled?  That
seems like it'd deadlock with dev_watchdog grabbing it from a softirq?
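
That is, with the patch applied, the window I'm worried about looks
like this (a sketch of the sequence, not the literal code):

        netif_tx_lock(dev);     /* takes dev->_xmit_lock, irqs still on */

        /* If the watchdog timer fires on this cpu right here, then
         * dev_watchdog() runs in the timer softirq and does
         * netif_tx_lock(dev) itself -- spinning forever on the lock
         * this cpu already holds. */

        spin_lock_irqsave(&priv->lock, flags);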

But setting that aside, I don't think this will help, because it
doesn't change the order in which the locks are acquired in this
path.  This path can still race with other cpus that are each in one
of the other two paths (queue_idr.lock -> softirq -> dev->_xmit_lock,
and priv->lock -> queue_idr.lock).  The local state of interrupt
masking on this cpu doesn't stop the other cpus from each grabbing
the first lock in their path and then trying to grab the second.
Call priv->lock A, queue_idr.lock B, and dev->_xmit_lock C: imagine
the three cpus race to grab their first locks (A, B, C), all succeed,
and then all get stuck spinning on their second (B, C, A).
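
To make the three-way version concrete (a sketch of the shape of it,
not the literal call sites -- queue_idr.lock is presumably taken
inside the idr code):

        /* cpu0: holds A, wants B */
        spin_lock(&priv->lock);         /* got A */
        spin_lock(&queue_idr.lock);     /* spins: cpu1 holds B */

        /* cpu1: holds B, wants C via the softirq */
        spin_lock(&queue_idr.lock);     /* got B */
        /* softirq fires on top of it: */
        netif_tx_lock(dev);             /* spins: cpu2 holds C */

        /* cpu2: the xmit path, holds C, wants A */
        netif_tx_lock(dev);             /* got C */
        spin_lock(&priv->lock);         /* spins: cpu0 holds A --
                                         * three-way deadlock */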

Maybe you could get priv->lock here before dev->_xmit_lock.  Then
we'd have AB, BC, and AC -- a consistent A < B < C order -- and
that's OK.  I'm going to guess that this won't work, though, because
other paths take the locks in dev->_xmit_lock -> priv->lock order.
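
For reference, the reorder I mean (again just a sketch):

        spin_lock_irqsave(&priv->lock, flags);  /* A first */
        netif_tx_lock(dev);                     /* then C: this path now
                                                 * agrees with A -> B and
                                                 * B -> C above */
        /* ... */
        netif_tx_unlock(dev);
        spin_unlock_irqrestore(&priv->lock, flags);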

Another possibility is masking interrupts when getting
queue_idr.lock.  That drops the implicit dependency between
queue_idr.lock and _xmit_lock and gives us AB, B, CA -- which is
fine.  The catch is that it means blocking ints across
idr_pre_get(), which allocates memory, so the allocation has to be
GFP_ATOMIC and is that much more likely to fail.
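
Roughly, assuming the allocation side is the usual idr_pre_get() /
idr_get_new() pair ('entry' and 'id' here are just stand-ins):

        local_irq_save(flags);
        /* ints are off, so the prealloc can't sleep: GFP_ATOMIC,
         * which fails much more readily than GFP_KERNEL */
        if (!idr_pre_get(&queue_idr, GFP_ATOMIC)) {
                local_irq_restore(flags);
                return -ENOMEM;
        }
        ret = idr_get_new(&queue_idr, entry, &id);      /* queue_idr.lock is
                                                         * taken with ints
                                                         * masked */
        local_irq_restore(flags);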

That's my reading, anyway.

- z



