[ofa-general] Re: [ewg] OFED April 21 meeting summary
akepner at sgi.com
Mon Apr 28 09:37:31 PDT 2008
On Mon, Apr 28, 2008 at 07:14:39PM +0300, Olga Shern (Voltaire) wrote:
> ...
> https://bugs.openfabrics.org/show_bug.cgi?id=985 we will try to reproduce
> it on upstream kernel and let you know
I just saw this bug report today, but we've had similar crashes.
The problem appears to be that ipoib_neigh_cleanup() reads the
neigh pointer without holding any lock:
    neigh = *to_ipoib_neigh(n);
then later:
    spin_lock_irqsave(&priv->lock, flags);
    if (neigh->ah)
        ah = neigh->ah;
    list_del(&neigh->list);    <---- neigh may be stale now
    ipoib_neigh_free(n->dev, neigh);
    spin_unlock_irqrestore(&priv->lock, flags);
neigh wasn't re-read after acquiring the lock, so it may point
to an already freed data structure.
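To illustrate the re-read-under-lock idea, here is a rough userspace
sketch. It is not the ipoib code and not a proposed patch: struct slot,
struct entry and slot_cleanup() are made-up names, and a pthread mutex
stands in for priv->lock.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins, not the real ipoib structures. */
struct entry {
    int ah;                 /* plays the role of neigh->ah */
};

struct slot {
    pthread_mutex_t lock;   /* plays the role of priv->lock */
    struct entry *ptr;      /* plays the role of *to_ipoib_neigh(n) */
};

/*
 * Detach and free whatever the slot currently holds.
 * The pointer is read only after the lock is taken, so a caller that
 * arrives after the entry was freed finds NULL instead of a stale
 * pointer. Returns 1 if this call freed the entry, 0 otherwise.
 */
static int slot_cleanup(struct slot *s)
{
    struct entry *e;
    int freed = 0;

    assert(s != NULL);

    pthread_mutex_lock(&s->lock);
    e = s->ptr;             /* read under the lock, not before it */
    if (e) {
        s->ptr = NULL;      /* later callers see an empty slot */
        free(e);
        freed = 1;
    }
    pthread_mutex_unlock(&s->lock);

    return freed;
}
```

Calling slot_cleanup() twice on the same slot frees the entry once and
does nothing the second time, which is the behaviour the unlocked read
above fails to guarantee.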
In our crashes we had backtraces like:
RIP: ib_ipoib:ipoib_neigh_cleanup+368
neigh_destroy+197
neigh_periodic_timer+249
neigh_periodic_timer+0
run_timer_softirq+348
__do_softirq+85
call_softirq+30
do_softirq+44
.....
And the following helpful hint:
Unable to handle kernel paging request at 0000000000100108
                                          ^^^^^^^^^^^^^^^^
                                          LIST_POISON1 + 0x8
So we were dying in the midst of list_del().
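The arithmetic behind that hint can be checked with a small userspace
sketch. It assumes LIST_POISON1 is 0x00100100 (which is what the faulting
address above implies) and a 64-bit build where pointers are 8 bytes.
LIST_POISON1_ADDR and second_list_del_fault_addr() are names made up
for the illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Same two-pointer layout as the kernel's struct list_head. */
struct list_head {
    struct list_head *next, *prev;
};

/* Assumed poison value that list_del() leaves in entry->next. */
#define LIST_POISON1_ADDR ((uintptr_t)0x00100100)

/*
 * A second list_del() on an already-deleted entry loads entry->next,
 * which is now LIST_POISON1, and then stores to next->prev.
 * This returns the address that store would touch.
 */
static uintptr_t second_list_del_fault_addr(void)
{
    /* next is the first member, so prev sits one pointer further in. */
    assert(offsetof(struct list_head, next) == 0);

    return LIST_POISON1_ADDR + offsetof(struct list_head, prev);
}
```

With 8-byte pointers the offset of prev is 0x8, which gives the
0x100108 seen in the oops and suggests list_del() was run on an entry
that had already been removed from its list.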
--
Arthur