[ofa-general] Re: [ewg] OFED April 21 meeting summary
Roland Dreier
rdreier at cisco.com
Mon Apr 28 16:07:35 PDT 2008
> I just saw this bug report today, but we've had similar crashes.
> Looks like the problem is that in ipoib_neigh_cleanup() this is
> done (no locking):
>
> neigh = *to_ipoib_neigh(n);
>
> then later:
>
>     spin_lock_irqsave(&priv->lock, flags);
>     if (neigh->ah)
>         ah = neigh->ah;
>     list_del(&neigh->list);    <---- neigh may be stale now
>     ipoib_neigh_free(n->dev, neigh);
>     spin_unlock_irqrestore(&priv->lock, flags);
>
> neigh wasn't re-read after acquiring the lock, so it may point
> to an already freed data structure.
Ugh, looks delicate to fix properly, since we don't have a lock to take
until we find out whether the neighbour is attached to an IPoIB device.
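
To illustrate the pattern being discussed (read the pointer only after
taking the lock, and detach it before dropping the lock), here is a
minimal userspace sketch. It is not the IPoIB fix: it assumes a lock is
always available, which is exactly what the real code lacks until the
device is known. The names slot, entry, slot_init, slot_attach and
slot_cleanup are hypothetical, and a pthread mutex stands in for
priv->lock.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct ipoib_neigh. */
struct entry {
    int ah;                 /* placeholder for the address handle */
};

/* Hypothetical container; cur plays the role of *to_ipoib_neigh(n). */
struct slot {
    pthread_mutex_t lock;   /* stands in for priv->lock */
    struct entry *cur;
};

static void slot_init(struct slot *s)
{
    assert(s != NULL);
    pthread_mutex_init(&s->lock, NULL);
    s->cur = NULL;
}

/* Attach a fresh entry; returns 0 on success, -1 on failure. */
static int slot_attach(struct slot *s, int ah)
{
    struct entry *e = malloc(sizeof(*e));

    if (!e)
        return -1;
    e->ah = ah;

    pthread_mutex_lock(&s->lock);
    if (s->cur) {           /* already occupied: do not overwrite */
        pthread_mutex_unlock(&s->lock);
        free(e);
        return -1;
    }
    s->cur = e;
    pthread_mutex_unlock(&s->lock);
    return 0;
}

/* Returns 1 if this call freed the entry, 0 if it was already gone. */
static int slot_cleanup(struct slot *s)
{
    struct entry *e;

    pthread_mutex_lock(&s->lock);
    e = s->cur;             /* read the pointer only under the lock */
    s->cur = NULL;          /* detach so a racing caller sees NULL */
    pthread_mutex_unlock(&s->lock);

    if (!e)
        return 0;
    free(e);                /* only the caller that detached it frees it */
    return 1;
}
```

Calling slot_cleanup() twice on the same slot frees the entry once and
returns 0 the second time, instead of touching a stale pointer.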
> Unable to handle kernel paging request at 0000000000100108
>                                           ^^^^^^^^^^^^^^^^
>                                           LIST_POISON1 + 0x8
Strange that the OFA bugzilla entry shows it crashing at a different
address.
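
For reference, the arithmetic behind the "LIST_POISON1 + 0x8" annotation
can be sketched in userspace C. This assumes the poison value from
include/linux/poison.h of that era (0x00100100) and a 64-bit build, and
it assumes the fault comes from a second list_del() on an entry that was
already deleted, where "next->prev = prev" writes through the poisoned
next pointer. The helper name second_list_del_fault_addr is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Userspace copy of the kernel's list node layout. */
struct list_head {
    struct list_head *next, *prev;
};

/* Value list_del() stores in entry->next (assumed: 2008-era poison.h). */
#define LIST_POISON1 ((uintptr_t)0x00100100)

/*
 * Address written by "next->prev = prev" when next == LIST_POISON1,
 * i.e. the poison value plus the offset of the prev member.
 */
static uintptr_t second_list_del_fault_addr(void)
{
    return LIST_POISON1 + offsetof(struct list_head, prev);
}
```

With 8-byte pointers, prev sits at offset 8, which gives the 0x100108
seen in the oops.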