[ofa-general] Re: Kernel panic in IPoIB stability testing
Moni Shoua
monis at Voltaire.COM
Wed Feb 4 05:33:38 PST 2009
Yossi Etigin wrote:
> I think it comes from unicast_arp_send.
>
> Consider this scenario:
> - paths are flushed (opensm up/down).
> - unicast_arp_send() is called with a path in priv->path_list whose
>   path->valid is 0.
> - path_rec_start() fails with -EAGAIN (-11) because alloc_mad() fails:
>   there is no SM ah (address handle) yet (see the prints just before
>   the panic).
> - unicast_arp_send() calls path_free().
> - the path memory is overwritten.
> - __ipoib_dev_flush() is called again.
> - mark_paths_invalid() iterates over priv->path_list and hits a kernel
>   panic, because path->list is no longer valid.
>
> --Yossi
>
I agree with Yossi's analysis.
Isn't the fix just as simple as this?
 void ipoib_mark_paths_invalid(struct net_device *dev)
 {
 	struct ipoib_dev_priv *priv = netdev_priv(dev);
 	struct ipoib_path *path, *tp;

 	spin_lock_irq(&priv->lock);

 	list_for_each_entry_safe(path, tp, &priv->path_list, list) {
 		ipoib_dbg(priv, "mark path LID 0x%04x GID " IPOIB_GID_FMT " invalid\n",
 			  be16_to_cpu(path->pathrec.dlid),
 			  IPOIB_GID_ARG(path->pathrec.dgid));
-		path->valid = 0;
+		if (path)
+			path->valid = 0;
 	}

 	spin_unlock_irq(&priv->lock);
 }
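The change above relies on the kernel's list_for_each_entry_safe(). As a reading aid, here is a minimal userspace sketch of what that macro guarantees, assuming a simplified copy of the <linux/list.h> primitives (no poisoning, no debug checks) and a hypothetical struct demo_path standing in for struct ipoib_path. The _safe variant caches the next entry before the loop body runs, so the body itself may unlink and free the current entry. It does not guard against an entry being freed by some other code path while it is still linked, which is the situation in the scenario quoted above. Note also that the cursor is computed by container_of() from a list_head pointer, not loaded from memory.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Simplified userspace model of the kernel's intrusive list. */
struct list_head {
	struct list_head *next, *prev;
};

#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))
#define list_entry(ptr, type, member) container_of(ptr, type, member)

/* Same shape as the kernel macro: 'n' holds the next entry before the
 * body runs, so the body may unlink and free 'pos'. */
#define list_for_each_entry_safe(pos, n, head, member)                   \
	for (pos = list_entry((head)->next, __typeof__(*pos), member),   \
	     n = list_entry(pos->member.next, __typeof__(*pos), member); \
	     &pos->member != (head);                                     \
	     pos = n, n = list_entry(n->member.next, __typeof__(*n), member))

static void list_init(struct list_head *head)
{
	head->next = head;
	head->prev = head;
}

static void list_add_tail(struct list_head *entry, struct list_head *head)
{
	entry->prev = head->prev;
	entry->next = head;
	head->prev->next = entry;
	head->prev = entry;
}

static void list_del(struct list_head *entry)
{
	entry->prev->next = entry->next;
	entry->next->prev = entry->prev;
}

/* Hypothetical stand-in for struct ipoib_path. */
struct demo_path {
	int valid;
	struct list_head list;
};

/* Appends 'count' valid paths; returns 0 on success, -1 on OOM. */
static int demo_build(struct list_head *head, int count)
{
	list_init(head);
	for (int i = 0; i < count; i++) {
		struct demo_path *p = malloc(sizeof(*p));
		if (!p)
			return -1;
		p->valid = 1;
		list_add_tail(&p->list, head);
	}
	return 0;
}

static int demo_count_valid(struct list_head *head)
{
	struct demo_path *path, *tp;
	int valid = 0;

	list_for_each_entry_safe(path, tp, head, list)
		valid += path->valid;
	return valid;
}

/* Mirrors the loop in ipoib_mark_paths_invalid(); returns entries visited. */
static int demo_mark_invalid(struct list_head *head)
{
	struct demo_path *path, *tp;
	int visited = 0;

	list_for_each_entry_safe(path, tp, head, list) {
		path->valid = 0;
		visited++;
	}
	return visited;
}

/* Unlinks and frees each entry inside the loop; safe only because 'tp'
 * was read before 'path' is freed. Returns entries freed. */
static int demo_free_all(struct list_head *head)
{
	struct demo_path *path, *tp;
	int freed = 0;

	list_for_each_entry_safe(path, tp, head, list) {
		list_del(&path->list);
		free(path);
		freed++;
	}
	return freed;
}
```

Calling demo_build(), demo_mark_invalid() and then demo_free_all() on the same head visits every entry, clears each valid flag, frees every entry, and leaves the head pointing at itself.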