[ofa-general] Re: Kernel panic in IPoIB stability testing
Jack Morgenstein
jackm at dev.mellanox.co.il
Wed Feb 4 05:45:22 PST 2009
On Wednesday 04 February 2009 15:33, Moni Shoua wrote:
> Isn't the fix just as simple as this?
>
> void ipoib_mark_paths_invalid(struct net_device *dev)
> {
> struct ipoib_dev_priv *priv = netdev_priv(dev);
> struct ipoib_path *path, *tp;
>
> spin_lock_irq(&priv->lock);
>
> list_for_each_entry_safe(path, tp, &priv->path_list, list) {
> ipoib_dbg(priv, "mark path LID 0x%04x GID " IPOIB_GID_FMT " invalid\n",
> be16_to_cpu(path->pathrec.dlid),
> IPOIB_GID_ARG(path->pathrec.dgid));
> - path->valid = 0;
> + if (path)
> + path->valid = 0;
> }
>
> spin_unlock_irq(&priv->lock);
> }
>
I doubt it. You are leaving a deleted path record as part of the path list.
This is list corruption (since the list pointers themselves are part of the
path record structure -- what if this returned storage is re-allocated?).
I think the correct fix (after your previous posted comment) is:
path = __path_find(dev, phdr->hwaddr + 4);
if (!path || !path->valid) {
int had_path = 0;
if (!path)
path = path_rec_create(dev, phdr->hwaddr + 4);
else
had_path = 1;
if (path) {
/* put pseudoheader back on for next time */
skb_push(skb, sizeof *phdr);
__skb_queue_tail(&path->queue, skb);
if (path_rec_start(dev, path)) {
if (had_path) {
list_del(&path->list);
rb_erase(&path->rb_node,
&priv->path_tree);
}
spin_unlock_irqrestore(&priv->lock, flags);
path_free(dev, path);
return;
} else
__path_add(dev, path);
} else {
++dev->stats.tx_dropped;
dev_kfree_skb_any(skb);
}
spin_unlock_irqrestore(&priv->lock, flags);
return;
}
My only question here is:
Do we have to worry about netif_tx_lock_bh(dev) (as taken in ipoib_flush_paths)?
(If we do, we have a problem).
- Jack
More information about the general
mailing list