[ofa-general] Re: Kernel panic in IPoIB stability testing

Moni Shoua monis at Voltaire.COM
Wed Feb 4 05:33:38 PST 2009


Yossi Etigin wrote:
> I think it comes from unicast_arp_send.
> 
> Consider this scenario:
> - paths are flushed (opensm up/down).
> - unicast_arp_send() is called with a path in priv->path_list;
>   path->valid is 0.
> - path_rec_start() fails with -EAGAIN (-11) because alloc_mad() fails:
>   there is no SM AH yet (see the prints just before the panic).
> - unicast_arp_send() calls path_free().
> - path memory is overwritten.
> - __ipoib_dev_flush() is called again.
> - mark_paths_invalid() tries to iterate over priv->path_list and hits a
>   kernel panic because path->list became invalid.
> 
> --Yossi
> 
I agree with Yossi's analysis.
Isn't the fix just as simple as this?

void ipoib_mark_paths_invalid(struct net_device *dev)
{
        struct ipoib_dev_priv *priv = netdev_priv(dev);
        struct ipoib_path *path, *tp;

        spin_lock_irq(&priv->lock);

        list_for_each_entry_safe(path, tp, &priv->path_list, list) {
                ipoib_dbg(priv, "mark path LID 0x%04x GID " IPOIB_GID_FMT " invalid\n",
                        be16_to_cpu(path->pathrec.dlid),
                        IPOIB_GID_ARG(path->pathrec.dgid));
-                path->valid =  0;
+                if (path)
+                        path->valid =  0;
        }

        spin_unlock_irq(&priv->lock);
}
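
For reference, the failure mode in Yossi's scenario can be sketched in user space. This is only an illustration under the assumption that the freed path was still linked in priv->path_list when its memory was reused. All names here (list_node, demo_path, list_unlink, demo_path_release, demo_mark_paths_invalid) are hypothetical stand-ins and not the IPoIB code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Minimal user-space stand-in for the kernel's struct list_head. */
struct list_node {
    struct list_node *next, *prev;
};

/* Hypothetical, trimmed-down analogue of struct ipoib_path. */
struct demo_path {
    struct list_node list;
    int valid;
};

void list_init(struct list_node *head)
{
    head->next = head;
    head->prev = head;
}

void list_add_tail(struct list_node *node, struct list_node *head)
{
    node->prev = head->prev;
    node->next = head;
    head->prev->next = node;
    head->prev = node;
}

/* Remove a node so that no neighbour still points at it. */
void list_unlink(struct list_node *node)
{
    assert(node != NULL);
    node->prev->next = node->next;
    node->next->prev = node->prev;
    node->next = node;
    node->prev = node;
}

/* Recover the enclosing demo_path from its embedded list node. */
struct demo_path *path_of(struct list_node *node)
{
    return (struct demo_path *)((char *)node - offsetof(struct demo_path, list));
}

/* Allocate a path, mark it valid, and link it at the tail of the list. */
struct demo_path *demo_path_new(struct list_node *head)
{
    struct demo_path *p = malloc(sizeof(*p));
    if (!p)
        return NULL;
    p->valid = 1;
    list_add_tail(&p->list, head);
    return p;
}

/* Unlink first, then free: the list never references freed memory. */
void demo_path_release(struct demo_path *p)
{
    list_unlink(&p->list);
    free(p);
}

/* Analogue of the invalidation loop; returns the number of paths visited. */
int demo_mark_paths_invalid(struct list_node *head)
{
    struct list_node *pos, *next;
    int visited = 0;

    for (pos = head->next; pos != head; pos = next) {
        next = pos->next;          /* saved first, as in the _safe iterator */
        path_of(pos)->valid = 0;
        visited++;
    }
    return visited;
}
```

If demo_path_release() skipped list_unlink(), the neighbours would keep pointing at freed memory and the next walk would dereference it. In a loop of this shape the cursor is computed from the list links, so it is not NULL even when a link is stale; that is why the sketch keeps the list consistent at free time rather than testing the cursor inside the loop.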



