[ofa-general] [RFC] ipoib: avoid using stale ipoib_neigh* in ipoib_neigh_cleanup()

Yossi Etigin yossi.openib at gmail.com
Fri May 22 14:13:11 PDT 2009


akepner at sgi.com wrote:
> diff -rup c/ofa_kernel-1.3/drivers/infiniband/ulp/ipoib/ipoib_main.c d/ofa_kernel-1.3/drivers/infiniband/ulp/ipoib/ipoib_main.c
> --- c/ofa_kernel-1.3/drivers/infiniband/ulp/ipoib/ipoib_main.c  2008-04-22 13:25:23.131563415 -0700
> +++ d/ofa_kernel-1.3/drivers/infiniband/ulp/ipoib/ipoib_main.c  2008-04-22 15:24:31.475721847 -0700
> @@ -821,11 +821,15 @@ static void ipoib_neigh_cleanup(struct n
>         unsigned long flags;
>         struct ipoib_ah *ah = NULL;
> 
> +       spin_lock_irqsave(&priv->lock, flags);
>         neigh = *to_ipoib_neigh(n);
> +       spin_unlock_irqrestore(&priv->lock, flags);
>         if (neigh) {
> -               priv = netdev_priv(neigh->dev);
> -               ipoib_dbg(priv, "neigh_destructor for bonding device: %s\n",
> -                         n->dev->name);
> +               if (priv != netdev_priv(neigh->dev)) {
> +                       ipoib_dbg(priv, "neigh_destructor for bonding device: "
> +                               "%s\n", n->dev->name);
> +                       priv = netdev_priv(neigh->dev);
> +               }
>         } else
>                 return;
>         ipoib_dbg(priv,

Now I see that the patch that caused the deadlock is a little more that
moving spin_lock_irqsave() a few lines up in the code..
 The code above looks a little suspicious. The spin_lock_irqsave() above
looks redundant - someone could kfree the neigh after you release the lock
and you get a corrupted `priv'.

Besides, I see that in the 1.3.1 code there is a test 
'if (n->dev->type != ARPHRD_INFINIBAND)', check this out:
http://www.mail-archive.com/general@lists.openfabrics.org/msg00839.html

--Yossi



More information about the general mailing list