[ewg] Re: pkey change handling patch (was Re: bugs to fix for OFED 1.2 RC1)

Wed Mar 28 02:00:57 PDT 2007

On 3/27/07, Michael S. Tsirkin <mst at dev.mellanox.co.il> wrote:
> > Changed from v3
> >       * added a flush_scheduled_work call before we restart the QP in order
> >         to ensure that the pkey table we read from the cache is updated
>
>
> +void ipoib_ib_dev_restart_qp(struct work_struct *work)
> +{
> +       struct ipoib_dev_priv *priv =
> +               container_of(work, struct ipoib_dev_priv, restart_qp_task);
> +       /* We only restart the QP in case of pkey change event */
> +       ipoib_dbg(priv, "Flushing %s and restarting it's QP\n", priv->dev->name);
> +       /* Ensures the pkey table we read  from the cache is updated properly */
> +       flush_scheduled_work();
> +       __ipoib_ib_dev_flush(priv, 1);
> +}
> +
>
> I think doing flush_scheduled_work from inside the ipoib workqueue
> can trigger deadlocks - which deadlocks the workqueue was
> created to avoid, in the first place. Look at the comment
> in ipoib_main.c where the WQ is created.

        /*
         * We create our own workqueue mainly because we want to be
         * able to flush it when devices are being removed.  We can't
         * use schedule_work()/flush_scheduled_work() because both
         * unregister_netdev() and linkwatch_event take the rtnl lock,
         * so flush_scheduled_work() can deadlock during device
         * removal.
         */

I read that few times and I understand it as : ipoib workqueue was
added because if the default system workqueue was used
unregister_netdev() and linkwatch_event() would deadlock. Am I wrong
at assuming that after we added the ipoib workqueue we can call
flush_scheduled_work ?

>
> And, I don't think that depending on the fact that the cache
> uses a default schedule queue internally is such a good idea.

You're definitely right. The coherency enforcement should be inside
the cache implementation. We can move the call to
flush_scheduled_work() inside the ib_find_cached_pkey and
ib_get_cached_pkey. What do you think ?

>
> How about simply requeueing the work again if the cache query failed?

The problem is that it does not fail. It returns a non coherent
result, which just don't reflect the pkey table change.

-- Moni

>
> --
> MST
>