[ofa-general] Re: pkey change handling patch (was Re: bugs to fix for OFED 1.2 RC1)
Moni Levy
monil at voltaire.com
Wed Mar 28 02:00:57 PDT 2007
On 3/27/07, Michael S. Tsirkin <mst at dev.mellanox.co.il> wrote:
> > Changed from v3
> > * added a flush_scheduled_work call before we restart the QP in order
> > to ensure that the pkey table we read from the cache is updated
>
>
> +void ipoib_ib_dev_restart_qp(struct work_struct *work)
> +{
> + struct ipoib_dev_priv *priv =
> + container_of(work, struct ipoib_dev_priv, restart_qp_task);
> + /* We only restart the QP in case of pkey change event */
> + ipoib_dbg(priv, "Flushing %s and restarting it's QP\n", priv->dev->name);
> + /* Ensures the pkey table we read from the cache is updated properly */
> + flush_scheduled_work();
> + __ipoib_ib_dev_flush(priv, 1);
> +}
> +
>
> I think doing flush_scheduled_work from inside the ipoib workqueue
> can trigger deadlocks - which deadlocks the workqueue was
> created to avoid, in the first place. Look at the comment
> in ipoib_main.c where the WQ is created.
/*
* We create our own workqueue mainly because we want to be
* able to flush it when devices are being removed. We can't
* use schedule_work()/flush_scheduled_work() because both
* unregister_netdev() and linkwatch_event take the rtnl lock,
* so flush_scheduled_work() can deadlock during device
* removal.
*/
I read that few times and I understand it as : ipoib workqueue was
added because if the default system workqueue was used
unregister_netdev() and linkwatch_event() would deadlock. Am I wrong
at assuming that after we added the ipoib workqueue we can call
flush_scheduled_work ?
>
> And, I don't think that depending on the fact that the cache
> uses a default schedule queue internally is such a good idea.
You're definitely right. The coherency enforcement should be inside
the cache implementation. We can move the call to
flush_scheduled_work() inside the ib_find_cached_pkey and
ib_get_cached_pkey. What do you think ?
>
> How about simply requeueing the work again if the cache query failed?
The problem is that it does not fail. It returns a non coherent
result, which just don't reflect the pkey table change.
-- Moni
>
> --
> MST
>
More information about the general
mailing list