[ewg] Re: pkey change handling patch (was Re: bugs to fix for OFED 1.2 RC1)

Wed Mar 28 02:33:45 PDT 2007

> Quoting Moni Levy <monil at voltaire.com>:
> Subject: Re: pkey change handling patch (was Re: bugs to fix for OFED 1.2 RC1)
> 
> On 3/27/07, Michael S. Tsirkin <mst at dev.mellanox.co.il> wrote:
> >> Changed from v3
> >>       * added a flush_scheduled_work call before we restart the QP in 
> >order
> >>         to ensure that the pkey table we read from the cache is updated
> >
> >
> >+void ipoib_ib_dev_restart_qp(struct work_struct *work)
> >+{
> >+       struct ipoib_dev_priv *priv =
> >+               container_of(work, struct ipoib_dev_priv, restart_qp_task);
> >+       /* We only restart the QP in case of pkey change event */
> >+       ipoib_dbg(priv, "Flushing %s and restarting it's QP\n", 
> >priv->dev->name);
> >+       /* Ensures the pkey table we read  from the cache is updated 
> >properly */
> >+       flush_scheduled_work();
> >+       __ipoib_ib_dev_flush(priv, 1);
> >+}
> >+
> >
> >I think doing flush_scheduled_work from inside the ipoib workqueue
> >can trigger deadlocks - which deadlocks the workqueue was
> >created to avoid, in the first place. Look at the comment
> >in ipoib_main.c where the WQ is created.
> 
>        /*
>         * We create our own workqueue mainly because we want to be
>         * able to flush it when devices are being removed.  We can't
>         * use schedule_work()/flush_scheduled_work() because both
>         * unregister_netdev() and linkwatch_event take the rtnl lock,
>         * so flush_scheduled_work() can deadlock during device
>         * removal.
>         */
> 
> 
> I read that few times and I understand it as : ipoib workqueue was
> added because if the default system workqueue was used
> unregister_netdev() and linkwatch_event() would deadlock.

Yes. What you are doing is blocking ipoib workqueue until
system workqueue is flushed. So now flushing the ipoib workqueue
would deadlock.

> Am I wrong
> at assuming that after we added the ipoib workqueue we can call
> flush_scheduled_work ?

Yes :)
It's not enough to add the workqueue - you must also use it.

> >
> >And, I don't think that depending on the fact that the cache
> >uses a default schedule queue internally is such a good idea.
> 
> You're definitely right. The coherency enforcement should be inside
> the cache implementation.
> 
> We can move the call to
> flush_scheduled_work() inside the ib_find_cached_pkey and
> ib_get_cached_pkey. What do you think ?

I think the current rule is that it is legal to call these
in atomic context so we can't block there.

Further, since with your patch ib_find_cached_pkey is called from
ipoib workqueue, the deadlock would still stay, I think.

> >
> >How about simply requeueing the work again if the cache query failed?
> 
> The problem is that it does not fail. It returns a non coherent
> result, which just don't reflect the pkey table change.

I looked at cache.c and you are right. Maybe we should either
1. report events after cache has been updated
or
2. make cache queries error out (EBUSY?) if cache hs not updated yet.

Option 1 requires core changes, option 2 - ULP changes

I would be inclined to go for 2.
Roland?

-- 
MST