[ewg] Re: pkey change handling patch (was Re: bugs to fix for OFED 1.2 RC1)

Wed Mar 28 05:48:52 PDT 2007

On 3/28/07, Michael S. Tsirkin <mst at dev.mellanox.co.il> wrote:
> > Quoting Moni Levy <monil at voltaire.com>:
> > Subject: Re: pkey change handling patch (was Re: bugs to fix for OFED 1.2 RC1)
> >
> > On 3/27/07, Michael S. Tsirkin <mst at dev.mellanox.co.il> wrote:
> > >> Changed from v3
> > >>       * added a flush_scheduled_work call before we restart the QP in
> > >order
> > >>         to ensure that the pkey table we read from the cache is updated
> > >
> > >
> > >+void ipoib_ib_dev_restart_qp(struct work_struct *work)
> > >+{
> > >+       struct ipoib_dev_priv *priv =
> > >+               container_of(work, struct ipoib_dev_priv, restart_qp_task);
> > >+       /* We only restart the QP in case of pkey change event */
> > >+       ipoib_dbg(priv, "Flushing %s and restarting it's QP\n",
> > >priv->dev->name);
> > >+       /* Ensures the pkey table we read  from the cache is updated
> > >properly */
> > >+       flush_scheduled_work();
> > >+       __ipoib_ib_dev_flush(priv, 1);
> > >+}
> > >+
> > >
> > >I think doing flush_scheduled_work from inside the ipoib workqueue
> > >can trigger deadlocks - which deadlocks the workqueue was
> > >created to avoid, in the first place. Look at the comment
> > >in ipoib_main.c where the WQ is created.
> >
> >        /*
> >         * We create our own workqueue mainly because we want to be
> >         * able to flush it when devices are being removed.  We can't
> >         * use schedule_work()/flush_scheduled_work() because both
> >         * unregister_netdev() and linkwatch_event take the rtnl lock,
> >         * so flush_scheduled_work() can deadlock during device
> >         * removal.
> >         */
> >
> >
> > I read that few times and I understand it as : ipoib workqueue was
> > added because if the default system workqueue was used
> > unregister_netdev() and linkwatch_event() would deadlock.
>
> Yes. What you are doing is blocking ipoib workqueue until
> system workqueue is flushed. So now flushing the ipoib workqueue
> would deadlock.
>
> > Am I wrong
> > at assuming that after we added the ipoib workqueue we can call
> > flush_scheduled_work ?
>
> Yes :)
> It's not enough to add the workqueue - you must also use it.
>
> > >
> > >And, I don't think that depending on the fact that the cache
> > >uses a default schedule queue internally is such a good idea.
> >
> > You're definitely right. The coherency enforcement should be inside
> > the cache implementation.
> >
> > We can move the call to
> > flush_scheduled_work() inside the ib_find_cached_pkey and
> > ib_get_cached_pkey. What do you think ?
>
> I think the current rule is that it is legal to call these
> in atomic context so we can't block there.

Right

>
> Further, since with your patch ib_find_cached_pkey is called from
> ipoib workqueue, the deadlock would still stay, I think.

Ok

>
> > >
> > >How about simply requeueing the work again if the cache query failed?
> >
> > The problem is that it does not fail. It returns a non coherent
> > result, which just don't reflect the pkey table change.
>
> I looked at cache.c and you are right. Maybe we should either
> 1. report events after cache has been updated

This one should block if not ready right ? That won't work with the
atomic context calling.

> or
> 2. make cache queries error out (EBUSY?) if cache hs not updated yet.
>

That sounds like a good idea. The problem is that we then should
update all the consumers of these two calls with the additional return
code handling, although it sounds like not a very big change.

-- Moni

> Option 1 requires core changes, option 2 - ULP changes
>
> I would be inclined to go for 2.
>
> Roland?
>
> --
> MST
>