[ofa-general] Re: [PATCHv4] IB/ipoib: Fix ipoib handling for pkey reordering
Michael S. Tsirkin
mst at dev.mellanox.co.il
Thu Mar 29 08:07:32 PDT 2007
> Quoting Moni Levy <myopenib at gmail.com>:
> Subject: [PATCHv4] IB/ipoib: Fix ipoib handling for pkey reordering
>
> This issue was found during partitioning & SM fail over testing. The fix was
> tested over the weekend with pkey reshuffling, removal and addition every few
> seconds concurrent with OFED restart.
But probably not together with the patch below, right?
I just came up with the proposal yesterday, it seems unlikely
the patch could be tested over the weekend ...
> Please look at the "IB/cache: Add
> ib_cache report for cache in process" patch also.
>
> Changes from v1:
> * added flush flag to ipoib_ib_dev_stop(), ipoib_ib_dev_down() alike
> * fixed a bug in device extraction from the work struct
> * removed some warnings in case they are caused due to missing PKEY as
> this seems like a valid flow now.
Here's an idea:
Instead of adding yet another flag to ipoib_ib_dev_stop and friends, and
worrying about potential races when ipoib_ib_dev_stop is run from both ipoib
workqueue and another thread, how about always making them *not* flush, and
using a queue + flush combination when they need to be run not in ipoib work
queue?
Roland, what do you think?
> @@ -232,9 +232,10 @@ static int ipoib_mcast_join_finish(struc
> ret = ipoib_mcast_attach(dev, be16_to_cpu(mcast->mcmember.mlid),
> &mcast->mcmember.mgid);
> if (ret < 0) {
> - ipoib_warn(priv, "couldn't attach QP to multicast group "
> - IPOIB_GID_FMT "\n",
> - IPOIB_GID_ARG(mcast->mcmember.mgid));
> + if (ret != -ENXIO) /* No pkey found */
> + ipoib_warn(priv, "couldn't attach QP to multicast group "
> + IPOIB_GID_FMT "\n",
> + IPOIB_GID_ARG(mcast->mcmember.mgid));
>
> clear_bit(IPOIB_MCAST_FLAG_ATTACHED, &mcast->flags);
> return ret;
I forgot why are we checking for this ENXIO error - isn't this
because cache updates where out of sync with port events?
So maybe we can get rid of this now?
BTW, shouldn't there be some code testing return code for -ESTALE
and retrying later? What am I missing?
--
MST
More information about the general
mailing list