[openib-general] [PATCH fixed] was Re: [PATCH] ipoib_multicast/ipoib_mcast_send race

Krishna Kumar2 krkumar2 at in.ibm.com
Sun Dec 11 21:40:12 PST 2005


Hi Michael,

But the lock doesn't help in that case. The only difference the lock makes
is that, in case of a race, mcast_send() will complete *before* the flag
is cleared, while without the lock mcast_send() could be in the *middle*
of execution when the flag is cleared. In both cases, the packet would be
sent out. In the reverse race (that is, stop_thread executes before
mcast_send()), the packet would not be sent out in either case. Hence
the lock makes no difference in either scenario.
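
The two orderings can be illustrated with a minimal userspace sketch.
It is a model under stated assumptions, not the driver code:
fake_priv, fake_mcast_send(), fake_stop_thread() and run_order() are
hypothetical names, a pthread mutex stands in for priv->lock, and a
bool stands in for the IPOIB_MCAST_STARTED bit. It only covers the two
non-overlapping orderings; it does not model a send that is still in
the middle of execution when the flag changes.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-in for the relevant parts of the driver's priv. */
struct fake_priv {
	pthread_mutex_t lock;	/* models priv->lock */
	bool started;		/* models the IPOIB_MCAST_STARTED bit */
	int sent;		/* packets that would have gone out */
	int dropped;		/* packets that would have been freed */
};

/* Models the send path: test the flag while holding the lock. */
void fake_mcast_send(struct fake_priv *p)
{
	pthread_mutex_lock(&p->lock);
	if (p->started)
		p->sent++;
	else
		p->dropped++;
	pthread_mutex_unlock(&p->lock);
}

/* Models the stop path: clear the flag while holding the lock. */
void fake_stop_thread(struct fake_priv *p)
{
	pthread_mutex_lock(&p->lock);
	p->started = false;
	pthread_mutex_unlock(&p->lock);
}

/* Runs one send and one stop in the given order; returns packets sent. */
int run_order(bool stop_first)
{
	struct fake_priv p = { .started = true };
	int sent;

	pthread_mutex_init(&p.lock, NULL);
	if (stop_first) {
		fake_stop_thread(&p);
		fake_mcast_send(&p);
	} else {
		fake_mcast_send(&p);
		fake_stop_thread(&p);
	}
	sent = p.sent;
	pthread_mutex_destroy(&p.lock);
	return sent;
}
```

In this model run_order(false) returns 1 (the send ran first, so the
packet goes out) and run_order(true) returns 0 (the stop ran first, so
the packet is dropped).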

I feel the new code looks fine without the lock.

Thanks,

- KK

"Michael S. Tsirkin" <mst at mellanox.co.il> wrote on 12/09/2005 06:41:51 PM:

> The lock around clear_bit is there to ensure that ipoib_mcast_send isn't
> running already when we stop the thread.
> That's why test_bit has to be inside the lock, too.
> 
> 
> Quoting r. Krishna Kumar2 <krkumar2 at in.ibm.com>:
> > Subject: Re: [openib-general] [PATCH fixed] was Re: [PATCH] ipoib_multicast/ipoib_mcast_send race
> > 
> > 
> > Hi Michael,
> > 
> > Is there a reason to have the atomic set_bit() within a lock (even for
> > a race condition of stop vs send, it doesn't seem to be required)?
> > Which means the test_bit() can also be put before the existing lock...
> > 
> > Thanks, 
> > 
> > - KK 
> > 
> > openib-general-bounces at openib.org wrote on 12/09/2005 12:04:06 AM:
> > 
> > > Quoting Michael S. Tsirkin <mst at mellanox.co.il>:
> > > > Subject: [PATCH] ipoib_multicast/ipoib_mcast_send race
> > > > 
> > > > Hello, Roland!
> > > > Here's another race scenario.
> > > > 
> > > > ---
> > > > 
> > > > Fix the following race scenario:
> > > > device is up.
> > > > port event or set mcast list triggers ipoib_mcast_stop_thread.
> > > > This cancels the query and waits on mcast "done" completion.
> > > > completion is called and "done" is set.
> > > > Meanwhile, ipoib_mcast_send arrives and starts a new query,
> > > > re-initializing "done".
> > > > 
> > > > Signed-off-by: Michael S. Tsirkin <mst at mellanox.co.il>
> > > 
> > > The patch I posted previously leaked an skb when a multicast
> > > send arrived while the mcast thread is stopped.
> > > 
> > > Further, there's an additional issue that I saw in testing:
> > > ipoib_mcast_send may get called when priv->broadcast is NULL
> > > (e.g. if the device was downed and then upped internally because
> > > of a port event).
> > > If this happens and the sendonly join request gets completed before
> > > priv->broadcast is set, we get an oops that I posted previously.
> > > 
> > > Here's a better patch to address these two problems.
> > > It has been running fine here for a while now.
> > > 
> > > Please note that this replaces the ipoib_multicast/ipoib_mcast_send patch,
> > > but not the ADMIN_UP patch that I posted previously.
> > > 
> > > ---
> > > 
> > > Do not send multicasts if mcast thread is stopped or if
> > > priv->broadcast is not set.
> > > 
> > > Signed-off-by: Michael S. Tsirkin <mst at mellanox.co.il>
> > > 
> > > Index: openib/drivers/infiniband/ulp/ipoib/ipoib_multicast.c
> > > ===================================================================
> > > --- openib/drivers/infiniband/ulp/ipoib/ipoib_multicast.c   (revision 4222)
> > > +++ openib/drivers/infiniband/ulp/ipoib/ipoib_multicast.c   (working copy)
> > > @@ -582,6 +582,10 @@ int ipoib_mcast_start_thread(struct net_
> > >        queue_work(ipoib_workqueue, &priv->mcast_task);
> > >     up(&mcast_mutex);
> > > 
> > > +   spin_lock_irq(&priv->lock);
> > > +   set_bit(IPOIB_MCAST_STARTED, &priv->flags);
> > > +   spin_unlock_irq(&priv->lock);
> > > +
> > >     return 0;
> > >  }
> > > 
> > > @@ -592,6 +596,10 @@ int ipoib_mcast_stop_thread(struct net_d
> > > 
> > >     ipoib_dbg_mcast(priv, "stopping multicast thread\n");
> > > 
> > > +   spin_lock_irq(&priv->lock);
> > > +   clear_bit(IPOIB_MCAST_STARTED, &priv->flags);
> > > +   spin_unlock_irq(&priv->lock);
> > > +
> > >     down(&mcast_mutex);
> > >     clear_bit(IPOIB_MCAST_RUN, &priv->flags);
> > >     cancel_delayed_work(&priv->mcast_task);
> > > @@ -674,6 +682,11 @@ void ipoib_mcast_send(struct net_device 
> > >      */
> > >     spin_lock(&priv->lock);
> > > 
> > > +   if (!test_bit(IPOIB_MCAST_STARTED, &priv->flags) || !priv->broadcast) {
> > > +      dev_kfree_skb_any(skb);
> > > +      goto unlock;
> > > +   }
> > > +
> > >     mcast = __ipoib_mcast_find(dev, mgid);
> > >     if (!mcast) {
> > >        /* Let's create a new send only group now */
> > > @@ -732,6 +745,7 @@ out:
> > >        ipoib_send(dev, skb, mcast->ah, IB_MULTICAST_QPN);
> > >     }
> > > 
> > > +unlock:
> > >     spin_unlock(&priv->lock);
> > >  }
> > > 
> > > Index: openib/drivers/infiniband/ulp/ipoib/ipoib.h
> > > ===================================================================
> > > --- openib/drivers/infiniband/ulp/ipoib/ipoib.h   (revision 4222)
> > > +++ openib/drivers/infiniband/ulp/ipoib/ipoib.h   (working copy)
> > > @@ -78,6 +78,7 @@ enum {
> > >     IPOIB_FLAG_SUBINTERFACE   = 4,
> > >     IPOIB_MCAST_RUN      = 5,
> > >     IPOIB_STOP_REAPER         = 6,
> > > +   IPOIB_MCAST_STARTED       = 7,
> > > 
> > >     IPOIB_MAX_BACKOFF_SECONDS = 16,
> > > 
> > > 
> > > -- 
> > > MST
> > 
> 
> -- 
> MST