[ofa-general] [PATCH] ipoib: racing uses of ipoib_neigh in IPoIB-CM
akepner at sgi.com
akepner at sgi.com
Thu Jun 11 09:33:22 PDT 2009
On Wed, Jun 10, 2009 at 10:52:29PM +0300, Yossi Etigin wrote:
> akepner at sgi.com wrote:
> > @@ -810,9 +810,6 @@ void ipoib_cm_handle_tx_wc(struct net_device *dev, struct ib_wc *wc)
> > list_del(&neigh->list);
> > if (neigh->ah)
> > ipoib_put_ah(neigh->ah);
> > - ipoib_neigh_free(dev, neigh);
> > -
> > - tx->neigh = NULL;
> > }
> >
> > if (test_and_clear_bit(IPOIB_FLAG_INITIALIZED, &tx->flags)) {
>
> Can't calling ipoib_put_ah() and keeping the neighbour alive cause neigh->ah
> pointer become invalid? What about doing list_del(&neigh->list) twice?
>
Yeah, looks like there's a hole there. By changing the test from:
if (neigh) {
neigh->cm = NULL;
to:
if (neigh && neigh->cm) {
neigh->cm = NULL;
it can be closed.
> ......
>
> The piece of code that does '*to_ipoib_neigh(neigh->neighbour) = NULL' and
> releases skb's will appear 3 times in ipoib. I think it can be put in a
> separate function.
OK.
>
>
> > @@ -1343,7 +1351,23 @@ static void ipoib_cm_tx_reap(struct work_struct *work)
> > spin_lock_irqsave(&priv->lock, flags);
> >
> > while (!list_empty(&priv->cm.reap_list)) {
> > + struct ipoib_neigh *neigh;
> > + struct sk_buff *skb;
> > p = list_entry(priv->cm.reap_list.next, typeof(*p), list);
> > + neigh = p->neigh;
> > + if (neigh) {
> > + *to_ipoib_neigh(neigh->neighbour) = NULL;
> > + ipoib_dbg(priv, "%s: releasing ref to %pI6 "
> > + "(refcnt: %d)\n", __func__, neigh->dgid.raw,
> > + atomic_read(&neigh->neighbour->refcnt));
> > + neigh_release(neigh->neighbour);
> > + while ((skb = __skb_dequeue(&neigh->queue))) {
> > + ++dev->stats.tx_dropped;
> > + dev_kfree_skb_any(skb);
> > + }
> > + kfree(neigh);
> > + }
> > + p->neigh = NULL;
> > list_del(&p->list);
> > spin_unlock_irqrestore(&priv->lock, flags);
> > netif_tx_unlock_bh(dev);
>
> Couldn't this race with ipoib_neigh_cleanup()?
>
Generally the fact that we hold a reference should prevent
ipoib_neigh_cleanup() from being invoked.
But, alas, there are a couple of cases where the neighbour can
be have the neigh_cleanup() method invoked even when there are
references held (e.g., when the interface goes down).
So, yeah, there seems to be a hole there. A smaller hole, but
still....
--
Arthur
More information about the general
mailing list