[openib-general] Re: [PATCH] cm refcount race fix
Michael S. Tsirkin
mst at mellanox.co.il
Mon May 8 14:43:37 PDT 2006
Quoting r. Roland Dreier <rdreier at cisco.com>:
> Subject: Re: [PATCH] cm refcount race fix
>
> It does seem we can simplify mthca_cq in a slightly different way.
> mthca_cq_clean() doesn't need to take a CQ reference, because we know
> the CQ can't go away before all associated QPs are gone, and at least
> one QP will stay around until mthca_cq_clean() returns.
>
> So the below patch is both a fix and a decent cleanup:
>
> --- infiniband/hw/mthca/mthca_provider.h (revision 6945)
> +++ infiniband/hw/mthca/mthca_provider.h (working copy)
> @@ -197,7 +197,7 @@ struct mthca_cq_resize {
> struct mthca_cq {
> struct ib_cq ibcq;
> spinlock_t lock;
> - atomic_t refcount;
> + int refcount;
> int cqn;
> u32 cons_index;
> struct mthca_cq_buf buf;
> --- infiniband/hw/mthca/mthca_dev.h (revision 6945)
> +++ infiniband/hw/mthca/mthca_dev.h (working copy)
> @@ -496,7 +496,7 @@ void mthca_free_cq(struct mthca_dev *dev
> void mthca_cq_completion(struct mthca_dev *dev, u32 cqn);
> void mthca_cq_event(struct mthca_dev *dev, u32 cqn,
> enum ib_event_type event_type);
> -void mthca_cq_clean(struct mthca_dev *dev, u32 cqn, u32 qpn,
> +void mthca_cq_clean(struct mthca_dev *dev, struct mthca_cq *cq, u32 qpn,
> struct mthca_srq *srq);
> void mthca_cq_resize_copy_cqes(struct mthca_cq *cq);
> int mthca_alloc_cq_buf(struct mthca_dev *dev, struct mthca_cq_buf *buf, int nent);
> --- infiniband/hw/mthca/mthca_cq.c (revision 6945)
> +++ infiniband/hw/mthca/mthca_cq.c (working copy)
> @@ -234,14 +234,19 @@ void mthca_cq_event(struct mthca_dev *de
> {
> struct mthca_cq *cq;
> struct ib_event event;
> + unsigned long flags;
>
> - spin_lock(&dev->cq_table.lock);
> + spin_lock_irqsave(&dev->cq_table.lock, flags);
>
> cq = mthca_array_get(&dev->cq_table.cq, cqn & (dev->limits.num_cqs - 1));
>
> - if (cq)
> - atomic_inc(&cq->refcount);
> - spin_unlock(&dev->cq_table.lock);
> + if (cq) {
> + spin_lock(&cq->lock);
> + ++cq->refcount;
> + spin_unlock(&cq->lock);
> + }
> +
> + spin_unlock_irqrestore(&dev->cq_table.lock, flags);
Hmm. I see you take cq->lock inside cq_table.lock
OTOH in mthca_qp.c we have:
spin_lock_irq(&send_cq->lock);
if (send_cq != recv_cq)
spin_lock(&recv_cq->lock);
spin_lock(&dev->qp_table.lock);
So qp_table.lock is taken inside cq->lock.
I can't prove it's a problem, but the locking rules are getting confusing -
it was better when all table locks were innermost.
As a solution, we can decide that cq->refcount is protected by cq_table.lock.
What do you say?
--
MST