[openib-general] Re: [PATCH] cm refcount race fix

Michael S. Tsirkin mst at mellanox.co.il
Mon May 8 14:43:37 PDT 2006


Quoting r. Roland Dreier <rdreier at cisco.com>:
> Subject: Re: [PATCH] cm refcount race fix
> 
> It does seem we can simplify mthca_cq in a slightly different way.
> mthca_cq_clean() doesn't need to take a CQ reference, because we know
> the CQ can't go away before all associated QPs are gone, and at least
> one QP will stay around until mthca_cq_clean() returns.
> 
> So the below patch is both a fix and a decent cleanup:
> 
> --- infiniband/hw/mthca/mthca_provider.h	(revision 6945)
> +++ infiniband/hw/mthca/mthca_provider.h	(working copy)
> @@ -197,7 +197,7 @@ struct mthca_cq_resize {
>  struct mthca_cq {
>  	struct ib_cq		ibcq;
>  	spinlock_t		lock;
> -	atomic_t		refcount;
> +	int			refcount;
>  	int			cqn;
>  	u32			cons_index;
>  	struct mthca_cq_buf	buf;
> --- infiniband/hw/mthca/mthca_dev.h	(revision 6945)
> +++ infiniband/hw/mthca/mthca_dev.h	(working copy)
> @@ -496,7 +496,7 @@ void mthca_free_cq(struct mthca_dev *dev
>  void mthca_cq_completion(struct mthca_dev *dev, u32 cqn);
>  void mthca_cq_event(struct mthca_dev *dev, u32 cqn,
>  		    enum ib_event_type event_type);
> -void mthca_cq_clean(struct mthca_dev *dev, u32 cqn, u32 qpn,
> +void mthca_cq_clean(struct mthca_dev *dev, struct mthca_cq *cq, u32 qpn,
>  		    struct mthca_srq *srq);
>  void mthca_cq_resize_copy_cqes(struct mthca_cq *cq);
>  int mthca_alloc_cq_buf(struct mthca_dev *dev, struct mthca_cq_buf *buf, int nent);
> --- infiniband/hw/mthca/mthca_cq.c	(revision 6945)
> +++ infiniband/hw/mthca/mthca_cq.c	(working copy)
> @@ -234,14 +234,19 @@ void mthca_cq_event(struct mthca_dev *de
>  {
>  	struct mthca_cq *cq;
>  	struct ib_event event;
> +	unsigned long flags;
>  
> -	spin_lock(&dev->cq_table.lock);
> +	spin_lock_irqsave(&dev->cq_table.lock, flags);
>  
>  	cq = mthca_array_get(&dev->cq_table.cq, cqn & (dev->limits.num_cqs - 1));
>  
> -	if (cq)
> -		atomic_inc(&cq->refcount);
> -	spin_unlock(&dev->cq_table.lock);
> +	if (cq) {
> +		spin_lock(&cq->lock);
> +		++cq->refcount;
> +		spin_unlock(&cq->lock);
> +	}
> +
> +	spin_unlock_irqrestore(&dev->cq_table.lock, flags);


Hmm. I see you take cq->lock inside cq_table.lock

OTOH in mthca_qp.c we have:

        spin_lock_irq(&send_cq->lock);
        if (send_cq != recv_cq)
                spin_lock(&recv_cq->lock);

        spin_lock(&dev->qp_table.lock);

So qp_table.lock is taken inside cq->lock.

I can't prove it's a problem, but the locking rules are getting confusing -
it was better when all table locks were innermost.

As a solution, we could decide that the cq refcount is protected by cq_table.lock.
What do you say?


-- 
MST


