[openib-general] Re: race in mthca_cq.c?

Michael S. Tsirkin mst at mellanox.co.il
Thu Jun 8 15:06:49 PDT 2006


Quoting r. Roland Dreier <rdreier at cisco.com>:
> Subject: Re: race in mthca_cq.c?
> 
>     Michael> Not in the driver I have: mthca_array_clear is at line
>     Michael> 1351, mthca_cq_clean at line 1372.  Isn't
>     Michael> mthca_array_clear freeing the slot in QP table?
> 
> Nope, the bitmap slot isn't freed until mthca_free().

Oh. Right. I see it now.

>     Michael> But there might be more EQEs for this CQN outstanding in
>     Michael> the EQ which we have not seen yet.
> 
> Now that you mention it, that could be a real problem I guess.
> synchronize_irq() isn't enough because the interrupt handler might not
> have even started yet.
> 
> But on the other hand a CQ can't be destroyed until after all
> associated QPs have been destroyed.  So could we really miss EQEs for
> that long?

Yes, I think there might be spurious EQEs and they might get delayed
in HW for a long time. Destroying QPs does not flush completion events out.

So just this bit?

--

Check that an EQE is not for a stale CQ number.  Since the high bits of the
CQ number are allocated round-robin, we can be reasonably sure the CQ number
differs even for CQs which share a slot in the CQ table.

Signed-off-by: Michael S. Tsirkin <mst at mellanox.co.il>


--- openib/drivers/infiniband/hw/mthca/mthca_cq.c	2006-05-09 21:07:28.623383000 +0300
+++ /mswg/work/mst/tmp/infiniband1/hw/mthca/mthca_cq.c	2006-06-08 23:46:52.404499000 +0300
@@ -217,9 +217,9 @@ void mthca_cq_completion(struct mthca_de
 {
 	struct mthca_cq *cq;
 
 	cq = mthca_array_get(&dev->cq_table.cq, cqn & (dev->limits.num_cqs - 1));
 
-	if (!cq) {
+	if (!cq || cq->cqn != cqn) {
 		mthca_warn(dev, "Completion event for bogus CQ %08x\n", cqn);
 		return;
 	}

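To illustrate why comparing the full CQ number catches a stale event, here
is a minimal userspace sketch. It is not the mthca code: toy_cq, cq_table,
NUM_CQS and toy_cq_lookup are hypothetical names, and the table is assumed
to be a power-of-two array indexed by the low bits of the CQ number, as in
the lookup in the patch above.

```c
#include <stddef.h>
#include <stdint.h>

#define NUM_CQS 8u /* assumed power-of-two table size (hypothetical) */

/* Hypothetical stand-in for a CQ object; only the full CQ number matters. */
struct toy_cq {
	uint32_t cqn;
};

/* Slot table indexed by the low bits of the CQ number. */
static struct toy_cq *cq_table[NUM_CQS];

/*
 * Return the CQ that an event with number cqn refers to, or NULL if the
 * slot is empty or is now occupied by a CQ with a different full number
 * (that is, the event is for a CQ that has since been destroyed).
 */
static struct toy_cq *toy_cq_lookup(uint32_t cqn)
{
	struct toy_cq *cq = cq_table[cqn & (NUM_CQS - 1)];

	if (!cq || cq->cqn != cqn)
		return NULL;
	return cq;
}
```

With NUM_CQS of 8, CQ numbers 0x03 and 0x0b share slot 3. If the CQ with
number 0x03 is destroyed and a new CQ with number 0x0b takes the slot, a
delayed event for 0x03 finds a non-NULL entry, and only the cqn comparison
rejects it.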
-- 
MST



