[openib-general] Re: race in mthca_cq.c?
Michael S. Tsirkin
mst at mellanox.co.il
Thu Jun 8 15:06:49 PDT 2006
Quoting r. Roland Dreier <rdreier at cisco.com>:
> Subject: Re: race in mthca_cq.c?
>
> Michael> Not in the driver I have: mthca_array_clear is at line
> Michael> 1351, mthca_cq_clean at line 1372. Isn't
> Michael> mthca_array_clear freeing the slot in QP table?
>
> Nope, the bitmap slot isn't freed until mthca_free().
Oh. Right. I see it now.
> Michael> But there might be more EQEs for this CQN outstanding in
> Michael> the EQ which we have not seen yet.
>
> Now that you mention it, that could be a real problem I guess.
> synchronize_irq() isn't enough because the interrupt handler might not
> have even started yet.
>
> But on the other hand a CQ can't be destroyed until after all
> associated QPs have been destroyed. So could we really miss EQEs for
> that long?
Yes, I think there might be spurious EQEs, and they might get delayed
in HW for a long time. Destroying QPs does not flush completion events out.
So just this bit?
--
Check that the EQE is not for a stale CQ number. Since the high bits of the CQ
number are allocated round-robin, we can be reasonably sure that the CQ number
is different even for CQs which share a slot in the CQ table.
Signed-off-by: Michael S. Tsirkin <mst at mellanox.co.il>
--- openib/drivers/infiniband/hw/mthca/mthca_cq.c 2006-05-09 21:07:28.623383000 +0300
+++ /mswg/work/mst/tmp/infiniband1/hw/mthca/mthca_cq.c 2006-06-08 23:46:52.404499000 +0300
@@ -217,9 +217,9 @@ void mthca_cq_completion(struct mthca_de
 {
 	struct mthca_cq *cq;
 
 	cq = mthca_array_get(&dev->cq_table.cq, cqn & (dev->limits.num_cqs - 1));
 
-	if (!cq) {
+	if (!cq || cq->cqn != cqn) {
 		mthca_warn(dev, "Completion event for bogus CQ %08x\n", cqn);
 		return;
 	}
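
To illustrate why comparing the full CQ number catches a stale event, here is
a minimal standalone sketch. It is not driver code: the table size NUM_CQS,
struct cq, cq_lookup() and stale_event_demo() are hypothetical names made up
for the example, and it assumes (as the description above does) that the low
bits of the CQ number select the table slot while the high bits differ between
successive users of that slot.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NUM_CQS 8u	/* hypothetical table size; must be a power of two */

struct cq {
	uint32_t cqn;	/* full CQ number: high bits above the slot index */
};

static struct cq *cq_table[NUM_CQS];

/*
 * Return the CQ that an event with number cqn refers to, or NULL if the
 * slot is empty or is now owned by a CQ with a different full number.
 */
static struct cq *cq_lookup(uint32_t cqn)
{
	struct cq *cq = cq_table[cqn & (NUM_CQS - 1)];

	if (!cq || cq->cqn != cqn)
		return NULL;
	return cq;
}

/* Walk through destroy-and-reuse of slot 3; returns 1 if all checks hold. */
static int stale_event_demo(void)
{
	static struct cq old_cq = { .cqn = 3 };			/* slot 3 */
	static struct cq new_cq = { .cqn = 3 + NUM_CQS };	/* slot 3, new high bits */

	cq_table[3] = &old_cq;
	assert(cq_lookup(3) == &old_cq);	/* live CQ is found */

	cq_table[3] = NULL;			/* CQ destroyed */
	assert(cq_lookup(3) == NULL);		/* old check (!cq) catches this */

	cq_table[3] = &new_cq;			/* slot reused by a new CQ */
	assert(cq_lookup(3) == NULL);		/* delayed event for old CQ rejected */
	assert(cq_lookup(3 + NUM_CQS) == &new_cq);
	return 1;
}
```

With only the !cq test, the third lookup would have returned new_cq and the
delayed event would have been delivered to the wrong CQ.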
--
MST