[ofa-general] Re: [PATCHv3 for-2.6.21] IB/mthca: fix race in QP destroy

Michael S. Tsirkin mst at mellanox.co.il
Sat Mar 3 13:51:50 PST 2007


> Quoting Roland Dreier <rdreier at cisco.com>:
> Subject: Re: [PATCHv3 for-2.6.21] IB/mthca: fix race in QP destroy
> 
>  > > I'm not quite sure I understand why we have to synchronize against the
>  > > completion EQ's interrupt here.
>  > 
>  > Hmm, I'm not sure myself, now.
>  > I'm still thinking about this - the patch below is clearly correct
>  > and seems sufficient to fix the issue pointed out by bugzilla.
>  > So let's get it merged and I'll try to think about and address
>  > other issues (if any) in a separate patch.
> 
> The more I think about it, the more I think that synchronizing against
> the completion interrupt doesn't accomplish anything.  The completion
> interrupt itself only looks at the CQ, so it doesn't matter what we do
> with the QP table or anything to do with QPs.  And a consumer could
> poll any CQ at any time, in or out of interrupt context, so we're not
> protecting against anything that has to do with polling CQs.
> 
> However, it does seem that we should also clean the CQs before
> removing the QP from the table, to avoid polling completions for a QP
> not in the QP table.  
> 
> And also synchronizing with the async event EQ's interrupts still
> makes sense to me.
> 
> I guess I don't quite understand why this change is enough to fix bug
> #394 -- it seems it is just changing the timing without really closing
> the race window completely.

With the current code, when we destroy a QP, we remove it from the table
first and only then move the QP to reset. This ordering is clearly wrong,
and this patch fixes it.
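
To illustrate the ordering problem, here is a rough toy model in C. It is
not the mthca code: every name in it (toy_qp, toy_qp_table,
toy_destroy_old_order, toy_destroy_new_order) is made up for this sketch,
and it assumes the fix amounts to moving the QP to reset before taking it
out of the table. Each destroy function returns 1 if the QP was still live
at the moment it was hidden from table lookups, which is the window in
which an event for it could no longer be matched to a QP.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model only: none of these names are real mthca symbols. */
enum toy_qp_state { TOY_QP_RTS, TOY_QP_RESET };

struct toy_qp {
    unsigned int qpn;
    enum toy_qp_state state;
};

#define TOY_TABLE_SIZE 16u
static struct toy_qp *toy_qp_table[TOY_TABLE_SIZE];

static void toy_table_insert(struct toy_qp *qp)
{
    assert(qp != NULL);
    toy_qp_table[qp->qpn % TOY_TABLE_SIZE] = qp;
}

/* What an event handler would do: find the QP by its number. */
static struct toy_qp *toy_table_lookup(unsigned int qpn)
{
    struct toy_qp *qp = toy_qp_table[qpn % TOY_TABLE_SIZE];
    return (qp != NULL && qp->qpn == qpn) ? qp : NULL;
}

/* Old order: remove from the table first, then move to reset. */
static int toy_destroy_old_order(struct toy_qp *qp)
{
    int live_while_hidden;

    assert(qp != NULL);
    toy_qp_table[qp->qpn % TOY_TABLE_SIZE] = NULL;
    /* The QP is no longer findable, but it has not been reset yet. */
    live_while_hidden = (qp->state != TOY_QP_RESET);
    qp->state = TOY_QP_RESET;
    return live_while_hidden;
}

/* Assumed fixed order: move to reset first, then remove from the table. */
static int toy_destroy_new_order(struct toy_qp *qp)
{
    int live_while_hidden;

    assert(qp != NULL);
    qp->state = TOY_QP_RESET;
    toy_qp_table[qp->qpn % TOY_TABLE_SIZE] = NULL;
    /* By the time the QP is hidden, it is already in reset. */
    live_while_hidden = (qp->state != TOY_QP_RESET);
    return live_while_hidden;
}
```

The model is single-threaded and leaves out locking and interrupt
synchronization entirely, so it shows only the ordering, not the full race.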

To fix the issue completely, the simplest approach is to use the same
EQ for completion events, async events, and the command interface.
I plan to send such a patch next week.


-- 
MST


