[ofa-general] Re: [PATCHv3 for-2.6.21] IB/mthca: fix race in QPdestroy

Michael S. Tsirkin mst at mellanox.co.il
Sat Mar 10 22:09:35 PST 2007


> Quoting Roland Dreier <rdreier at cisco.com>:
> Subject: Re: [PATCHv3 for-2.6.21] IB/mthca: fix race in QPdestroy
> 
>  >     It is common practice to put a pointer/index to a per-QP
>  >     structure inside the wrid. This data is available after poll_cq
>  >     returns, when cq lock is not taken.  If this pointer is used
>  >     directly inside the event handler, the ULP that is moving QP to
>  >     reset has no way to know when is it safe to free data it points to,
>  >     unless the verbs provider synchronizes with the IRQ handler
>  >     before the verbs returns.
> 
> Does this fix any problem for in-tree (or OFED) drivers?

Yes. Example:
IPoIB UD (ipoib_ib_dev_stop) has an assumption that WRs will complete within
several seconds. This might not be the case.  It should modify QP to reset
instead and that should guarantee no more completions will be polled.

> Because I'm
> not convinced this is something that a low-level driver should try to
> handle.  A ULP that suffers from this that polls a CQ from a
> workqueue, say, rather than an interrupt remains broken even after
> this change.

Yes but such a ULP can flush the workqueue before freeing any memory.
ULP can't do this for callbacks coming from low level driver.

> And my gut feeling is that this type of problem is
> something a ULP should handle by not getting into this situation in
> the first place.

To handle this, ULP would need a capability to sync with the
completion IRQ, which is lacking in our API.

> BTW, have you had a chance to test the other changes and see if they
> fix the IPoIB CM issue?

There were some failures that seem unrelated.
Hope to let you know soon.

-- 
MST



More information about the general mailing list