[ewg] [PATCH] IPoIB: synchronize timer deletion with completion handler

Eli Cohen eli at dev.mellanox.co.il
Thu Dec 10 03:37:02 PST 2009


On Wed, Dec 09, 2009 at 02:39:49PM -0800, Roland Dreier wrote:
> 
>  > When calling del_timer_sync on priv->poll_timer, it is necessary to prevent
>  > farther arming of the timer which can be done by a completion handler. Though
>  > it is harmless since the timer will eventually stop being rearmed, it is better
>  > practice to avoid calling the timer function after it is deleted. This patch
>  > handles this by using a new flag that is checked before arming the timer.
> 
> have you seen this in practice?  If it can happen then it's not
> harmless, since the module could be unloaded with the timer pending.
> However I don't see how it could happen, since we only seem to delete
> the timer after we know that no more completions are coming (except for
> the case where we decide that the hardware is wedged but it really only
> takes a *long* time to respond at exactly the wrong time, and we somehow
> get a completion between the del_timer_sync and the modify QP to reset
> state -- which is so unlikely it seems not worth adding this extra
> complexity for -- maybe we could add the del_timer_sync to after we
> delete the CQ or something if you're really worried)

No, I haven't actually seen this happening but I think it could.
Consider this scenario:

1. ipoib_send() calls ib_req_notify_cq()
2. ipoib_ib_dev_stop() gets called
3. All pending WR complete; interrupt is generated but not yet
serviced.
4. ipoib_ib_dev_stop() calls del_timer_sync() and passes it.
5. Completion handler is finally serviced - it schedules
ipoib_ib_tx_timer_func()
6. IPoIB unloads and we're in trouble.

Not so likely to happen indeed. Your suggestion to defer deleting the
timer until after the CQ is destroyed will also solve this.



More information about the ewg mailing list