[ofw] [PATCH] complib/user: fix timer race conditions
Hefty, Sean
sean.hefty at intel.com
Fri Jun 25 08:08:12 PDT 2010
> The following lines are very problematic:
>
> EnterCriticalSection( &p_timer->cb_lock );
> (p_timer->pfn_callback)( (void*)p_timer->context );
> LeaveCriticalSection( &p_timer->cb_lock );
>
> Nothing specific with timer, but if you have a code that takes a lock, and
> call someone else code than this is asking for trouble. The reason is that
> if he has done something like this (take a lock and call you back) than
> there is a deadlock. This can be avoided by documentation, but I would
> rather avoid this issues...
This locking is fine. The cb_lock lock is specifically only used here and not in other calls. There is no chance of deadlock coming from the use of this lock.
> One thing that I'm missing here is that start might not do anything but
> still return success. I would like to have two more return code, one for
> the case of timer not running because the time is too big, and one because
> another callback is running. This is to allow the user to know if he should
> wait for a callback or not.
If start returns success, then the timer will run.
> Another issue here is that start() might end waiting for the callback to
> return. Again this is a very problematic behavior (for a general timer)
> since two of this can cerate a deadlock. This is especially true for using
> process io threads that one has no control over. That is documentation
> should say, either don't call start when there is already something
> running, or don't call timer.start() from a system thread. This is because
> two system threads might create a deadlock.
Start and trim never wait for a callback to return. That's why NULL is passed into DeleteTimerQueueTimer rather than INVALID_HANDLE_VALUE. NULL is non-blocking, INVALID_HANDLE_VALUE is blocking.
> Also on start one takes the critical section and calls
>
> if( p_timer->h_timer )
> DeleteTimerQueueTimer( p_timer->h_timer_queue, p_timer-
> >h_timer, NULL );
> If the callback has already started running but haven't yet called
> __clear_timer_handle() there is a deadlock here. As one is holding the
> spinlock waiting for the other to finish running , while the other is
> trying to capture the spinlock.
This is why there are two locks. One lock is used to synchronize the state, the other lock to serialize the callbacks.
> So, the general rule should be don't call start on this timer from io
> threads...
Start can be called from any thread.
More information about the ofw
mailing list