[ofw] patch: Fix a race in the cl_timer code that caused deadlocks in opensm

Tue Jun 22 08:45:27 PDT 2010

> This should promise that if a callback is running, p_timer->thread_id  will
> have the thread_id of the running thread.
> 
> This field is later used when cl_timer_stop() is being called in order to
> prevent deadlocks. (please note that cl_timer_stop() is being called from
> cl_timer_start(). Please also note that the deadlock happens if the caller
> of cl_stop() is actually executing in the context of the call back itself).

I was just looking at the cl_timer code yesterday for this exact same reason.  Good catch. 

Maybe this should be rethought entirely and simply not allow certain timer manipulation from the callback, or require the caller to provide the synchronization.  The callback can return a value to indicate if the timer should stop.

Also, are we sure that the kernel doesn't have the same problem?

> The fix itself is to sync the start and stop calls and now only one
> callback will be running (actually, there is one exception to this code: if
> from the callback, a new timer is started, the new callback might be
> running on the same time, but: (1) stop has already being called. (2) the
> p_timer->thread_id is not being touched after the callback.

A single timer allows multiple, simultaneous callbacks?  If so, this seems wrong.

>  static void CALLBACK
> 
>  __timer_callback(
> 
>                 IN cl_timer_t* const p_timer,
> 
>                 IN BOOLEAN timer_signalled )
> 
>  {
> 
>                 /* timer_signalled is always TRUE, and has no value. */
> 
> +
> 
> +             DWORD thread_id = GetCurrentThreadId();
> 
>                 CL_ASSERT( timer_signalled );
> 
> +
> 
>                 UNUSED_PARAM( timer_signalled );
> 
> 
> 
> +             CL_ASSERT(thread_id!=0);
> 
> +
> 
>                 p_timer->timeout_time = 0;
> 
> -              p_timer->thread_id = GetCurrentThreadId();
> 
> +             p_timer->thread_id = thread_id;
>

In a free build, I would fully expect the compiler to optimize the local variable thread_id away.