[ofw] [PATCH] complib/user: fix timer race conditions

Tzachi Dar tzachid at mellanox.co.il
Sat Jun 26 12:59:27 PDT 2010


Thanks for your detailed answer, but I still have my concerns...

I have missed the fact that "NULL is passed into DeleteTimerQueueTimer rather than INVALID_HANDLE_VALUE.

This indeed removes the possibility of a deadlock in the case of start but still lives the other question:

How can I know how many callbacks are still there? That is, if I call start once, there will be only one callback.
On some reasonable model, I'll not call start again until the callback happen.
But if we want to allow to call start before the callback returns than I can't tell if I'll have one more call back, or two more.

This thing is even worse since stop should wait for all callbacks to return.
Obviously in the current code, there is a chance that if the callback has started but not finished the function 
DeleteTimerQueueTimer( p_timer->h_timer_queue, timer, INVALID_HANDLE_VALUE );

Will never be called. 

Please note that even if it will be called, it will only be called for the last callback and not for other callbacks that might be running.

I don't see how the second problem can be fixed here. That is, if one calls start 50 times, and 50 callbacks have started but not finished, there is no way to say that when they are finished...

Thanks
Tzachi

> -----Original Message-----
> From: Hefty, Sean [mailto:sean.hefty at intel.com]
> Sent: Friday, June 25, 2010 6:08 PM
> To: Tzachi Dar; Smith, Stan; Fab Tillier; ofw at lists.openfabrics.org
> Cc: Uri Habusha; Yevgeny Kliteynik
> Subject: RE: [PATCH] complib/user: fix timer race conditions
> 
> > The following lines are very problematic:
> >
> > 	EnterCriticalSection( &p_timer->cb_lock );
> > 	(p_timer->pfn_callback)( (void*)p_timer->context );
> > 	LeaveCriticalSection( &p_timer->cb_lock );
> >
> > Nothing specific with timer, but if you have a code that takes a
> lock, and
> > call someone else code than this is asking for trouble. The reason is
> that
> > if he has done something like this (take a lock and call you back)
> than
> > there is a deadlock. This can be avoided by documentation, but I
> would
> > rather avoid this issues...
> 
> This locking is fine.  The cb_lock lock is specifically only used here
> and not in other calls.  There is no chance of deadlock coming from the
> use of this lock.
> 
> > One thing that I'm missing here is that start might not do anything
> but
> > still return success. I would like to have two more return code, one
> for
> > the case of timer not running because the time is too big, and one
> because
> > another callback is running. This is to allow the user to know if he
> should
> > wait for a callback or not.
> 
> If start returns success, then the timer will run.
> 
> > Another issue here is that start() might end waiting for the callback
> to
> > return. Again this is a very problematic behavior (for a general
> timer)
> > since two of this can cerate a deadlock. This is especially true for
> using
> > process io threads that one has no control over. That is
> documentation
> > should say, either don't call start when there is already something
> > running, or don't call timer.start() from a system thread. This is
> because
> > two system threads might create a deadlock.
> 
> Start and trim never wait for a callback to return.  That's why NULL is
> passed into DeleteTimerQueueTimer rather than INVALID_HANDLE_VALUE.
> NULL is non-blocking, INVALID_HANDLE_VALUE is blocking.
> 
> > Also on start one takes the critical section and calls
> >
> > 		if( p_timer->h_timer )
> > 			DeleteTimerQueueTimer( p_timer->h_timer_queue,
> p_timer-
> > >h_timer, NULL );
> > If the callback has already started running but haven't yet called
> > __clear_timer_handle() there is a deadlock here. As one is holding
> the
> > spinlock waiting for the other to finish running , while the other is
> > trying to capture the spinlock.
> 
> This is why there are two locks.  One lock is used to synchronize the
> state, the other lock to serialize the callbacks.
> 
> > So, the general rule should be don't call start on this timer from io
> > threads...
> 
> Start can be called from any thread.
> 




More information about the ofw mailing list