[ofw] [PATCH] complib/user: fix timer race conditions

Tzachi Dar tzachid at mellanox.co.il
Sun Jun 27 01:51:08 PDT 2010


two more points:

[1] Not really related to this conversation, but timers work extremely better on windows server 2008 R2 than on 2003.
Don't try this on windows 2003, as it will never finish ...

[2] The following program creates a memory leak:

int TCB=0;

static void CALLBACK
__timer_callback( 
	IN void* const p_timer,
	IN BOOLEAN  timer_signalled )
{
    TCB++;
    if (!(TCB % 100000)) {
        printf("callback %d finished\n", TCB);
    }
}


VOID TestTimer() {
    const int tries=10000000;

    HANDLE h_timer;
    for (int i=0; i < tries;i++) {
        CreateTimerQueueTimer( &h_timer, NULL, __timer_callback, NULL, 10, 0, WT_EXECUTEINIOTHREAD );
    }
    printf("Finished creating cbs\n");
    while(TCB != tries);
}

That is we cannot run CreateTimerQueueTimer and forget the handle being created. We need to make sure that it is cleaned, otherwise we will have a very big memory leak on 2008 R2 and handle leacks on 2003.


We should probably do something like in the following program:

static void CALLBACK
__timer_callback( 
	IN void* const p_timer,
	IN BOOLEAN  timer_signalled )
{
    LONG l1 = InterlockedIncrement (&TCB);
    if (!(l1 % 10000)) {
        printf("callback %d finished\n", TCB);
    }
}


VOID TestTimer() {
    const int tries=3000000;
    int j = 0;
    BOOL ret;

    HANDLE *h_timer = new HANDLE[tries];
    for (int i=0; i < tries;i++) {
        CreateTimerQueueTimer( &h_timer[i], NULL, __timer_callback, NULL, 10, 0, WT_EXECUTEINIOTHREAD );
        while (j < TCB) {            
           ret =  DeleteTimerQueueTimer(NULL, h_timer[j],INVALID_HANDLE_VALUE);
           if (ret ==0) {
                printf("error in delete\n");
           }
            j++;
        }
    }
    printf("Finished creating cbs j = %d\n", j);

    while (j < TCB) {
        
       ret =  DeleteTimerQueueTimer(NULL, h_timer[j],INVALID_HANDLE_VALUE);
       if (ret ==0) {
            printf("error in delete\n");
       }
        j++;
    }


    while(TCB != tries);

    printf("Callback ended j = %d\n", j);
    delete []h_timer;

}

In other words we need to keep track for every timer request we start, and make sure it ends. This will make our design more complicated ...

Thanks
Tzachi


> -----Original Message-----
> From: ofw-bounces at lists.openfabrics.org [mailto:ofw-
> bounces at lists.openfabrics.org] On Behalf Of Tzachi Dar
> Sent: Saturday, June 26, 2010 10:59 PM
> To: Hefty, Sean; Smith, Stan; Fab Tillier; ofw at lists.openfabrics.org
> Cc: Uri Habusha; Yevgeny Kliteynik
> Subject: Re: [ofw] [PATCH] complib/user: fix timer race conditions
> 
> Thanks for your detailed answer, but I still have my concerns...
> 
> I have missed the fact that "NULL is passed into DeleteTimerQueueTimer
> rather than INVALID_HANDLE_VALUE.
> 
> This indeed removes the possibility of a deadlock in the case of start
> but still lives the other question:
> 
> How can I know how many callbacks are still there? That is, if I call
> start once, there will be only one callback.
> On some reasonable model, I'll not call start again until the callback
> happen.
> But if we want to allow to call start before the callback returns than
> I can't tell if I'll have one more call back, or two more.
> 
> This thing is even worse since stop should wait for all callbacks to
> return.
> Obviously in the current code, there is a chance that if the callback
> has started but not finished the function
> DeleteTimerQueueTimer( p_timer->h_timer_queue, timer,
> INVALID_HANDLE_VALUE );
> 
> Will never be called.
> 
> Please note that even if it will be called, it will only be called for
> the last callback and not for other callbacks that might be running.
> 
> I don't see how the second problem can be fixed here. That is, if one
> calls start 50 times, and 50 callbacks have started but not finished,
> there is no way to say that when they are finished...
> 
> Thanks
> Tzachi
> 
> > -----Original Message-----
> > From: Hefty, Sean [mailto:sean.hefty at intel.com]
> > Sent: Friday, June 25, 2010 6:08 PM
> > To: Tzachi Dar; Smith, Stan; Fab Tillier; ofw at lists.openfabrics.org
> > Cc: Uri Habusha; Yevgeny Kliteynik
> > Subject: RE: [PATCH] complib/user: fix timer race conditions
> >
> > > The following lines are very problematic:
> > >
> > > 	EnterCriticalSection( &p_timer->cb_lock );
> > > 	(p_timer->pfn_callback)( (void*)p_timer->context );
> > > 	LeaveCriticalSection( &p_timer->cb_lock );
> > >
> > > Nothing specific with timer, but if you have a code that takes a
> > lock, and
> > > call someone else code than this is asking for trouble. The reason
> is
> > that
> > > if he has done something like this (take a lock and call you back)
> > than
> > > there is a deadlock. This can be avoided by documentation, but I
> > would
> > > rather avoid this issues...
> >
> > This locking is fine.  The cb_lock lock is specifically only used
> here
> > and not in other calls.  There is no chance of deadlock coming from
> the
> > use of this lock.
> >
> > > One thing that I'm missing here is that start might not do anything
> > but
> > > still return success. I would like to have two more return code,
> one
> > for
> > > the case of timer not running because the time is too big, and one
> > because
> > > another callback is running. This is to allow the user to know if
> he
> > should
> > > wait for a callback or not.
> >
> > If start returns success, then the timer will run.
> >
> > > Another issue here is that start() might end waiting for the
> callback
> > to
> > > return. Again this is a very problematic behavior (for a general
> > timer)
> > > since two of this can cerate a deadlock. This is especially true
> for
> > using
> > > process io threads that one has no control over. That is
> > documentation
> > > should say, either don't call start when there is already something
> > > running, or don't call timer.start() from a system thread. This is
> > because
> > > two system threads might create a deadlock.
> >
> > Start and trim never wait for a callback to return.  That's why NULL
> is
> > passed into DeleteTimerQueueTimer rather than INVALID_HANDLE_VALUE.
> > NULL is non-blocking, INVALID_HANDLE_VALUE is blocking.
> >
> > > Also on start one takes the critical section and calls
> > >
> > > 		if( p_timer->h_timer )
> > > 			DeleteTimerQueueTimer( p_timer->h_timer_queue,
> > p_timer-
> > > >h_timer, NULL );
> > > If the callback has already started running but haven't yet called
> > > __clear_timer_handle() there is a deadlock here. As one is holding
> > the
> > > spinlock waiting for the other to finish running , while the other
> is
> > > trying to capture the spinlock.
> >
> > This is why there are two locks.  One lock is used to synchronize the
> > state, the other lock to serialize the callbacks.
> >
> > > So, the general rule should be don't call start on this timer from
> io
> > > threads...
> >
> > Start can be called from any thread.
> >
> 
> _______________________________________________
> ofw mailing list
> ofw at lists.openfabrics.org
> http://lists.openfabrics.org/cgi-bin/mailman/listinfo/ofw



More information about the ofw mailing list