[ofw] patch: Fix a race in the cl_timer code that caused deadlocks in opensm

Fab Tillier ftillier at microsoft.com
Wed Jun 23 11:50:10 PDT 2010


Tzachi Dar wrote on Wed, 23 Jun 2010 at 01:06:56

> We need to decide what we demand from a timer. The minimum we
> need to answer, IMHO, is this:
> 
> 1) Do we want to allow simultaneous callbacks? For one timer? For more
> than one timer?

I would think we would want each timer to have at most one callback running.  Having multiple callbacks for multiple timers should be fine (they are independent timers).

> 2) Do we want to allow timer start to fail (e.g., because of a lack of
> resources)? If yes, the design is simpler.

Timer design might be simpler, but client usage is more complicated.  That said, I think at least in user mode, you must allow start to fail.

> 3) Do we want start to be blocking sometimes (as it is today)?

Start only blocks today because of its internal call to cl_timer_stop.  I think this was done to allow start to be called multiple times.  This is probably a feature we don't need (it should be a programming error to call start when the timer is already started).
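A minimal sketch of that contract, using a hypothetical `sketch_timer_*` API (these are not complib's actual `cl_timer_*` functions): start never calls stop internally, so it never waits on a running callback. Starting an already-armed timer is reported as a programming error through a status code, and that same status return is where a user-mode implementation could report running out of resources. The current time is passed in as an assumed `now_ms` parameter, and the locking a real implementation needs is left out, so this only illustrates the API contract.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical status codes for this sketch only. */
typedef enum {
    SKETCH_SUCCESS = 0,
    SKETCH_ALREADY_STARTED, /* programming error: start on an armed timer */
    SKETCH_NOT_STARTED      /* stop on a timer that is not armed */
} sketch_status_t;

typedef struct {
    int      armed;  /* nonzero while a timeout is pending */
    uint64_t due_ms; /* absolute expiry time in milliseconds */
} sketch_timer_t;

void sketch_timer_init(sketch_timer_t *t)
{
    assert(t != NULL);
    t->armed = 0;
    t->due_ms = 0;
}

/* Never blocks: instead of implicitly stopping a pending timeout (and
 * waiting for a running callback), it reports the caller's mistake.
 * A real implementation could also fail here for lack of resources,
 * which is why callers must check the result. */
sketch_status_t sketch_timer_start(sketch_timer_t *t, uint64_t now_ms,
                                   uint32_t timeout_ms)
{
    assert(t != NULL);
    if (t->armed)
        return SKETCH_ALREADY_STARTED;
    t->armed = 1;
    t->due_ms = now_ms + timeout_ms;
    return SKETCH_SUCCESS;
}

/* Disarms a pending timeout; stopping an idle timer is reported. */
sketch_status_t sketch_timer_stop(sketch_timer_t *t)
{
    assert(t != NULL);
    if (!t->armed)
        return SKETCH_NOT_STARTED;
    t->armed = 0;
    return SKETCH_SUCCESS;
}
```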

> 4) Do we want to allow start in the callback? Stop in the callback?

I think both are fine.  I think the implementation could be smarter about what it does, though: for example, it could defer re-arming the timer until the callback unwinds, which would prevent the callback from running more than once at the same time.  I'll noodle this over for a bit.
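One way the deferred-start idea could look, sketched with hypothetical names (not complib's API): a start issued from inside the timer's own callback is only recorded, and `sketch_timer_fire` re-arms the timer after the callback returns, so a timer's callback can never overlap with itself. This is a single-threaded model under stated assumptions; a real implementation would protect `in_callback` and the pending-restart fields with the timer's lock. The `periodic_cb` callback is an illustrative example only.

```c
#include <assert.h>
#include <stdint.h>

typedef struct sketch_timer sketch_timer_t;
typedef void (*sketch_timer_cb)(sketch_timer_t *t, void *ctx);

struct sketch_timer {
    sketch_timer_cb cb;
    void           *ctx;
    int             armed;           /* a timeout is pending */
    uint64_t        due_ms;          /* expiry of the pending timeout */
    int             in_callback;     /* cb is currently executing */
    int             restart_pending; /* start was called from inside cb */
    uint64_t        restart_due_ms;  /* expiry to apply once cb returns */
};

void sketch_timer_init(sketch_timer_t *t, sketch_timer_cb cb, void *ctx)
{
    t->cb = cb;
    t->ctx = ctx;
    t->armed = 0;
    t->due_ms = 0;
    t->in_callback = 0;
    t->restart_pending = 0;
    t->restart_due_ms = 0;
}

/* From inside cb the restart is only recorded; the timer is re-armed
 * after cb unwinds. Outside cb this simply (re)arms the timer. */
void sketch_timer_start(sketch_timer_t *t, uint64_t now_ms, uint32_t timeout_ms)
{
    if (t->in_callback) {
        t->restart_pending = 1;
        t->restart_due_ms = now_ms + timeout_ms;
    } else {
        t->armed = 1;
        t->due_ms = now_ms + timeout_ms;
    }
}

/* Also cancels a restart requested earlier in the same callback. */
void sketch_timer_stop(sketch_timer_t *t)
{
    t->restart_pending = 0;
    t->armed = 0;
}

/* Delivers an expiration if the timer is due; returns 1 if cb ran. */
int sketch_timer_fire(sketch_timer_t *t, uint64_t now_ms)
{
    if (!t->armed || now_ms < t->due_ms)
        return 0;
    assert(!t->in_callback); /* at most one callback per timer */
    t->armed = 0;
    t->in_callback = 1;
    t->cb(t, t->ctx);
    t->in_callback = 0;
    if (t->restart_pending) { /* apply the deferred start now */
        t->restart_pending = 0;
        t->armed = 1;
        t->due_ms = t->restart_due_ms;
    }
    return 1;
}

/* Example callback: counts invocations and restarts its own timer. */
typedef struct { int calls; uint64_t now_ms; } periodic_ctx_t;

void periodic_cb(sketch_timer_t *t, void *ctx)
{
    periodic_ctx_t *p = ctx;
    p->calls++;
    sketch_timer_start(t, p->now_ms, 100);
    assert(!t->armed); /* deferred: not armed until cb returns */
}
```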

> Without answering all of this it is very hard to say if something is a
> feature or a bug.

Yes, we need to have a clear expectation of how the timer behaves.  It seems that the current issue is rooted in the fact that opensm made assumptions about timers based on the Linux implementation, and that the Windows timer implementation is *much* more relaxed in its operation.

> One simple set of answers to these questions is this:
> Use a single thread for all timers; start does not fail, start is not
> blocking, start can be called in the callback, and stop cannot be called
> in the callback. This is what Linux has implemented, and we can do
> something similar.

For things like OpenSM, it may make sense to more closely mimic the Linux behavior.  Going forward, I would expect the core drivers to move away from complib and towards the native Windows constructs for these things where possible.
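A rough, single-threaded model of the Linux-style design described in the quoted proposal, with hypothetical names throughout (this is not the actual Linux complib code): one dispatcher serves every timer and runs expired callbacks one after another, start never blocks or fails and may be called from a callback, and stop from a timer's own callback is treated as an error. The rationale for that last rule is an assumption here: if a threaded stop waits for a running callback to finish, calling it from that callback would deadlock. `sketch_dispatch` stands in for one pass of the timer thread's loop; the thread itself, its sleep until the earliest expiry, and the locking are left out so the behavior is deterministic.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct sketch_timer sketch_timer_t;
typedef void (*sketch_timer_cb)(sketch_timer_t *t, void *ctx);

struct sketch_timer {
    sketch_timer_cb cb;
    void           *ctx;
    uint64_t        due_ms;
    int             armed;
    sketch_timer_t *next; /* link in the dispatcher's list */
};

/* One dispatcher (i.e. one timer thread) serves every timer. */
typedef struct {
    sketch_timer_t *head;    /* all registered timers */
    sketch_timer_t *running; /* timer whose callback is executing */
} sketch_dispatcher_t;

void sketch_dispatcher_init(sketch_dispatcher_t *d)
{
    d->head = NULL;
    d->running = NULL;
}

void sketch_timer_init(sketch_dispatcher_t *d, sketch_timer_t *t,
                       sketch_timer_cb cb, void *ctx)
{
    t->cb = cb;
    t->ctx = ctx;
    t->due_ms = 0;
    t->armed = 0;
    t->next = d->head;
    d->head = t;
}

/* Never blocks or fails, and is allowed from any callback. In this
 * sketch, starting an armed timer just moves its expiry. */
void sketch_timer_start(sketch_timer_t *t, uint64_t now_ms, uint32_t timeout_ms)
{
    t->due_ms = now_ms + timeout_ms;
    t->armed = 1;
}

/* Calling stop from the timer's own callback is disallowed. */
void sketch_timer_stop(sketch_dispatcher_t *d, sketch_timer_t *t)
{
    assert(d->running != t);
    t->armed = 0;
}

/* One pass of the timer thread: runs each expired callback in turn, so
 * no two callbacks ever overlap. Returns the number of callbacks run. */
int sketch_dispatch(sketch_dispatcher_t *d, uint64_t now_ms)
{
    int ran = 0;
    for (sketch_timer_t *t = d->head; t != NULL; t = t->next) {
        if (!t->armed || now_ms < t->due_ms)
            continue;
        t->armed = 0;
        d->running = t;
        t->cb(t, t->ctx);
        d->running = NULL;
        ran++;
    }
    return ran;
}

/* Example callback that re-arms its own timer, which this model allows. */
typedef struct { int calls; uint64_t now_ms; } tick_ctx_t;

void tick_cb(sketch_timer_t *t, void *ctx)
{
    tick_ctx_t *c = ctx;
    c->calls++;
    sketch_timer_start(t, c->now_ms, 10);
}
```

Because every callback runs on the dispatcher's single thread, at most one callback per timer comes for free, at the cost of one slow callback delaying every other timer.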

-Fab


