[ofw] patch: Fix a race in the cl_timer code that caused deadlocks in opensm

Fab Tillier ftillier at microsoft.com
Thu Jun 24 11:50:18 PDT 2010


Hefty, Sean wrote on Thu, 24 Jun 2010 at 11:46:53

>> One thing to keep in mind is that with the lock to serialize the
>> callback threads, you may block threadpool threads that need to run
>> something else, so the deadlock might be a side effect of serializing
>> the callbacks with a lock.
> 
> Okay - the user space implementation of cl_timer uses some system
> thread pool, which defaults to a maximum of 500 threads.  Deadlock may
> still be happening at a higher level, but it doesn't seem likely that
> it would come from the cl_timer implementation.

Just because the default maximum is 500 doesn't mean you aren't tripping over a heuristic that says you should really only have two threads...

In any case, I think I have a fix that serializes the callbacks without holding a lock; see my other mail for details.
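The fix itself is in the other mail and is not reproduced here. Purely as a hypothetical illustration of the general idea of serializing callbacks without holding a lock across them, one common pattern is a "pending count": the first thread to arrive becomes the runner and drains all recorded fires, while any thread arriving meanwhile just increments the count and returns instead of blocking. The names `serial_timer_t`, `serial_timer_fire` and `demo_callback` are made up for this sketch and are not the complib `cl_timer` API; C11 `<stdatomic.h>` stands in for whatever interlocked primitives the real code uses (e.g. `InterlockedIncrement`/`InterlockedDecrement` on Windows).

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical timer wrapper: these names are illustrative and are NOT the
 * real complib cl_timer API. */
typedef struct {
    atomic_int pending;          /* fires recorded but not yet handled */
    void (*callback)(void *ctx); /* user callback; never runs concurrently */
    void *ctx;
} serial_timer_t;

static void serial_timer_init(serial_timer_t *t, void (*cb)(void *), void *ctx)
{
    assert(t != NULL && cb != NULL);
    atomic_init(&t->pending, 0);
    t->callback = cb;
    t->ctx = ctx;
}

/* Called on a thread-pool thread each time the timer expires. The first
 * thread to arrive becomes the runner and keeps invoking the callback until
 * every recorded fire has been handled; any thread arriving meanwhile just
 * bumps the count and returns, so no pool thread ever waits on a lock. */
static void serial_timer_fire(serial_timer_t *t)
{
    if (atomic_fetch_add(&t->pending, 1) != 0)
        return; /* a runner is active and will handle this fire */

    do {
        t->callback(t->ctx);
    } while (atomic_fetch_sub(&t->pending, 1) != 1); /* old value 1: none left */
}

/* Demo callback for illustration: on its first run it fires the timer again
 * from inside the callback, standing in for a second pool thread that
 * arrives while the first callback is still executing. */
typedef struct {
    serial_timer_t *timer;
    int runs;      /* total callback invocations */
    int depth;     /* callbacks executing right now */
    int max_depth; /* highest concurrency observed */
} demo_ctx_t;

static void demo_callback(void *p)
{
    demo_ctx_t *d = p;
    if (++d->depth > d->max_depth)
        d->max_depth = d->depth;
    d->runs++;
    if (d->runs == 1)
        serial_timer_fire(d->timer); /* returns at once: runner is busy */
    d->depth--;
}
```

In this sketch, a fire that arrives while the callback is running costs one atomic increment rather than a blocked pool thread; the trade-off is that the runner thread may execute several callbacks back to back.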

-Fab
