[ofw] patch: Fix a race in the cl_timer code that caused deadlocks in opensm
Smith, Stan
stan.smith at intel.com
Wed Jun 23 22:28:01 PDT 2010
Hefty, Sean wrote:
>> Forgot to mention, you should use the cb_serialize lock to protect
>> setting the thread ID too.
>
> The thread ID needs to be protected using 'spinlock', since that lock
> is held when reading it later. Or maybe cb_serialize would work, as
> long as thread_id is cleared after the callback returns...
>
> Actually, I think setting thread_id = 0 is required, since its
> purpose is to see if timer_stop is being called from the callback.
Here's one for the "I wonder why" crowd...
If thread_id is set to zero under the cb_serialize lock, immediately after the timer callback pfn_callback() returns, compute nodes do not get IPv4 addresses via DHCP when they reboot.
Remove just the p_timer->thread_id = 0 after the callback, and DHCP starts assigning addresses upon reboot. Who would have thought?
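For anyone following along, here is a rough sketch of the pattern being discussed: record which thread is running the callback, clear that record once the callback returns, and let the stop path check it to tell whether it was called from inside the callback. This is not the complib cl_timer code. Every sketch_* name is made up for illustration, pthread mutexes stand in for both 'spinlock' and cb_serialize, and a cb_active flag plus cb_thread stands in for thread_id (so "thread_id = 0" becomes "cb_active = false").

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a timer object; NOT the complib cl_timer_t layout. */
typedef struct sketch_timer {
    pthread_mutex_t lock;          /* plays the role of 'spinlock' */
    pthread_mutex_t cb_serialize;  /* lets only one callback run at a time */
    bool            cb_active;     /* true only while the callback is running */
    pthread_t       cb_thread;     /* meaningful only while cb_active is true */
    void          (*pfn_callback)(void *context);
    void           *context;
} sketch_timer_t;

static void sketch_timer_init(sketch_timer_t *t,
                              void (*cb)(void *), void *context)
{
    assert(t != NULL && cb != NULL);
    pthread_mutex_init(&t->lock, NULL);
    pthread_mutex_init(&t->cb_serialize, NULL);
    t->cb_active = false;
    t->pfn_callback = cb;
    t->context = context;
}

/* Run the callback once, as the timer provider thread would on expiry. */
static void sketch_timer_fire(sketch_timer_t *t)
{
    pthread_mutex_lock(&t->cb_serialize);

    pthread_mutex_lock(&t->lock);
    t->cb_thread = pthread_self();   /* analogous to setting thread_id */
    t->cb_active = true;
    pthread_mutex_unlock(&t->lock);

    t->pfn_callback(t->context);

    pthread_mutex_lock(&t->lock);
    t->cb_active = false;            /* analogous to thread_id = 0 */
    pthread_mutex_unlock(&t->lock);

    pthread_mutex_unlock(&t->cb_serialize);
}

/*
 * Stop the timer. Returns true if the caller is the thread currently running
 * the callback; in that case we must not wait on cb_serialize, because this
 * thread already holds it and would deadlock on itself.
 */
static bool sketch_timer_stop(sketch_timer_t *t)
{
    bool from_callback;

    pthread_mutex_lock(&t->lock);
    from_callback = t->cb_active &&
                    pthread_equal(t->cb_thread, pthread_self());
    pthread_mutex_unlock(&t->lock);

    if (!from_callback) {
        /* Wait for any callback in flight on another thread to finish. */
        pthread_mutex_lock(&t->cb_serialize);
        pthread_mutex_unlock(&t->cb_serialize);
    }
    return from_callback;
}

/* Demo context: a timer whose callback stops its own timer. */
typedef struct sketch_demo {
    sketch_timer_t timer;
    bool           stop_saw_callback;
} sketch_demo_t;

static void sketch_demo_cb(void *context)
{
    sketch_demo_t *d = context;
    d->stop_saw_callback = sketch_timer_stop(&d->timer);
}
```

In this sketch, if cb_active were never cleared, a later sketch_timer_stop() from the same thread outside the callback would wrongly skip the wait. That only shows why the clear looks required on paper; it does not explain the DHCP behavior above.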
Also, DAPL tests hang when cl_timer callbacks are serialized.
I suspect there has been a long-standing bug that was never noticed... sigh.
Perhaps opensm should have its own user-mode implementation of cl_timer?