[ofw] patch: Fix a race in the cl_timer code that caused deadlocks in opensm

Smith, Stan stan.smith at intel.com
Thu Jun 24 11:11:23 PDT 2010


Hefty, Sean wrote:
> Stan, can you repost just the user space changes to cl_timer?
>
>>> If thread_id is set to zero under the cb_serialize lock, immediately
>>> after the return from the timer callback pfn_callback(), when
>>> compute nodes reboot they do not get IPv4 addresses by DHCP?
>>> Remove just the p_timer->thread_id = 0 after the callback and DHCP
>>> starts assigning addresses upon reboot? Who would have thought?
>>
>> This is only the user-mode changes?  If you back those changes out
>> on only the node running OpenSM, do things work?  That is, is it a
>> problem with OpenSM not bringing things up properly?
>
> I gave this more thought, and I'm convinced that we need thread_id
> set as this:
>
> lock cb_serialize
> thread_id = GetID()
> callback()
> thread_id = 0
> unlock cb_serialize

The above code sequence in user mode cl_timer.c fails DHCP address assignment upon compute node reboot.
HCA ports are 'ACTIVE' but no DHCP assignment. Kernel cl_timer V2 patches installed.

If you go back to Tzachi's patch, then you get DHCP address assignment correctly .
thread_id = GetThreadId();
lock cb_serialize
callback()
unlock cb_serialize

Currently building/testing without the lock/unlock cb_serialize.

Will also test with

thread_id = GetThreadId();
lock cb_serialize
callback()
thread_id = 0
unlock cb_serialize

stay tuned.

>
> if we want cl_timer_stop to block.  Without this, the checks in
> cl_timer_stop/start won't work correctly.  Personally, I don't see
> any reason to keep the check in cl_timer_stop to see if we're calling
> from the callback.  Nothing that I see actually does this anyway, and
> it's just adding more complexity to the code.
>
>>> Also, DAPL tests hang when cl_timer callbacks are serialized. I
>>> suspect there has been a long standing bug that was never
>>> noticed.....sigh.
>>
>> Nothing wrong with uncovering bugs.  I believe that's called
>> progress. :)

Turns out the DAPL test hangs are node count dependent. 4 nodes OK, 11 nodes OK, 21 nodes fail.
After the 21 node failure, 11 & 4 nodes now fail....sigh.


>>
>>> Perhaps opensm should have it's own user-mode implementation of
>>> cl_timer ?
>
> I thought about this, and it may be necessary if we need to serialize
> all timers.  But I still believe that the other cl_timer
> implementation needs to be fixed.
>
>> I think we all agree that the multiple callback model for timers is
>> broken - each timer should have at most one callback executing at a
>> time.  So we need to change cl_timer both in user and kernel to at
>> least fix that.  If other code ends up broken, that's a side effect.
>
> The calling code should definitely not rely on a single timer calling
> them back multiple times.

agreed.

>
>> One thing to keep in mind is that with the lock to serialize the
>> callback threads, you may block threadpool threads that need to run
>> something else, so the deadlock might be a side effect of
>> serializing the callbacks with a lock.
>
> What thread pool are you referring to?
>
> I believe all users of cl_timer acquire a lock to serialize access to
> their own lists.  In theory, all we've done is move the locking down
> into the cl_timer abstraction.
>
> - Sean




More information about the ofw mailing list