[ofw] [PATCH v2] complib/user: fix timer race conditions

Hefty, Sean sean.hefty at intel.com
Mon Jun 28 15:34:51 PDT 2010


> This looks good to me.  Does it work as intended?  Or is that something for
> Stan to try at scale?

All I can say is that I applied the patch to opensm, and it ran successfully on my two-node cluster.  Amazing, I know.  I need Stan to test across the larger cluster.

> > Moved the calculation of the timeout time to inside the critical
> > section to improve its accuracy in case an attempt to acquire the
> > critical section blocks.
> 
> How does this improve accuracy?  I suppose it depends on whether the timeout
> time is relative to the client making the call, or the call returning.
> Having the timeout calculated before the critsec improves the former,
> calculating it under lock improves the latter.

I was worried about the time between setting timeout and using it:

+	timeout = cl_get_time_stamp() + (((uint64_t)time_ms) * 1000);
+	if ( !p_timer->timeout_time || timeout < p_timer->timeout_time )

If timeout is set before the critical section and the thread then blocks acquiring it, the value is stale by the time the if check runs, so the check is more likely to succeed than it would be if timeout were set inside the critical section.  The result is that the timer may be adjusted, which would set the timer out _further_ than it actually is.
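
To make the scenario concrete, here is a rough sketch of the two orderings.  It is not the complib code: sketch_timer_t, sketch_get_time_stamp(), sketch_lock() and sketch_trim() are hypothetical stand-ins, the clock is simulated, and I am assuming that a successful check re-arms the timer relative to the current time.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the timer; not the complib type. */
typedef struct {
	uint64_t timeout_time;	/* absolute expiry in usec, 0 = not running */
} sketch_timer_t;

static uint64_t fake_now_us;	/* simulated clock, in usec */

static uint64_t sketch_get_time_stamp(void)
{
	return fake_now_us;
}

/* Simulate blocking for block_us while acquiring the critical section. */
static void sketch_lock(uint64_t block_us)
{
	fake_now_us += block_us;
}

/* Returns true if the timer was re-armed. */
static bool sketch_trim(sketch_timer_t *p_timer, uint32_t time_ms,
			uint64_t block_us, bool stamp_inside_lock)
{
	uint64_t timeout = 0;

	assert(p_timer != NULL);

	if (!stamp_inside_lock)		/* old ordering: stamp, then lock */
		timeout = sketch_get_time_stamp() + (((uint64_t)time_ms) * 1000);

	sketch_lock(block_us);

	if (stamp_inside_lock)		/* patched ordering: lock, then stamp */
		timeout = sketch_get_time_stamp() + (((uint64_t)time_ms) * 1000);

	if (!p_timer->timeout_time || timeout < p_timer->timeout_time) {
		/* Assumption: re-arming is relative to the current time. */
		p_timer->timeout_time =
			sketch_get_time_stamp() + (((uint64_t)time_ms) * 1000);
		return true;
	}
	return false;
}

/* Timer due at 6500 usec, clock at 1000 usec, 5 ms trim, 1000 usec block. */
static uint64_t sketch_run(bool stamp_inside_lock)
{
	sketch_timer_t timer = { 6500 };

	fake_now_us = 1000;
	sketch_trim(&timer, 5, 1000, stamp_inside_lock);
	return timer.timeout_time;
}
```

With these numbers, sketch_run(false) leaves the timer at 7000 usec: the stale stamp (6000) passes the check, and the re-arm lands later than the original 6500.  sketch_run(true) computes 7000 under the lock, fails the check, and leaves the timer at 6500.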

- Sean


