[ofw] Opensm or umad bug
Hefty, Sean
sean.hefty at intel.com
Thu Apr 28 10:15:24 PDT 2011
> We have the following code commented out at umad_receiver_stop:
>
> /* XXX hangs current thread - suspect umad_recv() ignoring wakeup.
>
> cl_thread_destroy(&p_ur->tid);
>
> */
This definitely looks like it will hang. cl_thread_destroy() does:
void
cl_thread_destroy(
    IN cl_thread_t* const p_thread )
{
    CL_ASSERT( p_thread );

    if( !p_thread->osd.h_thread )
        return;

    /* Wait for the thread to exit. */
    WaitForSingleObject( p_thread->osd.h_thread, INFINITE );
So it immediately waits for the thread to exit, without doing anything to make it exit. OpenSM calls umad_recv() with an infinite timeout as well, and nothing signals that thread to exit, so the wait never completes. I don't see that Windows provides any way to signal a thread directly, or that the Windows umad provides a way for a user to wake up the thread, short of sending itself a MAD.
The best fix I can think of is to expose a new call on Windows, umad_cancel_recv(), that umad_receiver_stop() can call before calling cl_thread_destroy().
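
To make the ordering concrete, here is a rough, platform-neutral model of that idea: cancel the blocked receive first, then wait for the thread. It is only a sketch under loud assumptions. It uses POSIX threads rather than the Windows primitives so that it is self-contained, and every name in it (model_port, model_recv, model_cancel_recv, model_receiver_stop, demo_stop_receiver) is hypothetical. None of it is the real umad or complib API, and how umad_cancel_recv() would actually be implemented on Windows is not shown.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a umad port; NOT the real umad structure. */
struct model_port {
    pthread_mutex_t lock;
    pthread_cond_t  wake;
    bool            mad_ready;  /* a MAD is waiting to be read */
    bool            cancelled;  /* set by model_cancel_recv() */
};

/* Models umad_recv() with an infinite timeout: blocks until a MAD
 * arrives or the receive is cancelled. Returns 0 for a MAD, -1 if
 * cancelled. */
static int model_recv(struct model_port *p)
{
    int rc;

    assert(p != NULL);
    pthread_mutex_lock(&p->lock);
    while (!p->mad_ready && !p->cancelled)
        pthread_cond_wait(&p->wake, &p->lock);
    rc = p->cancelled ? -1 : 0;
    p->mad_ready = false;
    pthread_mutex_unlock(&p->lock);
    return rc;
}

/* Models the proposed umad_cancel_recv(): wakes any blocked receiver. */
static void model_cancel_recv(struct model_port *p)
{
    assert(p != NULL);
    pthread_mutex_lock(&p->lock);
    p->cancelled = true;
    pthread_cond_broadcast(&p->wake);
    pthread_mutex_unlock(&p->lock);
}

/* Receiver thread: loops on the blocking receive until it is cancelled. */
static void *receiver_thread(void *arg)
{
    struct model_port *p = arg;

    while (model_recv(p) == 0)
        ;  /* a real receiver would dispatch the MAD here */
    return NULL;
}

/* Models umad_receiver_stop(): cancel first, then wait for the thread.
 * Waiting without the cancel would block forever, as described above.
 * Returns 1 if the thread was joined. */
static int model_receiver_stop(struct model_port *p, pthread_t tid)
{
    model_cancel_recv(p);
    return pthread_join(tid, NULL) == 0;
}

/* Starts a receiver, stops it, and returns 1 if shutdown completed. */
int demo_stop_receiver(void)
{
    struct model_port port;
    pthread_t tid;
    int ok;

    pthread_mutex_init(&port.lock, NULL);
    pthread_cond_init(&port.wake, NULL);
    port.mad_ready = false;
    port.cancelled = false;

    if (pthread_create(&tid, NULL, receiver_thread, &port) != 0)
        return 0;
    ok = model_receiver_stop(&port, tid);

    pthread_cond_destroy(&port.wake);
    pthread_mutex_destroy(&port.lock);
    return ok;
}
```

Calling demo_stop_receiver() should return 1, whether or not the receiver has reached its blocking wait by the time the cancel is issued, because the cancelled flag is checked under the lock before each wait. The point is only the order inside model_receiver_stop(); the Windows equivalent of the wake-up is the part that would need the new call.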
- Sean