[ofw] Opensm or umad bug

Hefty, Sean sean.hefty at intel.com
Thu Apr 28 10:15:24 PDT 2011


> We have the following code commented out at umad_receiver_stop:
> 
>         /* XXX hangs current thread - suspect umad_recv() ignoring wakeup.
> 
>         cl_thread_destroy(&p_ur->tid);
> 
>         */

This definitely looks like it will hang.  cl_thread_destroy() does:

void
cl_thread_destroy( 
	IN	cl_thread_t* const	p_thread )
{
	CL_ASSERT( p_thread );

	if( !p_thread->osd.h_thread )
		return;

	/* Wait for the thread to exit. */
	WaitForSingleObject( p_thread->osd.h_thread, INFINITE );

So it blocks immediately, waiting for the receiver thread to exit.  OpenSM calls umad_recv() with an infinite timeout as well, and nothing signals that thread to exit.  I don't see that Windows provides any way to signal a thread directly, or that the Windows umad library provides a way for a user to wake up the thread, short of sending itself a MAD.

The best fix I can think of is to expose a new call on Windows, umad_cancel_recv(), that umad_receiver_stop() can call before calling cl_thread_destroy().
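
To make the ordering concrete, here is a rough sketch of the cancel-then-wait pattern.  It is only a model: it uses POSIX threads so it can be built anywhere, not the Windows API or the real umad library, and every name in it (fake_port, fake_recv, fake_cancel_recv, receiver_main, receiver_stop, demo_clean_shutdown) is made up for illustration.  In particular, fake_cancel_recv() only stands in for the proposed umad_cancel_recv(), whose actual signature and behavior are not defined here.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a umad port: a one-slot mailbox plus a cancel flag. */
struct fake_port {
	pthread_mutex_t lock;
	pthread_cond_t  wake;
	bool            has_mad;    /* a MAD is waiting (never posted in this demo) */
	bool            cancelled;  /* set by fake_cancel_recv() */
};

/* Models a receive with an infinite timeout: blocks until a MAD arrives
 * or the receive is cancelled.  Returns 0 for a MAD, -1 if cancelled. */
static int fake_recv(struct fake_port *p)
{
	int rc;

	pthread_mutex_lock(&p->lock);
	while (!p->has_mad && !p->cancelled)
		pthread_cond_wait(&p->wake, &p->lock);
	if (p->cancelled) {
		rc = -1;
	} else {
		p->has_mad = false;
		rc = 0;
	}
	pthread_mutex_unlock(&p->lock);
	return rc;
}

/* Models the proposed cancel call: wakes any thread blocked in fake_recv(). */
static void fake_cancel_recv(struct fake_port *p)
{
	pthread_mutex_lock(&p->lock);
	p->cancelled = true;
	pthread_cond_broadcast(&p->wake);
	pthread_mutex_unlock(&p->lock);
}

/* Receiver thread: loops on the blocking receive until it is cancelled. */
static void *receiver_main(void *arg)
{
	struct fake_port *p = arg;

	while (fake_recv(p) == 0)
		;	/* a real receiver would process the MAD here */
	return NULL;
}

/* Models the stop path: cancel first, then wait for the thread.  Without
 * the cancel, the join below would block forever, just like the
 * WaitForSingleObject( ..., INFINITE ) call in cl_thread_destroy(). */
static int receiver_stop(struct fake_port *p, pthread_t tid)
{
	assert(p != NULL);
	fake_cancel_recv(p);
	return pthread_join(tid, NULL);
}

/* Starts a receiver and stops it again.  Returns 0 on a clean shutdown. */
int demo_clean_shutdown(void)
{
	struct fake_port port;
	pthread_t tid;
	int rc;

	pthread_mutex_init(&port.lock, NULL);
	pthread_cond_init(&port.wake, NULL);
	port.has_mad = false;
	port.cancelled = false;

	if (pthread_create(&tid, NULL, receiver_main, &port) != 0)
		return -1;
	rc = receiver_stop(&port, tid);

	pthread_cond_destroy(&port.wake);
	pthread_mutex_destroy(&port.lock);
	return rc;
}
```

The cancel flag is checked under the same lock the receiver sleeps on, so the wakeup is not lost even if the stop path runs before the receiver thread has reached its wait.  A real Windows implementation would need the equivalent guarantee from whatever mechanism umad uses to block.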

- Sean


