[ofw] Opensm or umad bug
Hefty, Sean
sean.hefty at intel.com
Thu Apr 28 08:14:51 PDT 2011
> We have the following code commented out at umad_receiver_stop:
>
> /* XXX hangs current thread - suspect umad_recv() ignoring wakeup.
>
> cl_thread_destroy(&p_ur->tid);
>
> */
>
> How can one ensure that umad_receiver thread will not run after
> osm_vendor_delete was called ?
umad_recv() does the following basic operations that can block the calling thread:
    ResetEvent(port->overlap.hEvent);
    hr = port->prov->Receive(mad, sizeof(WM_MAD) + (size_t) *length, &port->overlap);
    if (hr == WV_IO_PENDING) {
        hr = WaitForSingleObject(port->overlap.hEvent, (DWORD) timeout_ms);
        if (hr == WAIT_TIMEOUT) {
            hr = umad_cancel_recv(port);
            // umad_cancel_recv does:
            //     port->prov->CancelOverlappedRequests();
            //     return port->prov->GetOverlappedResult(&port->overlap, &bytes, TRUE);
There are two blocking calls: WaitForSingleObject (obviously) and GetOverlappedResult. The latter should not block for an extended period of time, since the overlapped request was already canceled by the CancelOverlappedRequests call just before it.
I don't see anything in the documentation for WaitForSingleObject saying that signaling the waiting thread unblocks it. For Windows, we could either allow the user to signal the underlying event directly or expose the internal umad_cancel_recv call. libibverbs had to expose similar OS-specific functionality.
Does anyone know how opensm unblocks the receive thread in Linux?
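For comparison, here is a minimal sketch of one generic POSIX way to unblock a receive thread: the thread polls a wake pipe alongside its data descriptor, and the stopper writes one byte to the pipe and then joins. This is not taken from opensm or libibumad, and it is not a claim about how opensm does it; struct receiver, receiver_start, receiver_stop and the wake pipe are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <poll.h>
#include <pthread.h>
#include <unistd.h>

/* Hypothetical receiver state; none of these names come from opensm. */
struct receiver {
    int data_fd;       /* stand-in for the descriptor MADs arrive on */
    int wake_pipe[2];  /* [0] is polled by the thread, [1] is written to stop it */
    int stopped;       /* set by the thread on exit, read only after join */
    pthread_t tid;
};

static void *receiver_main(void *arg)
{
    struct receiver *r = (struct receiver *) arg;
    struct pollfd fds[2];
    char buf[256];

    fds[0].fd = r->data_fd;
    fds[0].events = POLLIN;
    fds[1].fd = r->wake_pipe[0];
    fds[1].events = POLLIN;

    for (;;) {
        fds[0].revents = 0;
        fds[1].revents = 0;
        if (poll(fds, 2, -1) < 0) {
            if (errno == EINTR)
                continue;        /* interrupted by a signal, wait again */
            break;
        }
        if (fds[1].revents)      /* stop requested through the wake pipe */
            break;
        if (fds[0].revents && read(r->data_fd, buf, sizeof(buf)) <= 0)
            break;               /* EOF or error on the data descriptor */
    }
    r->stopped = 1;
    return NULL;
}

int receiver_start(struct receiver *r, int data_fd)
{
    assert(r != NULL);
    r->data_fd = data_fd;
    r->stopped = 0;
    if (pipe(r->wake_pipe) != 0)
        return -1;
    if (pthread_create(&r->tid, NULL, receiver_main, r) != 0) {
        close(r->wake_pipe[0]);
        close(r->wake_pipe[1]);
        return -1;
    }
    return 0;
}

/* Wakes the blocked poll() and returns only after the thread has exited. */
int receiver_stop(struct receiver *r)
{
    char c = 0;

    if (write(r->wake_pipe[1], &c, 1) != 1)
        return -1;
    pthread_join(r->tid, NULL);
    close(r->wake_pipe[0]);
    close(r->wake_pipe[1]);
    return 0;
}
```

Because receiver_stop joins the thread before returning, the caller knows the receiver cannot run afterwards, which is the guarantee asked for around osm_vendor_delete. The Windows analogue of the wake pipe would be a second event that the receive path waits on together with the overlapped event.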
- Sean