[ofw] OpenSM commits

Smith, Stan stan.smith at intel.com
Thu Feb 2 16:46:17 PST 2012


Hello Leo,
  Thank you for directing my attention to the OpenSM service manager interface. The Mellanox service manager code I copied from a much, much earlier version of opensm was working for reasons other than what I thought. Upon revisiting the service interface after you pointed out the termination shortcomings, I was able to see how the service interface should have been coded.
When a SERVICE_CONTROL_STOP is received, status SERVICE_STOP_PENDING is immediately sent to the Windows Service Manager, and both osm_exit_event and osm_exit_flag are set.
The event releases the waiting OpenSM ServiceMain thread which, after correctly starting opensm, had been waiting on osm_exit_event in order to terminate the opensm service (per MS service design).
The previous code was returning success(1) to the service manager, which should not have occurred until all OpenSM threads were destroyed and the OpenSM service was in the STOPPED state.
In today's code base, once the waiting OpenSM ServiceMain thread wakes, it waits in a sleep() loop until the OpenSM service indicates an exit condition or too much time (15 seconds) has elapsed. When either condition arises, the ServiceMain thread returns success(1) to the Windows service manager.
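For illustration only, the bounded wait described above could be sketched in portable C roughly as follows. This is not the code in main.c: wait_for_exit(), sleep_ms() and the polling interval are hypothetical, and POSIX nanosleep() stands in for the Windows sleep call.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdbool.h>
#include <time.h>

/* Sleep for the given number of milliseconds (hypothetical helper). */
static void sleep_ms(long ms)
{
    struct timespec ts = { .tv_sec = ms / 1000,
                           .tv_nsec = (ms % 1000) * 1000000L };
    nanosleep(&ts, NULL);
}

/*
 * Poll *stopped every poll_ms until it becomes true or timeout_ms has
 * been spent sleeping. Returns true if the exit condition was observed,
 * false if the wait timed out first.
 */
static bool wait_for_exit(volatile bool *stopped, long timeout_ms, long poll_ms)
{
    long waited = 0;

    while (!*stopped && waited < timeout_ms) {
        sleep_ms(poll_ms);
        waited += poll_ms;
    }
    return *stopped;
}
```

With the 15 second limit mentioned above, a caller would use something like wait_for_exit(&flag, 15000, 100) and report success to the service manager in either case.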
In order to allow the OpenSM service to reach the exit condition, I added a timeout to the umad_recv() call in the umad_receiver() thread so that it wakes up and recognizes the terminate condition set by umad_receiver_stop().
I really don't like the umad_recv() timeout, although it was needed to correctly support the fixes in opensm ServiceMain thread handling and opensm service stop.
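As a rough sketch of the timed-receive pattern (not the code in osm_vendor_ibumad.c), the loop below uses poll() on a pipe as a stand-in for the timed umad_recv(). All names here (timed_receiver and its functions) are hypothetical, and POSIX threads are assumed.

```c
#define _POSIX_C_SOURCE 200809L
#include <poll.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <unistd.h>

/* Hypothetical receiver state; none of these names come from OpenSM. */
struct timed_receiver {
    int fd;            /* stand-in for the umad port (a pipe read end here) */
    int timeout_ms;    /* how long each timed receive may block */
    atomic_bool stop;  /* terminate request, set by timed_receiver_stop() */
    pthread_t thread;
};

/* Receive loop: every timeout is a chance to notice the terminate request. */
static void *timed_receiver_loop(void *arg)
{
    struct timed_receiver *r = arg;
    struct pollfd pfd = { .fd = r->fd, .events = POLLIN };
    char buf[256];

    while (!atomic_load(&r->stop)) {
        int ready = poll(&pfd, 1, r->timeout_ms);
        if (ready == 0)
            continue;   /* timed out: loop back and re-check the stop flag */
        if (ready < 0 || read(r->fd, buf, sizeof buf) <= 0)
            break;      /* error or end of input */
        /* a real receiver would dispatch the received message here */
    }
    return NULL;
}

/* Start the receiver thread; returns 0 on success. */
static int timed_receiver_start(struct timed_receiver *r, int fd, int timeout_ms)
{
    r->fd = fd;
    r->timeout_ms = timeout_ms;
    atomic_init(&r->stop, false);
    return pthread_create(&r->thread, NULL, timed_receiver_loop, r);
}

/* Request termination, then wait for the thread to exit; returns 0 on success. */
static int timed_receiver_stop(struct timed_receiver *r)
{
    atomic_store(&r->stop, true);
    return pthread_join(r->thread, NULL);
}
```

The cost of this approach is visible in the loop: an idle receiver still wakes once per timeout just to re-check the flag, and a stop request can take up to one full timeout to be noticed.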
In the near future, the timed umad_recv() will be replaced with a blocking umad_recv() (as it was). umad_receiver_stop() will set the umad_receiver() thread exit condition and then send a MAD to self, which will wake the umad_receiver() thread so that it recognizes the exit condition.
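The planned wake-by-message approach might look roughly like the sketch below. It is an assumption-laden illustration and not the eventual OpenSM change: a pipe stands in for the umad port, a one-byte write stands in for the MAD sent to self, and the waking_receiver names are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <unistd.h>

/* Hypothetical receiver state; none of these names come from OpenSM. */
struct waking_receiver {
    int rfd;           /* read end: stand-in for the blocking receive */
    int wfd;           /* write end: used to send the wake-up message */
    atomic_bool stop;  /* terminate request */
    pthread_t thread;
};

/* Receive loop: blocks indefinitely and consumes no CPU while idle. */
static void *waking_receiver_loop(void *arg)
{
    struct waking_receiver *r = arg;
    char msg;

    for (;;) {
        if (read(r->rfd, &msg, 1) <= 0)
            break;      /* error or end of input */
        if (atomic_load(&r->stop))
            break;      /* woken after a stop request: exit */
        /* a real receiver would dispatch the received message here */
    }
    return NULL;
}

/* Create the pipe and start the receiver thread; returns 0 on success. */
static int waking_receiver_start(struct waking_receiver *r)
{
    int fds[2];

    if (pipe(fds) != 0)
        return -1;
    r->rfd = fds[0];
    r->wfd = fds[1];
    atomic_init(&r->stop, false);
    return pthread_create(&r->thread, NULL, waking_receiver_loop, r);
}

/* Set the exit condition first, then send a message to wake the reader. */
static int waking_receiver_stop(struct waking_receiver *r)
{
    char wake = 0;
    int rc;

    atomic_store(&r->stop, true);
    if (write(r->wfd, &wake, 1) != 1)
        return -1;
    rc = pthread_join(r->thread, NULL);
    close(r->rfd);
    close(r->wfd);
    return rc;
}
```

The ordering in waking_receiver_stop() matters: the flag is set before the wake-up message is written, so the thread cannot wake, see no exit condition, and go back to blocking.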
Thanks for your assistance and insights.

Stan.

Revision: 3386
Author: stan.smith at intel.com
Date: Thursday, February 02, 2012 4:12:26 PM
Message:
[OPENSM] remove ETIMEDOUT definition as _errno.h has it.
----
Modified : /gen1/trunk/ulp/opensm/user/include/vendor/winosm_common.h

Revision: 3387
Author: stan.smith at intel.com
Date: Thursday, February 02, 2012 4:14:34 PM
Message:
[OPENSM] remove a memory leak plus use a more reasonable path %PF%\OFED\OpenSM\ for OSM_DEFAULT_TORUS_CONF_FILE
----
Modified : /gen1/trunk/ulp/opensm/user/include/opensm/osm_base.h

Revision: 3388
Author: stan.smith at intel.com
Date: Thursday, February 02, 2012 4:16:02 PM
Message:
[OPENSM] Simplify GetOsmTempPath(), do the work once, use results multiple times.
----
Modified : /gen1/trunk/ulp/opensm/user/libvendor/winosm_common.c

Revision: 3389
Author: stan.smith at intel.com
Date: Thursday, February 02, 2012 4:31:16 PM
Message:
[OPENSM] in umad_receiver_stop(), request umad_receiver() thread termination.
Wait for umad_receiver() thread to indicate its termination.
umad_receiver() thread has a 'temporary' 2 second umad_recv() timeout such that the umad_receiver_stop() terminate request will be acted upon. In the near future, the timed umad_recv() will be reverted back to the blocking read, with umad_receiver_stop() sending a MAD to self, which will cause recognition of the terminate request instead of relying on the umad_recv() timeout, which wastes system resources.
Pushed fix in now to allow Service control thread fixes (main.c) to correctly wait for OpenSM service termination.
----
Modified : /gen1/trunk/ulp/opensm/user/libvendor/osm_vendor_ibumad.c

Revision: 3390
Author: stan.smith at intel.com
Date: Thursday, February 02, 2012 4:34:21 PM
Message:
[OPENSM] fix OpenSM service control thread to correctly wait for opensm service termination.  Fixed SvcDebugOut() wrapper to prefix messages with '[OpenSM service]' instead of each call providing the prefix.
----
Modified : /gen1/trunk/ulp/opensm/user/opensm/main.c



