[openib-general] Re: [PATCH] Opensm - locking order issue

Hal Rosenstock halr at voltaire.com
Wed Sep 28 03:04:05 PDT 2005


Hi Yael,

On Wed, 2005-09-28 at 03:45, Yael Kalka wrote:
> During one of our Windows runs we encountered a deadlock in opensm.
> This caused us to review the different locks in the opensm code, and
> we found 2 places that might cause deadlock.
> In osm_state_mgr_process the order of the locks is:
> 1. osm_state_mgr_t.state_lock
> 2. p_lock (same pointer for the different sm managers)
> 
> We noticed 2 places where this order wasn't kept, and a deadlock might
> occur. This happened where osm_state_mgr_process was called while
> p_lock was already held.
> 
> Attached is a patch resolving this issue.

Excellent catch. Thanks. Applied.
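
For anyone following the thread, here is a rough sketch of the ordering
rule described above. It is not the patch and not opensm code: plain
pthread mutexes stand in for the real opensm lock types, and every
function name below is hypothetical. The point it illustrates is that a
handler working under p_lock drops p_lock before calling into the state
manager, so the only nested acquisition anywhere is state_lock first,
then p_lock.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical stand-ins for osm_state_mgr_t.state_lock and the shared p_lock. */
static pthread_mutex_t state_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t p_lock = PTHREAD_MUTEX_INITIALIZER;

static int sweeps_done;   /* protected by p_lock */

/* Models osm_state_mgr_process: takes state_lock first, then p_lock. */
static void state_mgr_process_sketch(void)
{
    pthread_mutex_lock(&state_lock);
    pthread_mutex_lock(&p_lock);
    sweeps_done++;
    pthread_mutex_unlock(&p_lock);
    pthread_mutex_unlock(&state_lock);
}

/* Models a receive handler that updates shared data under p_lock and
 * then needs the state manager to run. Calling the state manager while
 * still holding p_lock would acquire the locks in the reverse order
 * (p_lock, then state_lock), so p_lock is released first. */
static void handler_sketch(int *shared)
{
    pthread_mutex_lock(&p_lock);
    (*shared)++;
    pthread_mutex_unlock(&p_lock);

    state_mgr_process_sketch();   /* no lock held on entry */
}

static void *handler_thread(void *arg)
{
    for (int i = 0; i < 1000; i++)
        handler_sketch((int *)arg);
    return NULL;
}

static void *sweep_thread(void *arg)
{
    (void)arg;
    for (int i = 0; i < 1000; i++)
        state_mgr_process_sketch();
    return NULL;
}

/* Runs both threads to completion and returns the number of sweeps. */
int run_lock_order_demo(void)
{
    pthread_t a, b;
    int shared = 0;
    int rc;

    sweeps_done = 0;
    rc = pthread_create(&a, NULL, handler_thread, &shared);
    assert(rc == 0);
    rc = pthread_create(&b, NULL, sweep_thread, NULL);
    assert(rc == 0);
    pthread_join(a, NULL);
    pthread_join(b, NULL);
    return sweeps_done;
}
```

Build with -pthread. If handler_sketch instead called
state_mgr_process_sketch before unlocking p_lock, the two threads could
each hold one lock and wait forever for the other.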

I wonder whether this is related to any of the other issues (not on
SMInfo but on NodeInfo perhaps).

Also, I am seeing the following, which should be a valid self-transition
(DISCOVER -> DISCOVER):
Sep 28 05:51:59 129290 [B76F8C40] -> __osm_sm_state_mgr_signal_error: ERR 3207: Invalid signal OSM_SM_SIGNAL_HANDOVER in state IB_SMINFO_STATE_DISCOVERING.
Sep 28 05:51:59 129310 [B76F8C40] -> osm_sm_state_mgr_check_legality: ]
Sep 28 05:51:59 129329 [B76F8C40] -> __osm_sminfo_rcv_process_set_request: ERR 2F07: Check legality of SM needed transition. AttributeModifier:0x1000000 RemoteState:IB_SMINFO_STATE_MASTER
If you agree, do you want to fix this or should I?

-- Hal




