[ofa-general] Re: [PATCH] opensm: fix race in main OpenSM flow.

Hal Rosenstock hal.rosenstock at gmail.com
Thu Dec 4 13:03:40 PST 2008


On Thu, Dec 4, 2008 at 1:52 PM, Sasha Khapyorsky <sashak at voltaire.com> wrote:
>
> wait_for_pending_transactions() is heavily used during OpenSM heavy and
> light sweeps. It checks the opensm.stats.qp0_mads_outstanding value and
> blocks (with pthread_cond_wait()) until it reaches zero; this event is
> signaled by pthread_cond_signal(), which should wake the waiting thread.
> However, this code (the qp0_mads_outstanding decrement and signaling) is
> not protected by the mutex that guards the check in
> wait_for_pending_transactions(). As a result there is an easily
> reproducible race condition (qp0_mads_outstanding is decremented and
> signaled after the check but before the pthread_cond_wait() call in
> wait_for_pending_transactions()), which causes OpenSM to deadlock.
>
> This patch addresses the issue: all qp0_mads_outstanding change
> operations now use the same mutex that is used in
> wait_for_pending_transactions().

Nice catch!
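
For anyone following along, the pattern described in the quoted text can
be sketched roughly as below. This is only an illustration of the general
"decrement and signal under the waiter's mutex" idea, not the actual patch:
the names lock, drained, outstanding, request_sent(), request_done(),
wait_for_drain() and run_demo() are made up for the example, and
outstanding merely plays the role of qp0_mads_outstanding.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical stand-ins for illustration; not OpenSM's actual names. */
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t drained = PTHREAD_COND_INITIALIZER;
static int outstanding;	/* plays the role of qp0_mads_outstanding */

/* Sender side: account for one more in-flight request. */
static void request_sent(void)
{
	pthread_mutex_lock(&lock);
	outstanding++;
	pthread_mutex_unlock(&lock);
}

/*
 * Completion side: decrement and signal while holding the same mutex
 * the waiter uses, so the wakeup cannot land between the waiter's
 * check and its pthread_cond_wait() call.
 */
static void request_done(void)
{
	pthread_mutex_lock(&lock);
	assert(outstanding > 0);
	if (--outstanding == 0)
		pthread_cond_signal(&drained);
	pthread_mutex_unlock(&lock);
}

/*
 * Waiter side: check and wait under the mutex. The loop re-checks the
 * counter after every wakeup, which also covers spurious wakeups.
 */
static void wait_for_drain(void)
{
	pthread_mutex_lock(&lock);
	while (outstanding > 0)
		pthread_cond_wait(&drained, &lock);
	pthread_mutex_unlock(&lock);
}

/* Thread body: complete n requests one by one. */
static void *completer(void *arg)
{
	int n = *(int *)arg;

	for (int i = 0; i < n; i++)
		request_done();
	return NULL;
}

/* Queue n requests, complete them on another thread, wait for zero.
 * Returns the counter value read after the wait (0 when all drained). */
int run_demo(int n)
{
	pthread_t t;
	int left;

	for (int i = 0; i < n; i++)
		request_sent();
	pthread_create(&t, NULL, completer, &n);
	wait_for_drain();
	pthread_join(t, NULL);

	pthread_mutex_lock(&lock);
	left = outstanding;
	pthread_mutex_unlock(&lock);
	return left;
}
```

If request_done() skipped the lock, the decrement and signal could happen
right after the waiter saw a non-zero count but before it went to sleep,
and the waiter would then block forever. That is the deadlock described
in the quoted text.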

Looks to me like this has been there from around the following commit
or some related changes shortly thereafter:

commit 1b2eb3daddbfa9fc555488cddbea12b01f6635a3
Date:   Mon Jan 28 03:10:18 2008 +0200

    opensm: wait_for_pending_transaction() generalization

    Function wait_for_pending_transaction() is global now and moved from
    PerfMgr to StateMgr, all related objects are generalized.

If so, this is applicable to 3.1 and maybe also 3.0 based OpenSMs.

-- Hal


