Fwd: [ofa-general] Performance evaluation of Opensm

Nicolas Morey-Chaisemartin devel-ofed at morey-chaisemartin.com
Tue Jul 7 04:13:25 PDT 2009


On 07/07/2009 12:58, Devesh Sharma wrote:
> Thanks Yevgeny, for your valuable input. This will surely help my work.
> 
> On Tue, Jul 7, 2009 at 2:59 PM, Yevgeny
> Kliteynik<kliteyn at dev.mellanox.co.il> wrote:
>> Hi Devesh,
>>
>> It's kind of hard to talk about "performance of OpenSM".
>> Subnet Manager has different phases and modes of operation,
>> each of them is a completely separate issue:
>>
>> - Fabric discovery
>> - Fabric ports/nodes configuration
>> - Unicast routing calculation
>> - Unicast routing configuration on fabric switches
>> - Multicast routing calculation
>> - Multicast routing configuration on fabric switches
>> - SA queries processing
>> - Memory consumption
>> - Different routing algorithms consume different time and memory
>> - QoS
>> - etc, etc, etc
>>
>> Most of the above can be measured only on a real cluster.
> But how can these be measured? Is there any compile-time flag available
> in the code?
>> Some (such as routing calculation and memory consumption) can
>> be measured while OSM is running on top of the simulator.
> Simulation results are far, far away from the real situation.. :( I am
> interested in results with the real fabric.

Actually, that's not true for routing. The scanning of the fabric is done before OpenSM calls the routing engine, so the routing engine works from memory only anyway. Routing calculation time is therefore the same on a real fabric as on a simulated one, given the same topology.
However, I agree that fabric discovery time and LFT update time will differ.
>> Some are strongly affected by the number of CPU cores that you
>> have on the management node (e.g. SA queries processing),
>> others are mostly affected by the CPU frequency (unicast routing).
>> Also, various OpenSM options can affect these phases; for example,
>> the unicast routing cache may reduce routing calculation time to 0.
> Hmm........correct.
>> Sorry that I'm not really answering your question :(
>> I just want to point out the fact that there are many aspects
>> that should be considered when talking about OpenSM performance.
> Do we have any tool which does profiling of all these phases of the
> SM? Such a tool would be
> helpful for researchers working on different algorithms related to SM.

For internal actions, you can use valgrind --tool=callgrind.
It provides a full analysis of any program, so you can find where the bottlenecks are and pretty much any perf info you may need. However, it does not let you measure times for network operations.
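
As a rough illustration of the callgrind suggestion, here is a minimal Python sketch that wraps a command in valgrind and then summarises the profile with callgrind_annotate. The helper names and the output file name are made up for this example, and it assumes valgrind, callgrind_annotate and opensm are all on PATH. Add whatever OpenSM options you normally use.

```python
import subprocess


def callgrind_command(target_cmd, out_file="callgrind.out.opensm"):
    """Prefix target_cmd so that it runs under valgrind's callgrind tool."""
    return ["valgrind", "--tool=callgrind",
            f"--callgrind-out-file={out_file}"] + list(target_cmd)


def annotate_command(out_file="callgrind.out.opensm"):
    """Command that prints a per-function cost summary of the profile."""
    return ["callgrind_annotate", out_file]


def profile(target_cmd, out_file="callgrind.out.opensm"):
    """Run target_cmd under callgrind until it exits, then print the summary.

    Hypothetical helper: press Ctrl-C in the terminal to stop the target.
    The signal also reaches the child, which is given time to shut down
    so that callgrind can write its output file.
    """
    proc = subprocess.Popen(callgrind_command(target_cmd, out_file))
    try:
        proc.wait()
    except KeyboardInterrupt:
        proc.wait()  # let the profiled program finish its own shutdown
    subprocess.run(annotate_command(out_file))


# Example (not run here): profile(["opensm"])
```

Expect the program to run much slower under valgrind, so this is for finding hot spots in the code, not for reading off real wall-clock times.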

>> If what you're interested in is just "system-wide" numbers,
>> then you'll probably want to know how much time it takes for
>> OpenSM to bring up the cluster from scratch, or how much time
>> it takes to reconfigure the fabric after some change.
> Will it be fine if I run OpenSM with the "time" command and press Ctrl-C
> the moment I see the
> SUBNET UP msg? Of course, keeping the other options and
> configuration constant.
> 
> # time opensm -<some options>
> SUBNET UP Ctrl-C
> 
It should work. The problem is that you won't have much granularity to know where the time is consumed. Plus, Ctrl-C doesn't kill OpenSM right away. If there are a lot of outstanding MADs, it can take a few seconds before it exits.
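
To keep the shutdown delay out of the number, one option is to timestamp the SUBNET UP line yourself instead of relying on "time" plus a manual Ctrl-C. Below is a minimal Python sketch of that idea. It assumes the SUBNET UP message appears on the process's stdout or stderr and that it runs on a POSIX system; the function name is made up for this example.

```python
import signal
import subprocess
import time


def time_until_marker(cmd, marker="SUBNET UP"):
    """Run cmd and time how long it takes for marker to show up in its output.

    Returns (seconds_until_marker, seconds_to_exit_after_SIGINT), or
    (None, None) if the process ended without printing the marker.
    """
    start = time.monotonic()
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE,
                            stderr=subprocess.STDOUT, text=True)
    up_after = None
    for line in proc.stdout:
        if marker in line:
            up_after = time.monotonic() - start
            break
    if up_after is None:
        proc.wait()
        return None, None

    stop = time.monotonic()
    proc.send_signal(signal.SIGINT)  # same signal as Ctrl-C
    proc.stdout.read()               # drain output so the child cannot block
    proc.wait()
    return up_after, time.monotonic() - stop


# Example (not run here): up, down = time_until_marker(["opensm"])
```

This still gives only one number for the whole bring-up, so it does not solve the granularity problem, but the second value shows how long the exit took.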

Nicolas



