Fwd: [ofa-general] Performance evaluation of Opensm
Nifty Tom Mitchell
niftyompi at niftyegg.com
Tue Jul 28 21:30:05 PDT 2009
On Tue, Jul 07, 2009 at 04:28:55PM +0530, Devesh Sharma wrote:
> Thanks Yevgeny, for your valuable input. This will surely help with my work.
>
> On Tue, Jul 7, 2009 at 2:59 PM, Yevgeny
> Kliteynik<kliteyn at dev.mellanox.co.il> wrote:
> > Hi Devesh,
> >
> > It's kind of hard to talk about "performance of OpenSM".
> > Subnet Manager has different phases and modes of operation,
> > each of them is a completely separate issue:
> >
> > - Fabric discovery
> > - Fabric ports/nodes configuration
> > - Unicast routing calculation
> > - Unicast routing configuration on fabric switches
> > - Multicast routing calculation
> > - Multicast routing configuration on fabric switches
> > - SA queries processing
> > - Memory consumption
> > - Different routing algorithms consume different time and memory
> > - QoS
> > - etc, etc, etc
> >
> > Most of the above can be measured only on a real cluster.
...
> But how can these be measured? Is there any compile-time flag available
> in the code?
You can edit the code and add logging or time stamps, but I am
not sure that you should bother. Even if there were a compile-time
flag, what would you compare the results to?
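
A minimal sketch of the time-stamp idea, in Python rather than OpenSM's C so
that it stays short. The phase names are borrowed from Yevgeny's list; the
`timed` helper and the stand-in workloads are hypothetical and are not part
of OpenSM. It only shows the shape of the instrumentation: wrap each phase,
accumulate elapsed wall-clock time, and log the totals.

```python
import time
from contextlib import contextmanager

# Accumulated wall-clock seconds per phase (hypothetical phase names).
timings = {}

@contextmanager
def timed(phase):
    """Add the elapsed time of the enclosed block to timings[phase]."""
    start = time.perf_counter()
    try:
        yield
    finally:
        elapsed = time.perf_counter() - start
        timings[phase] = timings.get(phase, 0.0) + elapsed

# Stand-ins for real work; a real measurement would wrap the actual phase.
with timed("fabric_discovery"):
    time.sleep(0.01)
with timed("unicast_routing_calculation"):
    sum(range(100000))

for phase, seconds in timings.items():
    print(f"{phase}: {seconds * 1e3:.3f} ms")
```

Numbers collected this way are only meaningful relative to another run on
the same fabric and host, which is the comparison problem raised above.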
*) N.B. once a fabric is set up you can kill the subnet manager
and the fabric will stay as it is and continue to operate.
The implication of this is that the subnet manager is not
in a critical performance path for normal operation. It
is however in a "correctness + reliability" path which clearly
sets the agenda for the authors.
*) Much of the time you might measure on a cluster depends
on the interaction of the subnet manager and other
parts of the fabric, i.e. each node and switch in the cluster
runs a subnet management agent (SMA) that takes part in the
process (dt(SM)+dt(SMA)=something).
This makes it hard to extract only the performance of the subnet manager.
*) Sweeping the fabric to discover changes (new or absent devices) can often
be lazy. There is a configuration flag to tune this. On massive fabrics
the time to rescan will grow with the size of the fabric. A lazy
scan is normal. Trap notice SMP processing makes up for a lazy setting,
since a reported change does not have to wait for the next sweep.
*) Processor, cache type and size, memory performance, I/O path to IB hardware,
TLB size and management all come into play. Small test fabric
results will have some edges, lumps and bumps in any curves
that make extrapolating to "interesting" fabric sizes difficult.
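
The lazy sweep plus trap handling from the list above can be sketched as a
loop that waits up to one sweep interval for a trap and sweeps either way.
This is an illustration only, not OpenSM's code: `sweep_loop`,
`sweep_interval_s`, `do_sweep` and `max_sweeps` are hypothetical names, and
traps are modeled as items on a plain Python queue.

```python
import queue

def sweep_loop(traps, sweep_interval_s, do_sweep, max_sweeps):
    """Sweep when the timer expires, or earlier if a trap arrives.

    Returns one (reason, trap) pair per sweep so the caller can see
    what triggered each one.
    """
    reasons = []
    for _ in range(max_sweeps):
        try:
            # Block for at most one interval; a trap wakes us up early.
            trap = traps.get(timeout=sweep_interval_s)
            reasons.append(("trap", trap))
        except queue.Empty:
            # Nothing was reported: this is the lazy periodic sweep.
            reasons.append(("timer", None))
        do_sweep()
    return reasons

# One pending trap triggers the first sweep, the timer triggers the second.
pending = queue.Queue()
pending.put("port state change")
print(sweep_loop(pending, 0.01, lambda: None, 2))
```

With a long interval the timer branch is rarely taken, yet a change reported
by a trap is still handled promptly, which is why a lazy setting costs little.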
Some work in this area has been done:
http://nowlab.cse.ohio-state.edu/publications/conf-papers/2005/vishnu-fastos05.pdf
--
T o m M i t c h e l l
Found me a new hat, now what?