Fwd: [ofa-general] Performance evaluation of Opensm

Nifty Tom Mitchell niftyompi at niftyegg.com
Tue Jul 28 21:30:05 PDT 2009


On Tue, Jul 07, 2009 at 04:28:55PM +0530, Devesh Sharma wrote:
> Thanks, Yevgeny, for your valuable input. This will surely help my work.
> 
> On Tue, Jul 7, 2009 at 2:59 PM, Yevgeny Kliteynik
> <kliteyn at dev.mellanox.co.il> wrote:
> > Hi Devesh,
> >
> > It's kind of hard to talk about "performance of OpenSM".
> > The Subnet Manager has different phases and modes of operation,
> > each of which is a completely separate issue:
> >
> > - Fabric discovery
> > - Fabric ports/nodes configuration
> > - Unicast routing calculation
> > - Unicast routing configuration on fabric switches
> > - Multicast routing calculation
> > - Multicast routing configuration on fabric switches
> > - SA queries processing
> > - Memory consumption
> > - Different routing algorithms consume different time and memory
> > - QoS
> > - etc, etc, etc
> >
> > Most of the above can be measured only on a real cluster.
...
> But how can these be measured? Is there any compile-time flag available
> in the code?

You can edit the code and add logging or time stamps, but I am
not sure that you should bother. If there were a compile-time
flag, what would you compare the results to?
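
If you do want time stamps, the idea is just to bracket each phase with
a clock reading. Below is a minimal sketch in Python for brevity (OpenSM
itself is C, so a real patch would look different). The phase name and
the fake_discovery() stand-in are hypothetical and do not correspond to
any OpenSM symbol.

```python
import time
from contextlib import contextmanager

# Accumulated wall-clock seconds per phase.
# Phase names are hypothetical labels, not OpenSM identifiers.
timings = {}

@contextmanager
def timed_phase(name):
    """Add the wall-clock time spent inside the block to timings[name]."""
    start = time.monotonic()
    try:
        yield
    finally:
        timings[name] = timings.get(name, 0.0) + (time.monotonic() - start)

def fake_discovery():
    # Stand-in for real work; the sleep simulates waiting on the fabric.
    time.sleep(0.01)

with timed_phase("fabric_discovery"):
    fake_discovery()

for name, seconds in timings.items():
    print(f"{name}: {seconds:.4f} s")
```

A monotonic clock is used so that the measurement is not disturbed if
the system time is adjusted while a phase is running.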

  *) N.B. once a fabric is set up you can kill the subnet manager
	and the fabric will stay as it is and continue to operate.
	The implication is that the subnet manager is not
	in a critical performance path for normal operation.  It
	is, however, in a "correctness + reliability" path, which clearly
	sets the agenda for the authors.

  *) Much of the time you might measure on a cluster depends
	on the interaction between the subnet manager and other
	parts of the fabric, i.e. each node and switch in the cluster
	contributes a key component of the process (dt(SM)+dt(SMA)=something).
	This makes it hard to extract only the performance of the subnet manager.

  *) Sweeping the fabric to discover changes (new or absent devices) can often
	be lazy.  There is a configuration flag to tune this.  On massive fabrics
	the time to rescan will grow with the size of the fabric.  A lazy
	scan is normal.  Trap notice SMP processing mitigates a lazy sweep setting.

  *) Processor, cache type and size, memory performance, I/O path to IB hardware,
	TLB size and management all come into play.  Small test fabric
	results will have some edges, lumps and bumps in any curves
	that make extrapolating to "interesting" fabric sizes difficult.
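
On the dt(SM)+dt(SMA) point above, one rough way to see the split is to
record both wall-clock time and process CPU time around the same
operation. CPU time approximates the manager's own computation, and the
remainder is time spent waiting on everything else (SMA responses, the
wire, the scheduler). This is only a sketch in Python under those
assumptions; simulated_sweep() is a hypothetical stand-in and nothing
here is OpenSM code.

```python
import time

def measure(fn):
    """Return (wall_seconds, cpu_seconds) for one call of fn."""
    wall_start = time.perf_counter()
    cpu_start = time.process_time()
    fn()
    cpu = time.process_time() - cpu_start
    wall = time.perf_counter() - wall_start
    return wall, cpu

def simulated_sweep():
    # Hypothetical stand-in: some local computation, then a sleep
    # that plays the role of waiting for SMA responses.
    sum(i * i for i in range(200_000))
    time.sleep(0.05)

wall, cpu = measure(simulated_sweep)
waiting = max(wall - cpu, 0.0)  # everything that was not our own CPU time

print(f"wall    : {wall:.4f} s")
print(f"cpu     : {cpu:.4f} s  (rough stand-in for dt(SM))")
print(f"waiting : {waiting:.4f} s  (dt(SMA) plus wire and scheduling)")
```

The waiting figure still lumps several things together, which is the
reason the subnet manager's share is hard to isolate on a real fabric.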

Some work has been done on this:
	http://nowlab.cse.ohio-state.edu/publications/conf-papers/2005/vishnu-fastos05.pdf


-- 
	T o m  M i t c h e l l 
	Found me a new hat, now what?



