[ofa-general] [PATCH] OpenSM: Add a Performance Manager HOWTO to the docs and the dist

Ira Weiny weiny2 at llnl.gov
Thu May 15 13:27:23 PDT 2008


There seems to be a lot of questions on the list about how to gather port
counters.  The Performance Manager included in OpenSM, v3.1.X (OFED 1.3) can be
used to collect these counters in one place.  I decided to write a little HOWTO
to help people to set it up.

Patch is attached,
Ira

>From bfc303f76a40fb5e3a9cf2c01c16c25c517c8ddd Mon Sep 17 00:00:00 2001
From: Ira K. Weiny <weiny2 at llnl.gov>
Date: Thu, 15 May 2008 08:19:17 -0700
Subject: [PATCH] Add a Performance Manager HOWTO to the docs and the dist


Signed-off-by: Ira K. Weiny <weiny2 at llnl.gov>
---
 opensm/Makefile.am                       |    3 +-
 opensm/doc/performance-manager-HOWTO.txt |  153 ++++++++++++++++++++++++++++++
 opensm/opensm.spec.in                    |    2 +-
 3 files changed, 156 insertions(+), 2 deletions(-)
 create mode 100644 opensm/doc/performance-manager-HOWTO.txt

diff --git a/opensm/Makefile.am b/opensm/Makefile.am
index 3811963..4c79f49 100644
--- a/opensm/Makefile.am
+++ b/opensm/Makefile.am
@@ -24,8 +24,9 @@ endif
 man_MANS = man/opensm.8 man/osmtest.8
 
 various_scripts = $(wildcard scripts/*)
+docs = doc/performance-manager-HOWTO.txt
 
-EXTRA_DIST = autogen.sh opensm.spec $(various_scripts) $(man_MANS)
+EXTRA_DIST = autogen.sh opensm.spec $(various_scripts) $(man_MANS) $(docs)
 
 dist-hook: $(EXTRA_DIST)
 	if [ -x $(top_srcdir)/../gen_chlog.sh ] ; then \
diff --git a/opensm/doc/performance-manager-HOWTO.txt b/opensm/doc/performance-manager-HOWTO.txt
new file mode 100644
index 0000000..c655f6c
--- /dev/null
+++ b/opensm/doc/performance-manager-HOWTO.txt
@@ -0,0 +1,153 @@
+OpenSM Performance manager HOWTO
+================================
+
+Introduction
+============
+
+OpenSM now includes a performance manager which collects Port counters from
+the subnet and stores them internally in OpenSM.
+
+Some of the features of the performance manager are:
+
+	1) Collect port data and error counters per v1.2 spec and store in
+	   64bit internal counts.
+	2) Automatic reset of counters when they reach approximatly 3/4 full.
+	   (While not guarenteeing that counts will not be missed this does
+	   keep counts incrementing as best as possible given the current
+	   hardware limitations.)
+	3) Basic warnings in the OpenSM log on "critical" errors like symbol
+	   errors.
+	4) Automatically detects "outside" resets of counters and adjusts to
+	   continue collecting data.
+	5) Can be run in a standby SM.
+
+Known issues are:
+
+	1) Data counters will be lost on high data rate links.  Sweeping the
+	   fabric fast enough for a DDR link is not practical.
+	2) Default partition support only.
+
+
+Setup and Usage
+===============
+
+Using the Performance Manager consists of 3 steps:
+
+	1) compiling in support for the perfmgr (Optionally: the console
+	   socket as well)
+	2) enabling the perfmgr and console in opensm.opts
+	3) retrieving data which has been collected.
+	   3a) using console to "dump data"
+	   3b) using a plugin module to store the data to your own
+	       "database"
+
+Step 1: Compile in support for the Performance Manager
+------------------------------------------------------
+
+Because of the performance manager's experimental status, it is not enabled at
+compile time by default.  (This will hopefully soon change as more people use
+it and confirm that it does not break things...  ;-)  The configure option is
+"--enable-perf-mgr".
+
+At this time it is really best to enable the console socket option as well.
+OpenSM can be run in an "interactive" mode.  But with the console socket option
+turned on one can also make a connection to a running OpenSM.  The console
+option is "--enable-console-socket".  This option requires the use of
+tcp_wrappers to ensure security.  Please be aware of your configuration for
+tcp_wrappers as the commands presented in the console can affect the operation
+of your subnet.
+
+The following configure line includes turning on the performance manager as
+well as the console:
+
+	./configure --enable-perf-mgr --enable-console-socket
+
+
+Step 2: Enable the perfmgr and console in opensm.opts
+-----------------------------------------------------
+
+Turning the Perfmorance Manager on is pretty easy, set the following options in
+the opensm.opts config file.  (Default location is
+/var/cache/opensm/opensm.opts)
+
+	# Turn it all on.
+	perfmgr TRUE
+
+	# sweep time in seconds
+	perfmgr_sweep_time_s 180
+
+	# Dump file to dump the events to
+	event_db_dump_file /var/log/opensm_port_counters.log
+
+Also enable the console socket and configure the port for it to listen to if
+desired.
+
+	# console [off|local|socket]
+	console socket
+
+	# Telnet port for console (default 10000)
+	console_port 10000
+
+As noted above you also need to set up tcp_wrappers to prevent unauthorized
+users from connecting to the console.[*]
+
+	[*] As an alternate you can use the loopback mode but I noticed when
+	writing this (OpenSM v3.1.10; OFED 1.3) that there are some bugs in
+	specifying the loopback mode in the opensm.opts file.  Look for this to
+	be fixed in newer versions.
+
+	[**] Also you could use "local" but this is only useful if you run
+	OpenSM in the foreground of a terminal.  As OpenSM is usually started
+	as a daemon I left this out as an option.
+
+Step 3: retrieve data which has been collected
+----------------------------------------------
+
+Step 3a: Using console dump function
+------------------------------------
+
+The console command "perfmgr dump_counters" will dump counters to the file
+specified in the opensm.opts file.  In the example above
+"/var/log/opensm_port_counters.log"
+
+Example output is below:
+
+<snip>
+"SW1 wopr ISR9024D (MLX4 FW)" 0x8f10400411f56 port 1 (Since Mon May 12 13:27:14 2008)
+     symbol_err_cnt       : 0
+     link_err_recover     : 0
+     link_downed          : 0
+     rcv_err              : 0
+     rcv_rem_phys_err     : 0
+     rcv_switch_relay_err : 2
+     xmit_discards        : 0
+     xmit_constraint_err  : 0
+     rcv_constraint_err   : 0
+     link_integrity_err   : 0
+     buf_overrun_err      : 0
+     vl15_dropped         : 0
+     xmit_data            : 470435
+     rcv_data             : 405956
+     xmit_pkts            : 8954
+     rcv_pkts             : 6900
+     unicast_xmit_pkts    : 0
+     unicast_rcv_pkts     : 0
+     multicast_xmit_pkts  : 0
+     multicast_rcv_pkts   : 0
+</snip>
+
+
+Step 3b: Using a plugin module
+------------------------------
+
+If you want a more automated method of retrieving the data OpenSM provides a
+plugin interface to extend OpenSM.  The header file is osm_event_plugin.h.
+The functions you register with this interface will be called when data is
+collected.  You can then use that data as appropriate.
+
+An example plugin can be configured at compile time using the
+"--enable-default-event-plugin" option on the configure line.  This plugin is
+very simple.  It logs "events" recieved from the performance manager to a log
+file.  I don't recomend using this directly but rather use it as a templat to
+create your own plugin.
+
diff --git a/opensm/opensm.spec.in b/opensm/opensm.spec.in
index feabfef..c36d6f2 100644
--- a/opensm/opensm.spec.in
+++ b/opensm/opensm.spec.in
@@ -125,7 +125,7 @@ fi
 %{_sbindir}/opensm
 %{_sbindir}/osmtest
 %{_mandir}/man8/*
-%doc AUTHORS COPYING README
+%doc AUTHORS COPYING README doc/performance-manager-HOWTO.txt
 %{_sysconfdir}/init.d/opensmd
 %{_sbindir}/sldd.sh
 %config(noreplace) @OPENSM_CONFIG_DIR@/opensm.conf
-- 
1.5.1




More information about the general mailing list