[openib-general] Re: performance counters in /sys
Mark Seger
Mark.Seger at hp.com
Mon May 23 11:49:17 PDT 2005
Since I was the one who originally started this topic, I'd like to
restate what got it all started. I'm trying to do relatively
lightweight monitoring of lots of system performance counters (on the
order of 100-200 or more) across a number of subsystems using standard
interfaces. While I don't need to take tens of thousands of
samples/sec, I would like to run efficiently in the 1-10 samples/sec
range. I also want to avoid writing custom kernel code and/or talking
directly to hardware.
As I said in my original note, I'm currently reading from /proc, and
while some sets of counters are better organized than others, I can
still access them relatively efficiently. While I could certainly "get
by" reading one variable per file, I do worry about the overhead as the
sampling interval shrinks. This will also become a problem as the
number of counters and devices grows. The suggestion to use perfquery
would certainly work, but I'd also be concerned about the overhead of
running it at smaller sampling intervals.
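For reference, here is a minimal Python sketch of the one-value-per-file
approach. It assumes each counter file holds a single decimal integer,
as the files under /sys/class/infiniband/mthca0/ports/1/counters are
expected to; read_counter_dir and deltas are hypothetical helper names,
not an existing API. Each counter costs its own open/read/close, which
is exactly the per-sample overhead that grows with the counter count.

```python
import os


def read_counter_dir(path):
    """Read every one-value-per-file counter in a sysfs-style directory.

    Assumes each regular file contains a single decimal integer.
    Every counter needs a separate open()/read()/close().
    """
    counters = {}
    for name in sorted(os.listdir(path)):
        full = os.path.join(path, name)
        if not os.path.isfile(full):
            continue  # skip subdirectories or anything that isn't a file
        with open(full) as f:
            counters[name] = int(f.read().strip())
    return counters


def deltas(prev, curr):
    """Per-counter change between two samples taken one interval apart.

    Only counters present in both samples are reported; this sketch
    assumes counters do not reset or wrap between samples.
    """
    return {name: curr[name] - prev[name] for name in curr if name in prev}
```

A sampling loop would call read_counter_dir() once per interval (say,
every 0.1-1 s) and feed consecutive results to deltas().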
I certainly understand the desire to move to sysfs, and I see that
/usr/src/linux/Documentation/filesystems/sysfs.txt states that "Mixing
types, expressing multiple lines of data, and doing fancy formatting of
data is heavily frowned upon. Doing these things may get you publically
humiliated and your code rewritten without notice." However, I don't
read this to mean you must have only one data item per file. For
example, since one of the types of data I collect is disk stats, I took
a look at /sys/block/hda/stat to see how sysfs deals with them. Sure
enough, they're all in one file per disk, as shown below:
dl380-2: cat /sys/block/hda/stat
0 0 0 0 0 0 0 0 0 0 0
Also note that some of these count I/O requests, some sectors and
others time, so even the units need not be identical.
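By contrast, a multi-value file such as /sys/block/hda/stat can be
sampled with a single read. Below is a minimal Python sketch; the field
names are illustrative labels following the order given in the kernel's
Documentation/block/stat.txt, and parse_block_stat/read_block_stat are
hypothetical helpers. Newer kernels may append extra fields, which this
sketch simply ignores.

```python
# Field order as described in Documentation/block/stat.txt (labels are
# illustrative, not kernel identifiers).
STAT_FIELDS = (
    "read_ios", "read_merges", "read_sectors", "read_ticks",
    "write_ios", "write_merges", "write_sectors", "write_ticks",
    "in_flight", "io_ticks", "time_in_queue",
)


def parse_block_stat(text):
    """Map the whitespace-separated values of a block stat file to names.

    zip() stops at the shorter sequence, so any extra trailing fields
    (or a short line) do not raise an error.
    """
    values = [int(v) for v in text.split()]
    return dict(zip(STAT_FIELDS, values))


def read_block_stat(dev):
    """One open/read/close returns every disk counter for `dev`."""
    with open(f"/sys/block/{dev}/stat") as f:
        return parse_block_stat(f.read())
```

For example, read_block_stat("hda") would return all eleven counters
from the single file shown above.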
-mark
Hal Rosenstock wrote:
>On Mon, 2005-05-23 at 12:27, Sean Hefty wrote:
>
>
>>Are there any performance counters that aren't available through the PMA
>>MADs? If not, is there any reason why the PMA interface shouldn't be used
>>for programmatic access?
>>
>>
>
>All the counters found in:
>/sys/class/infiniband/mthca0/ports/1/counters
>excessive_buffer_overrun_errors port_rcv_remote_physical_errors
>link_downed port_rcv_switch_relay_errors
>link_error_recovery port_xmit_constraint_errors
>local_link_integrity_errors port_xmit_data
>port_rcv_constraint_errors port_xmit_discards
>port_rcv_data port_xmit_packets
>port_rcv_errors symbol_error
>port_rcv_packets VL15_dropped
>
>are available via the PMA (and via the perfquery tool):
>/usr/local/ib/bin/perfquery 1 1
># Port counters: Lid 0x1 port 1
>PortSelect:......................1
>CounterSelect:...................0x0000
>SymbolErrors:....................10344
>LinkRecovers:....................255
>LinkDowned:......................4
>RcvErrors:.......................0
>RcvRemotePhysErrors:.............0
>RcvSwRelayErrors:................0
>XmtDiscards:.....................19
>XmtConstraintErrors:.............0
>RcvConstraintErrors:.............0
>LinkIntegrityErrors:.............0
>ExcBufOverrunErrors:.............0
>VL15Dropped:.....................0
>XmtBytes:........................126990
>RcvBytes:........................126952
>XmtPkts:.........................1791
>RcvBytes:........................1790
>
>One advantage is that all counters are retrieved with one MAD.
>
>-- Hal
>
>
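For the perfquery route Hal describes, here is a minimal Python sketch
that runs the tool once per sample and parses its "Name:....value"
lines into a dict. The output format is assumed from the sample quoted
above and may differ between versions; the tool path and the
"perfquery <lid> <port>" arguments are taken from that example, and
parse_perfquery/sample_port are hypothetical helpers. Note that a
repeated label (such as the second RcvBytes line above) simply
overwrites the earlier value, and that every sample costs a process
spawn, which is the overhead concern raised earlier.

```python
import re
import subprocess

# Matches lines like "SymbolErrors:....................10344"
_LINE = re.compile(r"^(\w+):\.+(\S+)$")


def parse_perfquery(text):
    """Parse perfquery-style 'Name:....value' lines into a dict."""
    counters = {}
    for line in text.splitlines():
        m = _LINE.match(line.strip())
        if not m:
            continue  # skips the "# Port counters: ..." header and blanks
        name, raw = m.groups()
        try:
            counters[name] = int(raw, 0)  # decimal or 0x-prefixed hex
        except ValueError:
            counters[name] = raw  # keep anything non-numeric as text
    return counters


def sample_port(lid, port, tool="/usr/local/ib/bin/perfquery"):
    """Run the tool once (one PMA query) and return all port counters."""
    out = subprocess.run(
        [tool, str(lid), str(port)],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_perfquery(out)
```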