[ofa-general] IB performance stats (revisited)
Mark Seger
Mark.Seger at hp.com
Wed Jun 27 07:10:00 PDT 2007
btw - I've cc'd Ed on this so be sure to include him in your replies.
Hal Rosenstock wrote:
> On Wed, 2007-06-27 at 09:17, Mark Seger wrote:
>
>> I had posted something about this some time last year but now actually
>> have some data to present.
>> My problem statement with IB is that there is no efficient way to get
>> time-oriented performance numbers for all types of IB traffic. As far
>> as I know, nothing is available that covers all types of traffic, such
>> as MPI.
>>
>
> Not sure what you mean here. Are you looking for MPI counters?
>
sorry for not being clearer. I'm looking for total aggregate I/O.
>> This is further complicated because IB counters do not wrap; as a
>> result, when the counters are only 32 bits wide, they end up latching
>> in <30 seconds under load.
>>
>
> This is mostly a problem for the data counters. This is what the
> extended counters are for
>
but it's the data counters I'm interested in.
>> The only way I am aware of to do what I want is to run perfquery AND
>> then clear the counters after each request, which by definition
>> prevents anyone else from using the counters, including multiple
>> instances of my program.
>>
>
> Yes, it is _bad_ if there are essentially multiple performance managers
> resetting the counters.
>
I realize it's bad but since the counters don't wrap I have no alternative.
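Just to put a number on that, here's a quick back-of-the-envelope sketch
(python here, even though collectl itself is perl). It assumes the 32-bit
data counters tick once per 4 octets and uses rough, illustrative link
data rates rather than anything measured:

# Rough estimate of how quickly a 32-bit PortXmitData/PortRcvData counter
# latches. The data counters tick once per 4 octets; the link rates below
# are illustrative assumptions, not measurements.
COUNTER_MAX = 2**32 - 1        # 32-bit data counter
BYTES_PER_TICK = 4             # data counters count 4-octet words

def seconds_to_latch(bytes_per_sec):
    """Seconds until the counter saturates at a sustained transfer rate."""
    return (COUNTER_MAX * BYTES_PER_TICK) / bytes_per_sec

for label, rate in (("SDR 4x (~1 GB/s)", 1.0e9), ("DDR 4x (~2 GB/s)", 2.0e9)):
    print("%s: latches after ~%.0f seconds" % (label, seconds_to_latch(rate)))

At roughly 1 GB/s that's well under 30 seconds, which is exactly what I
see in practice.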
> There's now an experimental performance manager which has been discussed
> on the list. The performance data collected can be accessed.
>
Alas, since I use this tool on commercial systems, I can't run it
against experimental code. Perhaps when the experimental code becomes
official I can. I'll try to find the notes in the archives.
>> To give people a better idea of what I'm talking about, below is an
>> extract from a utility I've written called 'collectl' which has been in
>> use on HP systems for about 4 years and which we've now Open Sourced at
>> http://sourceforge.net/projects/collectl [shameless plug]. In the
>> following sample I've requested cpu, network and IB stats (there are
>> actually a whole lot of other things you can examine and you can learn
>> more at http://collectl.sourceforge.net/index.html).
>>
>
> So you are looking for packets/bytes in/out only.
>
That's a good start. Since I'm using perfquery I'm also reporting
aggregate error counts, as you can see in my program output below. The
theory is these should rarely be set and, if they are, their total
should be sufficient to highlight a problem without taking up a lot of
screen real estate.
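For the curious, here's a stripped-down sketch of the sampling loop
(python rather than the perl collectl actually uses). It assumes the
infiniband-diags perfquery with a reset-after-read option (-R on my
build) and counter labels matching what my version prints, so treat the
flag and field names as assumptions:

# Poll perfquery once per interval with reset-after-read, so each sample is
# already the delta for that interval. This is exactly the behavior that
# breaks any other consumer of the same counters.
import re
import subprocess
import time

# Counter labels vary between perfquery versions; these are assumptions.
ERROR_FIELDS = ("SymbolErrors", "LinkDowned", "RcvErrors", "XmtDiscards")

def sample(lid, port):
    """Run perfquery -R against lid/port and return a {counter: value} dict."""
    out = subprocess.check_output(["perfquery", "-R", str(lid), str(port)],
                                  universal_newlines=True)
    return {k: int(v) for k, v in re.findall(r"^(\w+):\.*(\d+)", out, re.M)}

lid, port, interval = 4, 1, 1.0        # hypothetical target port
while True:
    c = sample(lid, port)
    kb_in = c.get("RcvData", 0) * 4 / 1024     # data counters are 4-byte words
    kb_out = c.get("XmtData", 0) * 4 / 1024
    errs = sum(c.get(f, 0) for f in ERROR_FIELDS)
    print("KBin %6d KBOut %6d Errs %d" % (kb_in, kb_out, errs))
    time.sleep(interval)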
>> Anyhow, what
>> you're seeing below is a sample taken every second. At first there is
>> no IB traffic. Then I start a 'netperf' and you can see the IB stats
>> jump. A few seconds later I do a 'ping -f -s50000' to the ib interface
>> and you can now see an increase in the network traffic.
>>
>> #        <--------CPU--------><-----------Network----------><----------InfiniBand---------->
>> #Time    cpu sys inter ctxsw netKBi pkt-in netKBo pkt-out  KBin pktIn  KBOut pktOut Errs
>> 08:48:19   0   0  1046   137      0      4      0       2     0     0      0      0    0
>> 08:48:20   2   2 18659   170      0     10      0       5   925 10767  80478  41636    0
>> 08:48:21  14  14 92368  1882      0      9      1      10  3403 39599 463892 235588    0
>> 08:48:22  14  14 92167  2243      0      8      0       4  3186 37081 471246 238743    0
>> 08:48:23  12  12 92131  2382      0      3      0       2  4456 37323 470766 238488    0
>> 08:48:24  13  13 91708  2691      7    106     12     104  7300 38542 466580 236450    0
>> 08:48:25  14  14 91675  2763     11    175     20     175  7434 38417 463952 235146    0
>> 08:48:26  13  13 91712  2716     11    174     20     175  7486 38464 465195 235767    0
>> 08:48:27  14  14 91755  2742     11    171     19     171  7502 38656 465079 235720    0
>> 08:48:28  13  13 90131  2126     12    178     20     179  8257 44080 424930 217067    0
>> 08:48:29  13  13 89974  2389     13    191     22     191  7801 37094 457082 231523    0
>>
>> here's another display option where you can see just the ipoib traffic
>> along with other network stats
>>
>> # NETWORK STATISTICS (/sec)
>> #        Num   Name InPck InErr OutPck OutErr Mult ICmp OCmp   IKB   OKB
>> 09:04:51   0    lo:     0     0      0      0    0    0    0     0     0
>> 09:04:51   1  eth0:    23     0      4      0    0    0    0     1     0
>> 09:04:51   2  eth1:     0     0      0      0    0    0    0     0     0
>> 09:04:51   3   ib0:   900     0    900      0    0    0    0  1775  1779
>> 09:04:51   4  sit0:     0     0      0      0    0    0    0     0     0
>> 09:04:52   0    lo:     0     0      0      0    0    0    0     0     0
>> 09:04:52   1  eth0:   127     0    126      0    0    0    0     8    15
>> 09:04:52   2  eth1:     0     0      0      0    0    0    0     0     0
>> 09:04:52   3   ib0:  2275     0   2275      0    0    0    0  4488  4497
>> 09:04:52   4  sit0:     0     0      0      0    0    0    0     0     0
>>
>> While this is a relatively lightweight operation (collectl uses <0.1%
>> of the cpu), I still have to call perfquery every second and that does
>> generate a little overhead. Furthermore, since I'm continuously
>> resetting the counters, multiple instances of my tool, or any other
>> tool that relies on these counters, won't work correctly!
>>
>> One solution that had been implemented in the Voltaire stack and worked
>> quite well was a loadable module that read/cleared the HCA counters but
>> exported them as wrapping counters in /proc. That way utilities could
>> access the counters in /proc without stepping on each other's toes.
>>
>
> Once in /proc, how are they all collected up? Via IPoIB or out-of-band
> ethernet?
>
Not sure I understand the question. They're written to /proc via a
module. My tool collects them simply by reading them back and parsing
the returned string, which looks like:
ib0-1: 1 0 1 0x0000 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
This is essentially the same data reported by get_pcounter, reformatted
into a single line for easier/faster parsing by collectl.
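Just to illustrate the consumer side of that approach, here's a minimal
sketch in python. The /proc path and the meaning of the individual
fields are hypothetical; only the "device-port: values..." shape of the
line above is real, and I'm assuming the module exports 32-bit wrapping
counters:

# Minimal sketch of the consumer side of the /proc approach: read the one-line
# records, split them, and take wrap-safe deltas so multiple readers can share
# the counters without ever resetting them. The path and field layout are
# assumptions; only the "device-port: values..." shape comes from the sample.
PROC_PATH = "/proc/infiniband/ib_counters"   # hypothetical location
WRAP = 2**32                                 # assuming 32-bit exported counters

def read_counters(path=PROC_PATH):
    """Return {(device, port): [counter values]} from one-line records."""
    stats = {}
    with open(path) as f:
        for line in f:
            name, rest = line.split(":", 1)
            dev, port = name.rsplit("-", 1)
            stats[(dev, int(port))] = [int(v, 0) for v in rest.split()]
    return stats

def delta(new, old):
    """Per-counter difference that tolerates a single wrap between samples."""
    return [(n - o) % WRAP for n, o in zip(new, old)]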
>> While still not the best solution, as long as the counters don't wrap
>> in the HCA, read/clear is the only way to do what I'm trying to do,
>> unless of course someone has a better solution.
>>
>
> Doesn't this have the same problem as doing it the PMA way? Doesn't
> this impact other performance managers?
>
Good point, but I guess I'm between a rock and a hard place. IMHO, as
long as the counters don't wrap this problem will never be solved.
I'm trying to address a specific monitoring scenario, one that collects
data locally for analysis after a system problem occurs. I discovered
long ago that central management solutions may work fine when trying to
assess the health of many systems, but when something goes wrong with
the network, the only data that can tell you what went wrong can't get
back to the management station over the now-broken network. My
philosophy is that if you want to continuously collect reliable
performance metrics you need to use minimal system resources to do so,
and that means no network communications. I guess that means people
need to decide: if they want to use collectl to gather local IB stats,
they have to forgo doing it globally.
So what is the chance of ever seeing wrapping IB counters? Probably
none, right? 8-(
>> I also realize that with 64-bit counters this becomes a non-issue, but
>> I'm trying to solve the more general case.
>>
>
> More devices are supporting these and it should be easier to do so with
> IBA 1.2.1
>
Is there an easy way to tell how wide the counters are via software? Do
any utilities currently report this?
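If nothing reports it directly, one thing I could imagine doing from a
script is simply asking for the extended counters and seeing whether the
query succeeds. This assumes the infiniband-diags perfquery and that its
-x option requests the 64-bit PortExtendedCounters; check your man page
before trusting it:

# Best-effort probe for 64-bit data counters: ask the PMA for the extended
# counter set and fall back to assuming 32-bit if the query is rejected.
# The -x flag is assumed to request PortExtendedCounters on this build.
import subprocess

def has_extended_counters(lid, port):
    """True if a perfquery -x (extended counters) query against lid/port works."""
    result = subprocess.run(["perfquery", "-x", str(lid), str(port)],
                            stdout=subprocess.DEVNULL,
                            stderr=subprocess.DEVNULL)
    return result.returncode == 0

lid, port = 4, 1                       # hypothetical target port
if has_extended_counters(lid, port):
    print("PMA answers extended (64-bit) data counter queries")
else:
    print("falling back to the 32-bit PortCounters set")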
> -- Hal
>
>
>> comments? flames? 8-)
>>
>> -mark
>>