[ofa-general] IB performance stats (revisited)

Hal Rosenstock halr at voltaire.com
Wed Jun 27 06:32:51 PDT 2007


On Wed, 2007-06-27 at 09:17, Mark Seger wrote:
> I had posted something about this some time last year but now actually 
> have some data to present.
> My problem statement with IB is that there is no efficient way to get
> time-oriented performance numbers for all types of IB traffic.  As far
> as I know, nothing is available that covers all types of traffic, such
> as MPI.

Not sure what you mean here. Are you looking for MPI counters?
 
> This is further complicated because IB counters do not wrap, and as a
> result the standard 32-bit counters end up latching in <30 seconds
> when under load.

This is mostly a problem for the data counters. This is what the
extended counters are for.
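
For a rough sense of scale, here's a back-of-the-envelope sketch (my
assumptions: the 32-bit PortXmitData/PortRcvData counters, which count in
4-octet units, and a 4x SDR link moving on the order of 1 GB/s of data):

    # Back-of-the-envelope: how long until a 32-bit IB data counter
    # latches (saturates) under load.  Assumes PortXmitData counts in
    # 4-octet units and the link pushes ~1 GB/s (4x SDR data rate).
    COUNTER_MAX = 2**32 - 1            # 32-bit PortXmitData/PortRcvData
    BYTES_PER_TICK = 4                 # data counters tick once per 4 octets
    LINK_BYTES_PER_SEC = 1e9           # ~8 Gb/s of payload on a 4x SDR link

    seconds_to_latch = COUNTER_MAX * BYTES_PER_TICK / LINK_BYTES_PER_SEC
    print("counter latches after ~%.0f seconds" % seconds_to_latch)  # ~17 s

which is roughly consistent with the <30 seconds you're seeing, and why the
data counters are the first to go.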

> The only way I am aware of to do what I want to
> do is by running perfquery AND then clearing the counters after each
> request, which by definition prevents anyone else from accessing the
> counters, including multiple instances of my program.

Yes, it is _bad_ if there are essentially multiple performance managers
resetting the counters.

There is now an experimental performance manager which has been discussed
on the list; the performance data it collects can be accessed.
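
To make the conflict concrete, here is a minimal sketch of the
read-and-reset sampling loop (assuming perfquery from infiniband-diags and
its -r reset-after-read option; the LID/port and the field names I pull out
of its output are placeholders from memory, not something to rely on):

    # Minimal sketch of read-and-reset sampling via perfquery.  Assumes
    # infiniband-diags' perfquery and its -r (reset after read) option;
    # the LID/port below are placeholders.
    import re, subprocess, time

    def sample(lid, port):
        """Read the PMA port counters for (lid, port) and reset them."""
        out = subprocess.check_output(["perfquery", "-r", str(lid), str(port)])
        counters = {}
        for line in out.decode().splitlines():
            m = re.match(r"(\w+):\.*(\d+)\s*$", line.strip())
            if m:
                counters[m.group(1)] = int(m.group(2))
        return counters

    # Because every call zeroes the counters, each sample *is* the
    # per-interval delta, but a second copy of this loop, or any other
    # consumer of the same counters, silently loses data.
    while True:
        c = sample(lid=4, port=1)
        print(c.get("XmtData", 0), c.get("RcvData", 0))
        time.sleep(1)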

> To give people a better idea of what I'm talking about, below is an 
> extract from a utility I've written called 'collectl' which has been in 
> use on HP systems for about 4 years and which we've now Open Sourced at 
> http://sourceforge.net/projects/collectl [shameless plug].  In the 
> following sample I've requested cpu, network and IB stats (there are 
> actually a whole lot of other things you can examine and you can learn 
> more at http://collectl.sourceforge.net/index.html).

So you are looking for packets/bytes in/out only.

> Anyhow, what 
> you're seeing below is a sample taken every second.  At first there is 
> no IB traffic.  Then I start a 'netperf' and you can see the IB stats 
> jump.  A few seconds later I do a 'ping -f -s50000' to the ib interface 
> and you can now see an increase in the network traffic.
> 
> #         <--------CPU--------><-----------Network----------><----------InfiniBand---------->
> #Time     cpu sys inter  ctxsw netKBi pkt-in  netKBo pkt-out   KBin  pktIn  KBOut pktOut Errs
> 08:48:19    0   0  1046    137      0      4       0       2      0      0      0      0    0
> 08:48:20    2   2 18659    170      0     10       0       5    925  10767  80478  41636    0
> 08:48:21   14  14 92368   1882      0      9       1      10   3403  39599 463892 235588    0
> 08:48:22   14  14 92167   2243      0      8       0       4   3186  37081 471246 238743    0
> 08:48:23   12  12 92131   2382      0      3       0       2   4456  37323 470766 238488    0
> 08:48:24   13  13 91708   2691      7    106      12     104   7300  38542 466580 236450    0
> 08:48:25   14  14 91675   2763     11    175      20     175   7434  38417 463952 235146    0
> 08:48:26   13  13 91712   2716     11    174      20     175   7486  38464 465195 235767    0
> 08:48:27   14  14 91755   2742     11    171      19     171   7502  38656 465079 235720    0
> 08:48:28   13  13 90131   2126     12    178      20     179   8257  44080 424930 217067    0
> 08:48:29   13  13 89974   2389     13    191      22     191   7801  37094 457082 231523    0
> 
> Here's another display option where you can see just the IPoIB traffic
> along with the other network stats:
> 
> # NETWORK STATISTICS (/sec)
> #         Num    Name  InPck  InErr OutPck OutErr   Mult   ICmp   OCmp    IKB    OKB
> 09:04:51    0     lo:      0      0      0      0      0      0      0      0      0
> 09:04:51    1   eth0:     23      0      4      0      0      0      0      1      0
> 09:04:51    2   eth1:      0      0      0      0      0      0      0      0      0
> 09:04:51    3    ib0:    900      0    900      0      0      0      0   1775   1779
> 09:04:51    4   sit0:      0      0      0      0      0      0      0      0      0
> 09:04:52    0     lo:      0      0      0      0      0      0      0      0      0
> 09:04:52    1   eth0:    127      0    126      0      0      0      0      8     15
> 09:04:52    2   eth1:      0      0      0      0      0      0      0      0      0
> 09:04:52    3    ib0:   2275      0   2275      0      0      0      0   4488   4497
> 09:04:52    4   sit0:      0      0      0      0      0      0      0      0      0
> 
> While this is a relatively lightweight operation (collectl uses <0.1%
> of the CPU), I still have to call perfquery every second, and that
> does generate a little overhead.  Furthermore, since I'm continuously
> resetting the counters, multiple instances of my tool, or any other
> tool that relies on these counters, won't work correctly!
> 
> One solution that had been implemented in the Voltaire stack worked
> quite well: a loadable module that read/cleared the HCA counters but
> exported them as wrapping counters in /proc.  That way utilities could
> access the counters in /proc without stepping on each other's toes.

Once in /proc, how are they all collected up? Via IPoIB or out-of-band
Ethernet?
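
Whatever the transport, the attraction of free-running wrapping counters is
that any number of readers can compute rates independently. A rough sketch
of the wrap-aware delta (assuming the module exports 32-bit counters; the
sample values are made up):

    # Rate computation from free-running, wrapping 32-bit counters such
    # as ones a kernel module might export in /proc.  Nothing is ever
    # reset, so any number of readers can do this concurrently.
    WRAP = 2**32        # assumed width of the exported counters

    def delta(prev, curr, wrap=WRAP):
        """Counter difference, correct across at most one wrap per interval."""
        return (curr - prev) % wrap

    # Keep the previous sample per counter and subtract modulo the width.
    prev, curr = 4294967000, 1500       # a wrap occurred between samples
    print(delta(prev, curr))            # -> 1796, not a huge negative number

As long as you sample faster than the wrap interval (which, per the
arithmetic earlier, can be well under a minute for 32-bit data counters),
the deltas stay correct.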

> While still not the best solution, as long as the counters
> don't wrap in the HCA, read/clear is the only way to do what I'm
> trying to do, unless of course someone has a better solution.

Doesn't this have the same problem as doing it the PMA way? Doesn't it
impact other performance managers?

> I also
> realize that with 64-bit counters this becomes a non-issue, but I'm
> trying to solve the more general case.

More devices are supporting these, and it should be easier to do so with
IBA 1.2.1.

-- Hal

> comments?  flames?  8-)
> 
> -mark
> 
> _______________________________________________
> general mailing list
> general at lists.openfabrics.org
> http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general
> 
> To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general



