[ofa-general] IB performance stats (revisited)

Mark Seger Mark.Seger at hp.com
Wed Jun 27 06:17:36 PDT 2007


I had posted something about this some time last year but now actually 
have some data to present.
My problem statement with IB is that there is no efficient way to get 
time-oriented performance numbers for all types of IB traffic.  As far 
as I know nothing is available that covers all traffic, including MPI.  
This is further complicated because IB counters do not wrap; when the 
counters are 32-bit integers (and the data counters count in 4-byte 
units) they end up latching at their maximum value in under 30 seconds 
when under load.  The only way I'm aware of to do what I want is to run 
perfquery AND then clear the counters after each request, which by 
definition prevents anyone else from accessing the counters, including 
multiple instances of my own program.
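For the curious, the read-and-clear approach boils down to something 
like the sketch below.  The -R (reset-after-read) flag and the counter 
names are from infiniband-diags' perfquery; the parsing regex and the 
loop itself are just an illustration of the idea, not collectl's actual 
code.

```python
import re
import subprocess
import time

def parse_perfquery(text):
    """Pull a few counters out of perfquery's 'Name:....value' lines."""
    counters = {}
    for name in ("PortXmitData", "PortRcvData", "PortXmitPkts", "PortRcvPkts"):
        m = re.search(name + r":\.+(\d+)", text)
        if m:
            counters[name] = int(m.group(1))
    return counters

def sample_forever(interval=1.0):
    """Read-and-clear loop: 'perfquery -R' resets the counters after
    reading them, so each sample is already a per-interval delta -- and
    that reset is exactly what clobbers every other consumer of the
    same counters."""
    while True:
        out = subprocess.check_output(["perfquery", "-R"], text=True)
        c = parse_perfquery(out)
        # PortXmitData/PortRcvData count 4-byte units, hence the *4
        print("KBin %6.0f  KBout %6.0f" % (
            c.get("PortRcvData", 0) * 4 / 1024.0,
            c.get("PortXmitData", 0) * 4 / 1024.0))
        time.sleep(interval)
```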

To give people a better idea of what I'm talking about, below is an 
extract from a utility I've written called 'collectl' which has been in 
use on HP systems for about 4 years and which we've now Open Sourced at 
http://sourceforge.net/projects/collectl [shameless plug].  In the 
following sample I've requested cpu, network and IB stats (there are 
actually a whole lot of other things you can examine and you can learn 
more at http://collectl.sourceforge.net/index.html).  Anyhow, what 
you're seeing below is a sample taken every second.  At first there is 
no IB traffic.  Then I start a 'netperf' and you can see the IB stats 
jump.  A few seconds later I do a 'ping -f -s50000' to the ib interface 
and you can now see an increase in the network traffic.

#         <--------CPU--------><-----------Network----------><----------InfiniBand---------->
#Time     cpu sys inter  ctxsw netKBi pkt-in  netKBo pkt-out   KBin  pktIn  KBOut pktOut Errs
08:48:19    0   0  1046    137      0      4       0       2      0      0      0      0    0
08:48:20    2   2 18659    170      0     10       0       5    925  10767  80478  41636    0
08:48:21   14  14 92368   1882      0      9       1      10   3403  39599 463892 235588    0
08:48:22   14  14 92167   2243      0      8       0       4   3186  37081 471246 238743    0
08:48:23   12  12 92131   2382      0      3       0       2   4456  37323 470766 238488    0
08:48:24   13  13 91708   2691      7    106      12     104   7300  38542 466580 236450    0
08:48:25   14  14 91675   2763     11    175      20     175   7434  38417 463952 235146    0
08:48:26   13  13 91712   2716     11    174      20     175   7486  38464 465195 235767    0
08:48:27   14  14 91755   2742     11    171      19     171   7502  38656 465079 235720    0
08:48:28   13  13 90131   2126     12    178      20     179   8257  44080 424930 217067    0
08:48:29   13  13 89974   2389     13    191      22     191   7801  37094 457082 231523    0

Here's another display option that shows just the IPoIB traffic along 
with the other network stats:

# NETWORK STATISTICS (/sec)
#         Num    Name  InPck  InErr OutPck OutErr   Mult   ICmp   OCmp    IKB    OKB
09:04:51    0     lo:      0      0      0      0      0      0      0      0      0
09:04:51    1   eth0:     23      0      4      0      0      0      0      1      0
09:04:51    2   eth1:      0      0      0      0      0      0      0      0      0
09:04:51    3    ib0:    900      0    900      0      0      0      0   1775   1779
09:04:51    4   sit0:      0      0      0      0      0      0      0      0      0
09:04:52    0     lo:      0      0      0      0      0      0      0      0      0
09:04:52    1   eth0:    127      0    126      0      0      0      0      8     15
09:04:52    2   eth1:      0      0      0      0      0      0      0      0      0
09:04:52    3    ib0:   2275      0   2275      0      0      0      0   4488   4497
09:04:52    4   sit0:      0      0      0      0      0      0      0      0      0

While this is a relatively light-weight operation (collectl uses <0.1% 
of the cpu), I still have to call perfquery every second and that does 
generate a little overhead.  Furthermore, since I'm continuously 
resetting the counters, multiple instances of my tool, or any other 
tool that relies on these counters, won't work correctly!

One solution that had been implemented in the Voltaire stack worked 
quite well: a loadable module that read/cleared the HCA counters but 
exported them as wrapping counters in /proc.  That way utilities could 
read the counters from /proc without stepping on each other's toes.  
While still not the best solution, as long as the counters don't wrap 
in the HCA, read/clear is the only way to do what I'm trying to do, 
unless of course someone has a better solution.  I also realize that 
with 64-bit counters this becomes a non-issue, but I'm trying to 
solve the more general case.
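To be clear about why wrapping counters fix the multiple-reader 
problem: each consumer keeps its own previous reading and does modular 
subtraction, so nobody ever has to reset anything.  A minimal sketch 
(my naming, not the Voltaire module's):

```python
def counter_delta(new, old, width=32):
    """Delta between two successive readings of a free-running counter
    that wraps modulo 2**width, assuming at most one wrap between
    samples."""
    return (new - old) % (1 << width)
```

So counter_delta(100, (1 << 32) - 50) correctly reports 150 even though 
the raw value went "backwards" across the wrap.  There is no equivalent 
trick for latching counters: once one pegs at 2**32 - 1 the history is 
simply gone.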

comments?  flames?  8-)

-mark



