[ofa-general] IB performance stats (revisited)

Wed Jul 11 08:00:21 PDT 2007

Hal Rosenstock wrote:

>On Wed, 2007-07-11 at 10:15, Mark Seger wrote:
>  
>
>>My basic philosophy, and I suspect there are those who might disagree, 
>>is that you can't use the network to monitor the network, at least not 
>>in times of trouble.
>>    
>>
>
>Right, in times of certain troubles.
>  
>
And that is the key. Since you can't know a priori when you're about to
have trouble, you need to be collecting the data locally before it occurs.
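As a rough illustration of "collecting locally before trouble hits", here is a minimal Python sketch of a fixed-size, in-memory sample history kept on each node. `LocalHistory`, `read_sample`, and the default of 3600 samples (one hour at one sample per second) are hypothetical choices for illustration, not part of any existing tool.

```python
import time
from collections import deque


class LocalHistory:
    """Keep the most recent `size` counter samples in memory on the node.

    `read_sample` is a hypothetical callable returning one snapshot of the
    local HCA counters (for example, values parsed from a /proc file).
    """

    def __init__(self, read_sample, size=3600):
        self.read_sample = read_sample
        # deque with maxlen silently drops the oldest sample when full
        self.samples = deque(maxlen=size)

    def tick(self, now=None):
        """Take one timestamped sample; `now` can be supplied for testing."""
        stamp = time.time() if now is None else now
        self.samples.append((stamp, self.read_sample()))

    def since(self, start):
        """Return the samples taken at or after `start`, e.g. just before a fault."""
        return [s for s in self.samples if s[0] >= start]
```

Because the history lives on the node itself, it is still there to inspect after the fabric misbehaves, whether or not a remote manager could reach the HCA at the time.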

>>That's why I insist on having to query the HCAs 
>>directly since I can't always be sure the network is there and/or 
>>reliable. If you are willing to concede that this can indeed happen,
>>then the question becomes how to reliably get data from an
>>HCA, and that's the basis for my (re)starting this discussion.
>>    
>>
>
>The reliability comes from timeout/retry mechanisms. If performance data
>cannot be obtained on an IB network, it needs to be troubleshot at a
>lower level (by SMPs).
>
>In any case, a rearchitecture of the PMA was proposed and seems
>reasonable to me in that it can accommodate either approach. All that is
>needed now is for someone to step up and champion an implementation of
>this. Unfortunately, I do not have time to do so.
>  
>
I don't know if what I've been proposing requires any rearchitecting, as
I see it as something local to each node. Specifically (and there is
already an implementation of this in an earlier Voltaire stack), the idea
is to export wrapping HCA counters to /proc. The module that does this
reads and clears the counters on every access, but since no local
applications are accessing the counters directly, clearing them doesn't
hurt anyone. Alas, anyone else who wants to query the counters will find
them reset.
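A minimal sketch of the read/clear idea, assuming the driver hands back the counts accumulated since the previous read and then zeroes the hardware counters. The module folds each read into running totals that only ever grow, apart from wrapping at a fixed width. The class name, the `read_and_clear` callable, the counter names, and the 64-bit width are all assumptions for illustration.

```python
MASK64 = (1 << 64) - 1  # assumed width of the exported totals


class ReadClearAccumulator:
    """Turn read-and-clear hardware counters into running (wrapping) totals.

    `read_and_clear` is a hypothetical stand-in for the driver access: it
    returns a dict of counts since the previous call and resets the
    hardware counters to zero.
    """

    def __init__(self, read_and_clear):
        self.read_and_clear = read_and_clear
        self.totals = {}

    def update(self):
        """Fold the latest deltas into the totals and return a copy of them."""
        for name, delta in self.read_and_clear().items():
            # mask so the exported total wraps instead of growing without bound
            self.totals[name] = (self.totals.get(name, 0) + delta) & MASK64
        return dict(self.totals)
```

The trade-off noted above is visible here: once this module has read a counter, any other reader of the hardware sees only what accumulated since that read.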

The other side benefit of exporting these counters in such a way is that
now lots of others can collect/report this info. In other words, if
someone chose to add IB stats to sar, it would become very easy to do!
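With cumulative counters under /proc, a sar-like collector only has to sample them periodically and difference successive readings. Here is a minimal sketch, assuming the exported counters wrap at a fixed width (64 bits by default, which is a guess). `counter_rate` is a hypothetical name.

```python
def counter_rate(prev, curr, interval, width=64):
    """Per-second rate between two samples of a counter that wraps at 2**width.

    Modular subtraction yields the correct delta even if the counter wrapped
    once between samples; it cannot detect more than one wrap per interval.
    """
    if interval <= 0:
        raise ValueError("interval must be positive")
    delta = (curr - prev) % (1 << width)
    return delta / interval
```

For example, `counter_rate(100, 600, 5.0)` gives 100.0 units per second. The single-wrap limit is one reason to keep the sampling interval short relative to how fast the counter can fill.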

If this is the type of thing people are interested in, I might be able 
to supply some code to do it.

>>As for querying the switch for counters, what do you do on a very large 
>>network, say 10s of thousands of nodes if you want to get performance 
>>data every second?  I also realize this is an extreme situation today 
>>(the node count not the frequency of monitoring) but I'm sure everyone 
>>would agree systems of these sizes are not that far off.
>>    
>>
>
>You have a distributed performance manager to handle this. A hierarchy
>of performance managers has been discussed on the list before.
>  
>
ahh, I see.
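To get a feel for how deep such a hierarchy would need to be, here is a back-of-the-envelope Python helper. The fan-out (how many ports or lower-level managers one performance manager can poll per sampling interval) is an assumed parameter, and `manager_levels` is a hypothetical name.

```python
def manager_levels(nodes, fanout):
    """Smallest number of manager levels such that fanout**levels >= nodes.

    One level means a single manager polls every node directly; each extra
    level multiplies the number of nodes that can be covered by `fanout`.
    """
    if nodes < 1 or fanout < 2:
        raise ValueError("need nodes >= 1 and fanout >= 2")
    levels, reach = 1, fanout
    while reach < nodes:
        reach *= fanout
        levels += 1
    return levels
```

With an assumed fan-out of 100, `manager_levels(20000, 100)` returns 3. These numbers are illustrative, not measurements.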
-mark