[ofa-general] madrpc_init and resetting performance counters

Ralph Campbell ralph.campbell at qlogic.com
Fri Apr 11 14:08:22 PDT 2008


Also, be aware that opensm now tries to poll the performance
counters and keep a total. If you have more than one thing
in the system trying to keep track of the total, they will
conflict, and each will see only part of the total counts.
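
To make the conflict concrete, here is a minimal sketch in C. It does not
talk to real hardware: sim_counter, sim_traffic, poller and poll_and_reset
are hypothetical names invented for this illustration, and the counter is
only assumed to behave like a 32-bit port counter that sticks at 0xFFFFFFFF.
Two pollers that each read and then reset the same counter end up splitting
the counts between them.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for one 32-bit hardware counter that
 * saturates at 0xFFFFFFFF instead of wrapping. */
struct sim_counter {
    uint32_t value;
};

/* Simulate n units of traffic arriving on the port. */
static void sim_traffic(struct sim_counter *c, uint32_t n)
{
    assert(c != NULL);
    if (n > UINT32_MAX - c->value)
        c->value = UINT32_MAX;      /* counter sticks at the maximum */
    else
        c->value += n;
}

/* One independent monitor keeping its own 64-bit running total. */
struct poller {
    uint64_t total;
};

/* Read the counter, add it to this poller's total, then reset it.
 * The reset also discards counts the other poller has not seen yet. */
static void poll_and_reset(struct poller *p, struct sim_counter *c)
{
    assert(p != NULL && c != NULL);
    p->total += c->value;
    c->value = 0;
}
```

If poller A runs after 100 units, poller B after 50 more, and A again after
another 25, then A ends up with 125 and B with 50, although 175 units went
through the port. Neither total is right on its own.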

On Fri, 2008-04-11 at 16:50 -0400, Dan Noe wrote:
> On 4/10/2008 10:32, Hal Rosenstock wrote:
> >> I've verified that libibumad rpms are installed.  Calling madrpc_init
> >> only at the front end of my polling allows me to reset only the
> >> port that was initialized last.  Does anyone have some insight into
> >> how I can gather/reset each port without having to call madrpc_init
> >> each time I poll that port?
> > 
> > There's already a tool which does what you are describing at a high
> > level: perfquery -R and also scripts for the entire subnet:
> > ibclearcounters or ibclearerrors (if you just want to clear the error
> > counters).
> 
> Our software is trying to get around the limitation of 32-bit IB
> counters - unfortunately the counters get "stuck" at 0xFFFFFFFF instead
> of wrapping, so to avoid data loss it is necessary to poll them
> periodically, keep a running total (in a 64-bit counter :) and reset the
> counters.
> 
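The bookkeeping side of that scheme can be sketched as below. This is only
an illustration under stated assumptions: running_total, accumulate and
max_poll_interval_sec are hypothetical names, not part of libibmad or
libibumad, and the sample passed in is assumed to be the value read from the
port immediately before the reset. A sample equal to 0xFFFFFFFF is treated as
a sign that the counter may have stuck and counts may have been lost.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define COUNTER_MAX 0xFFFFFFFFu     /* value at which the counter sticks */

struct running_total {
    uint64_t total;   /* 64-bit sum of all samples folded in so far */
    bool     lossy;   /* true once any sample was stuck at COUNTER_MAX */
};

/* Fold one 32-bit sample, read just before a reset, into the total. */
static void accumulate(struct running_total *t, uint32_t sample)
{
    assert(t != NULL);
    if (sample == COUNTER_MAX)
        t->lossy = true;            /* counts beyond the maximum are gone */
    t->total += sample;
}

/* Longest whole number of seconds between polls for which a counter
 * growing at max_units_per_sec (in the counter's own units) cannot
 * exceed COUNTER_MAX. */
static uint64_t max_poll_interval_sec(uint64_t max_units_per_sec)
{
    assert(max_units_per_sec > 0);
    return COUNTER_MAX / max_units_per_sec;
}
```

As a purely illustrative figure, a counter that grows by at most 250,000,000
units per second would have to be polled at least every 17 seconds to stay
below the maximum.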
> We're trying to avoid fork()/exec() since the resets need to happen 
> fairly frequently.  So calling out to perfquery to reset the counter is 
> suboptimal.
> 
> The solution Joel had mentioned was to use madrpc_init() and then call
> port_performance_reset() to reset the port.  But madrpc_init keeps a
> static file descriptor (mad_portid) that is used for all subsequent
> calls, including the one port_performance_reset() eventually makes.
> And there does not seem to be any way to close this file descriptor.
> 
> So, it is impossible to extend this method to multiple devices (or even 
> multiple ports).  With a single call to madrpc_init one can perpetually 
> reset the performance counters in the polling loop but this approach 
> doesn't work with multiple devices.  If madrpc_init is called more than 
> once, it leaks a file descriptor.
> 
> There is a reference in the man page for umad_init (which is called) to 
> calling umad_done but this doesn't seem to work:
> 
> int
> umad_done(void)
> {
>          TRACE("umad_done");
>          /* FIXME - verify that all ports are closed */
>          return 0;
> }
> 
> I did notice there is a way to access the static file descriptor using 
> madrpc_portid().  I assume this could be used to close the file 
> descriptor opened by madrpc_init but it isn't clear if there are other 
> resources that need cleanup.  We're going to take this approach and see 
> where it gets us.
> 
> Any further insight is greatly appreciated.
> 
> Cheers,
> Dan
> 



