[ofa-general] madrpc_init and resetting performance counters

Dan Noe dpn at isomerica.net
Fri Apr 11 13:50:30 PDT 2008


On 4/10/2008 10:32, Hal Rosenstock wrote:
>> I've verified that libibumad rpms are installed.  Only calling
>> madrpc_init at the front end of my polling only allows me to reset the
>> port that was initialized last.  Does anyone have some insight into
>> how I gather/reset each port without having to call madrpc_init each
>> time I poll that port?
> 
> There's already a tool which does what you are describing at a high
> level: perfquery -R and also scripts for the entire subnet:
> ibclearcounters or ibclearerrors (if you just want to clear the error
> counters).

Our software is trying to get around the limitation of 32-bit IB 
counters.  Unfortunately the counters get "stuck" at 0xFFFFFFFF instead 
of wrapping, so to avoid data loss it is necessary to poll them 
periodically, keep a running total (in a 64-bit counter :) and reset the 
counters.
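
The poll/accumulate/reset cycle described above can be sketched in plain C. This is only an illustration under stated assumptions: struct hw_counter, hw_add() and poll_and_reset() are hypothetical stand-ins that simulate a saturating 32-bit counter in memory, not libibmad calls, and a real poller would read and reset the port counters over MADs instead.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define IB_COUNTER_MAX 0xFFFFFFFFu  /* value at which a 32-bit counter sticks */

/* Hypothetical stand-in for one hardware counter; not a libibmad type. */
struct hw_counter {
    uint32_t value;
};

/* Running 64-bit total kept by the poller. */
struct counter_total {
    uint64_t total;
    unsigned saturated_polls;  /* polls that found the counter pegged */
};

/* Simulate traffic: add n to the counter, sticking at the maximum
 * instead of wrapping. */
static void hw_add(struct hw_counter *hw, uint64_t n)
{
    uint32_t room = IB_COUNTER_MAX - hw->value;
    hw->value = (n >= room) ? IB_COUNTER_MAX : hw->value + (uint32_t)n;
}

/* One poll cycle: read the counter, fold it into the 64-bit total,
 * then reset the counter to zero. */
static void poll_and_reset(struct hw_counter *hw, struct counter_total *t)
{
    assert(hw != NULL && t != NULL);
    uint32_t snapshot = hw->value;
    if (snapshot == IB_COUNTER_MAX)
        t->saturated_polls++;  /* data may have been lost before this poll */
    t->total += snapshot;
    hw->value = 0;
}
```

A poll that finds the counter at 0xFFFFFFFF cannot tell how much was lost, which is why the sketch counts saturated polls: a nonzero count suggests the polling interval is too long for the traffic rate.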

We're trying to avoid fork()/exec() since the resets need to happen 
fairly frequently.  So calling out to perfquery to reset the counter is 
suboptimal.

The solution Joel had mentioned was to use madrpc_init() and then call 
port_performance_reset() to reset the port.  But madrpc_init keeps a 
static file descriptor (mad_portid) that is used for all subsequent 
calls, including the ones port_performance_reset() eventually makes. 
And, there does not seem to be any method to close this file descriptor.

So, it is impossible to extend this method to multiple devices (or even 
multiple ports).  With a single call to madrpc_init one can perpetually 
reset the performance counters in the polling loop but this approach 
doesn't work with multiple devices.  If madrpc_init is called more than 
once, it leaks a file descriptor.

There is a reference in the man page for umad_init (which madrpc_init 
calls) to calling umad_done, but umad_done does not release anything:

int
umad_done(void)
{
         TRACE("umad_done");
         /* FIXME - verify that all ports are closed */
         return 0;
}

I did notice there is a way to access the static file descriptor using 
madrpc_portid().  I assume this could be used to close the file 
descriptor opened by madrpc_init but it isn't clear if there are other 
resources that need cleanup.  We're going to take this approach and see 
where it gets us.
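
The pattern under discussion (an init routine that stores a descriptor in a static variable, an accessor that exposes it, and a caller that closes the old descriptor before initializing again) can be modelled with ordinary POSIX descriptors. This is a sketch only: mock_init(), mock_portid_get() and mock_reinit() are hypothetical names, and none of this is libibmad code. Whether closing the descriptor returned by madrpc_portid() is sufficient cleanup in the real library remains the open question above.

```c
#include <fcntl.h>
#include <unistd.h>

/* Mock of a library that keeps its descriptor in a static variable. */
static int mock_portid = -1;

/* Like the init routine described above: opens a new descriptor and
 * overwrites the static one without closing the previous value. */
static int mock_init(const char *path)
{
    mock_portid = open(path, O_RDONLY);
    return mock_portid;
}

/* Accessor exposing the static descriptor to the caller. */
static int mock_portid_get(void)
{
    return mock_portid;
}

/* Caller-side workaround: close the old descriptor via the accessor
 * before initializing again, so repeated inits do not leak. */
static int mock_reinit(const char *path)
{
    int old = mock_portid_get();
    if (old >= 0)
        close(old);
    return mock_init(path);
}
```

Calling mock_init() twice leaves the first descriptor open with no handle to it, while mock_reinit() keeps at most one descriptor open at a time.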

Any further insight is greatly appreciated.

Cheers,
Dan

-- 
Dan Noe (dpn at lampreynetworks.com)
Software Engineer
Lamprey Networks, Inc.


