[ofa-general] madrpc_init and reseting performance counters
Dan Noe
dpn at isomerica.net
Fri Apr 11 13:50:30 PDT 2008
On 4/10/2008 10:32, Hal Rosenstock wrote:
>> I've verified that libibumad rpms are installed. Only calling
>> madrpc_init at the front end of my polling only allows me to reset the
>> port that was initialized last. Does anyone have some insight into
>> how I gather/reset each port without having to call madrpc_init each
>> time I poll that port?
>
> There's already a tool which does what you are describing at a high
> level: perfquery -R and also scripts for the entire subnet:
> ibclearcounters or ibclearerrors (if you just want to clear the error
> counters).
Our software is trying to work around the limitation of 32-bit IB
counters. Unfortunately, the counters get "stuck" at 0xFFFFFFFF instead
of wrapping, so to avoid data loss it is necessary to poll them
periodically, keep a running total (in a 64-bit counter :) and reset the
counters.
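To make the poll/accumulate/reset cycle concrete, here is a minimal sketch of the bookkeeping only. Everything in it (struct port_total, accumulate_and_reset, IB_COUNTER_MAX) is a hypothetical name for illustration, not part of libibmad, and the hardware counter is modeled as a plain uint32_t that the function zeroes in place of the real reset MAD.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Value at which a 32-bit IB counter saturates instead of wrapping. */
#define IB_COUNTER_MAX 0xFFFFFFFFu

/* Hypothetical per-port bookkeeping: a 64-bit running total plus a
 * flag recording that at least one sample was already saturated,
 * meaning some counts were lost before we polled. */
struct port_total {
	uint64_t total;
	int saturated;
};

/* Fold the current 32-bit sample into the 64-bit total, then clear
 * the counter. Zeroing *hw_counter stands in for the real reset of
 * the port's performance counters. */
static void accumulate_and_reset(struct port_total *pt, uint32_t *hw_counter)
{
	uint32_t sample;

	assert(pt != NULL && hw_counter != NULL);

	sample = *hw_counter;
	if (sample == IB_COUNTER_MAX)
		pt->saturated = 1;	/* polled too late, total is a lower bound */

	pt->total += sample;
	*hw_counter = 0;
}
```

The saturated flag is there because a sample of 0xFFFFFFFF cannot be told apart from a counter that stuck some time ago, so the polling interval has to be short enough that the flag never gets set in practice.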
We're trying to avoid fork()/exec() since the resets need to happen
fairly frequently. So calling out to perfquery to reset the counter is
suboptimal.
The solution Joel had mentioned was to use madrpc_init() and then call
port_performance_reset() to reset the port. But madrpc_init() keeps a
static file descriptor (mad_portid) that is used for all subsequent calls
(including the ones eventually made when port_performance_reset() is
called), and there does not seem to be any way to close this file
descriptor. So it is impossible to extend this method to multiple devices
(or even multiple ports). With a single call to madrpc_init() one can keep
resetting the performance counters in the polling loop indefinitely, but
this approach doesn't work with multiple devices. If madrpc_init() is
called more than once, it leaks a file descriptor.
There is a reference in the man page for umad_init (which is called) to
calling umad_done but this doesn't seem to work:
int
umad_done(void)
{
	TRACE("umad_done");
	/* FIXME - verify that all ports are closed */
	return 0;
}
I did notice there is a way to access the static file descriptor using
madrpc_portid(). I assume this could be used to close the file
descriptor opened by madrpc_init but it isn't clear if there are other
resources that need cleanup. We're going to take this approach and see
where it gets us.
Any further insight is greatly appreciated.
Cheers,
Dan
--
Dan Noe (dpn at lampreynetworks.com)
Software Engineer
Lamprey Networks, Inc.