[ofa-general] IB performance stats (revisited)

Todd Bowman twbowman at gmail.com
Thu Jul 12 13:53:23 PDT 2007


This seems like a good topic for sharing some work we have been doing here at
LANL.  ibmon is an application I developed that currently monitors our IB
production systems.  It's small, written in C and Perl, follows the
standalone model, and is SM-independent.  It can be found at
http://sourceforge.net/projects/ibmon.

Key features:
- SM independent
- Reports "interesting" events via syslog, email or console
- Events can be reported in detailed and/or "high-level" form
- Detailed events are reported as "point-to-point" links.
        - Makes for easier transformation to the "high-level" form
- Fast: a query on a ~4000-node network takes < 5s.
- Uses SQLite for internal temporary storage and archival storage (see the
sketch after this list).
- Modular design: discovery, query, and reporting are separated.  Can move
toward a distributed model.
- Built for crontab.
- Can clear counters on query or when pegged.
- Keeps historical performance and topology data
- Gathers and stores most of the IB tables:
        nodeinfo, switchinfo, sminfo, portinfo, perfcounters, lfdb (optional)
- Reports changes in SMs
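
For illustration only, here is a rough sketch of how per-port counter samples
might be archived with the SQLite C API.  The table layout and the function
below are hypothetical, not ibmon's actual schema:

    /* Illustrative sketch only -- this is not ibmon's actual schema. */
    #include <stdio.h>
    #include <sqlite3.h>

    int archive_sample(const char *dbpath, const char *node_guid, int port,
                       long long xmit_data, long long rcv_data)
    {
        sqlite3 *db;
        char sql[512];

        if (sqlite3_open(dbpath, &db) != SQLITE_OK)
            return -1;

        /* One row per port per polling interval, keyed by a unix timestamp. */
        sqlite3_exec(db,
            "CREATE TABLE IF NOT EXISTS perfcounters ("
            " ts INTEGER, node_guid TEXT, port INTEGER,"
            " xmit_data INTEGER, rcv_data INTEGER)",
            NULL, NULL, NULL);

        snprintf(sql, sizeof(sql),
            "INSERT INTO perfcounters VALUES (strftime('%%s','now'),"
            " '%s', %d, %lld, %lld)",
            node_guid, port, xmit_data, rcv_data);
        sqlite3_exec(db, sql, NULL, NULL, NULL);

        sqlite3_close(db);
        return 0;
    }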

Known issues:
- Does not receive SM traps, so it needs to rediscover the fabric every so
often.
- Threshold values for errors need to be moved to a config file; they are
currently in a db.
- Does not clear counters when "nearly" pegged (a sketch of one possible check
follows).
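
For the nearly-pegged case, here is a rough sketch of one possible check,
assuming the usual PMA PortCounters widths (16-bit error counters, 32-bit
data counters); the 90% threshold is an arbitrary choice:

    /* Illustrative sketch: decide whether a PortCounters field should be
     * cleared before it saturates.  The widths assumed here follow the PMA
     * PortCounters attribute; the threshold is arbitrary. */
    #include <stdint.h>

    static int nearly_pegged(uint64_t value, uint64_t max)
    {
        /* Within 10% of the counter's ceiling counts as "nearly" pegged. */
        return value >= max - max / 10;
    }

    static int should_clear_port(uint32_t symbol_err,  /* 16-bit counter */
                                 uint32_t xmit_data)   /* 32-bit counter */
    {
        return nearly_pegged(symbol_err, 0xFFFF) ||
               nearly_pegged(xmit_data, 0xFFFFFFFFULL);
    }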

Todd
On 11 Jul 2007 12:21:51 -0400, Hal Rosenstock <halr at voltaire.com> wrote:
>
> On Wed, 2007-07-11 at 11:00, Mark Seger wrote:
> > Hal Rosenstock wrote:
> >
> > >On Wed, 2007-07-11 at 10:15, Mark Seger wrote:
> > >
> > >
> > >>My basic philosophy, and I suspect there are those who might disagree,
> > >>is that you can't use the network to monitor the network, at least not
> > >>in times of trouble.
> > >>
> > >>
> > >
> > >Right, in times of certain troubles.
> > >
> > >
> > and that is the key.  Since you can't know a priori when you're about to
> > have trouble, you need to be collecting the data locally before it occurs.
> >
> > >>That's why I insist on having to query the HCAs
> > >>directly since I can't always be sure the network is there and/or
> > >>reliable.  If you are willing to concede that this can indeed happen
> > >>then the question becomes one of how you reliably get data from an
> > >>HCA and that's the basis for my (re)starting this discussion.
> > >>
> > >>
> > >
> > >The reliability comes from timeout/retry mechanisms. If performance data
> > >cannot be obtained on an IB network, it needs to be troubleshot at a
> > >lower level (by SMPs).
> > >
> > >In any case, a rearchitecture of the PMA was proposed and seems
> > >reasonable to me in that it can accommodate either approach. All that is
> > >needed now is for someone to step up and champion an implementation of
> > >this. Unfortunately, I do not have time to do so.
> > >
> > >
> > I don't know if what I've been proposing requires any rearchitecting, as
> > I see it as something local to each node.  Specifically, the idea, which
> > already has an implementation in an earlier Voltaire stack, is to
> > export wrapping HCA counters to /proc.  The module that does this
> > reads/clears the counters on every access, but since no local applications
> > access the counters directly, clearing them doesn't hurt anyone.
> > Alas, anyone else who wants to query the counters will find them reset.
>
> No local application but perhaps a remote one. This is the reason for
> the proposed rearchitecture (along with synthesizing the wider
> counters).
>
> -- Hal
>
> > The other side benefit of exporting these counters in such a way is that
> > lots of others can now collect/report this info.  In other words, if
> > someone chose to add IB stats to sar, it would become very easy to do!
> >
> > If this is the type of thing people are interested in, I might be able
> > to supply some code to do it.
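
For what it's worth, a rough sketch of the kind of collector such a /proc
export would enable is below.  The path and the one-counter-per-line format
are assumptions, not an existing interface; since the module clears on read,
the reader only has to fold each sample into its own wider 64-bit totals:

    /* Sketch only: the /proc path and the "name value" line format are
     * hypothetical, not an existing interface. */
    #include <stdio.h>
    #include <stdint.h>
    #include <string.h>

    static uint64_t total_xmit_data, total_rcv_data;

    /* The kernel module is assumed to clear each counter as it is read,
     * so a collector just accumulates every sample into 64-bit totals. */
    static int sample_counters(const char *path)
    {
        char name[64];
        unsigned long long val;
        FILE *fp = fopen(path, "r");

        if (!fp)
            return -1;
        while (fscanf(fp, "%63s %llu", name, &val) == 2) {
            if (strcmp(name, "port_xmit_data") == 0)
                total_xmit_data += val;
            else if (strcmp(name, "port_rcv_data") == 0)
                total_rcv_data += val;
        }
        fclose(fp);
        return 0;
    }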
> >
> > >>As for querying the switch for counters, what do you do on a very large
> > >>network, say tens of thousands of nodes, if you want to get performance
> > >>data every second?  I also realize this is an extreme situation today
> > >>(the node count, not the frequency of monitoring) but I'm sure everyone
> > >>would agree systems of these sizes are not that far off.
> > >>
> > >>
> > >
> > >You have a distributed performance manager to handle this. A hierarchy
> > >of performance managers has been discussed on the list before.
> > >
> > >
> > ahh, I see.
> > -mark
> >
> >
>