[ofa-general] [OpenSM][RFC] OpenSM Proposed Perf Manager

Ira Weiny weiny2 at llnl.gov
Mon May 14 10:02:24 PDT 2007


On 14 May 2007 06:58:34 -0400
Hal Rosenstock <halr at voltaire.com> wrote:

> On Sun, 2007-05-13 at 15:55, Sasha Khapyorsky wrote:
> > Hi Ira,
> > 
> > Thanks for the great work!
> 
> Indeed :-)
> 
> > On 18:49 Tue 08 May     , Ira Weiny wrote:
> > > I would like to submit to the list a performance manager which I have been
> > > working on for OpenSM.
> > > 
> > > It is implemented as the first proposed architecture model set forth by Hal (as
> > > an integrated thread in OpenSM).  As such it works fine on our small test
> > > cluster, but there is some concern about its scalability.
> > > 
> > > I have extended this architecture with an idea of my own.  This idea is to have
> > > a pluggable module for the "event database".  With this interface one could
> > > write one's own data reduction, logging, and tracking methods.  Here at LLNL I
> > > propose to use this to add counter and subnet events directly to our management
> > > database which is used to show system status to our operators.  Other
> > > installations might prefer other methods of logging, SNMP for example.  This
> > > patch includes a "reference" implementation of this "event database" which
> > > stores the information internally until the user requests a "dump".
> > 
> > I like this event db idea, but I am not sure it should be an integral part
> > of the low-level perfmgr stuff. As currently implemented, PerfMgr just
> > doesn't work without such a plugin loaded: it unconditionally tries to
> > pull all port counters, but has nothing to do with them without a plugin.
> > 
> > Instead I would propose to have a builtin PerfMgr which is able to
> > pull and store performance-related data and then call a "generic" event
> > manager which can process such data. This would also help to have a simpler
> > generic API for such an event db plugin, so other parts of OpenSM would be
> > able to report events using the same method(s). What do you think?
> 
> Sounds better to me. Ira ?

Yes, except that I am concerned with storing the data in the perfmgr as well as
the plugin.  But I like the idea of a more generic plugin for getting events
from OSM.  My mind is already full of ideas after responding to Sasha...  ;-)
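
For illustration only, a minimal sketch of what a more generic event plugin
interface could look like.  Every name here (evt_id_t, evt_plugin_t,
evt_report, the tally consumer) is hypothetical and is not taken from the
patch or from OpenSM.  The assumption is that a producer such as the PerfMgr
hands each event to the plugin by pointer and keeps no copy of its own.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical event IDs; not taken from the patch. */
typedef enum {
	EVT_PORT_COUNTER,	/* PerfMgr has read a port's counters */
	EVT_SUBNET_CHANGE,	/* some other part of the SM reporting */
	EVT_MAX
} evt_id_t;

/* Hypothetical payload for EVT_PORT_COUNTER. */
typedef struct evt_port_counter {
	uint64_t node_guid;
	uint8_t  port_num;
	uint64_t xmit_data;
	uint64_t rcv_data;
} evt_port_counter_t;

/* Hypothetical plugin interface: private state plus one callback. */
typedef struct evt_plugin {
	void *ctx;
	void (*report)(void *ctx, evt_id_t id, const void *data);
} evt_plugin_t;

/*
 * Called by any producer.  A NULL plugin means "no plugin loaded";
 * the event is dropped and 0 is returned, otherwise 1.
 */
static int evt_report(const evt_plugin_t *p, evt_id_t id, const void *data)
{
	assert(id < EVT_MAX);
	if (!p || !p->report)
		return 0;
	p->report(p->ctx, id, data);
	return 1;
}

/* Trivial example consumer: counts how many events of each ID it saw. */
typedef struct evt_tally {
	unsigned count[EVT_MAX];
} evt_tally_t;

static void tally_report(void *ctx, evt_id_t id, const void *data)
{
	evt_tally_t *t = ctx;

	(void)data;		/* payload ignored by this consumer */
	t->count[id]++;
}
```

In this sketch the producer does not store the payload, so whether the data
lives in the perfmgr, in the plugin, or in both is left to whoever
implements the callback.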

<snip>

> > > +
> > > +/**
> > > + * group port counters for ports into the nodes
> > > + */
> > > +typedef struct _osm_pc_node {
> > > +	cl_map_item_t  map_item; /* must be first */
> > > +	uint64_t       node_guid;
> > > +	osm_event_pc_t   *ports;
> > > +	uint8_t        num_ports;
> > > +} osm_pc_node_t;
> > 
> > Is it really necessary to keep osm_pc_node_t nodes in a separate db (qmap)?
> > Why not reuse the already existing maps in osm_subn_t (we could add
> > a 'void *pm_data' or similar field to the osm_physp_t structure)?
> 
> My one concern would be evolving the PerfMgr. This is better now, but is
> it still better once the PerfMgr is separated from the SM functionality? I
> know there are other things to untangle to get there.
> 

I fully agree.  I don't think we want to intertwine the SM structures with the
PerfMgr structures.  BTW, in the new code I have, this is named _db_node_t.
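
To make the separation concrete, here is a rough sketch of a PerfMgr-side
node database that is keyed by node GUID and never touches osm_subn_t or
osm_physp_t.  It is an assumption-laden illustration, not the layout of
_db_node_t: the sketch_* names are hypothetical, and a plain linked list
stands in for the qmap used in the patch so that the example is
self-contained.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical per-port counter record. */
typedef struct sketch_port_ctr {
	uint64_t xmit_data;
	uint64_t rcv_data;
} sketch_port_ctr_t;

/* Hypothetical per-node record, owned entirely by the PerfMgr side. */
typedef struct sketch_db_node {
	struct sketch_db_node *next;
	uint64_t node_guid;
	uint8_t  num_ports;
	sketch_port_ctr_t *ports;	/* array of num_ports entries */
} sketch_db_node_t;

typedef struct sketch_db {
	sketch_db_node_t *head;
} sketch_db_t;

/* Look a node up by GUID; returns NULL if it is not in the db. */
static sketch_db_node_t *sketch_db_find(const sketch_db_t *db, uint64_t guid)
{
	sketch_db_node_t *n;

	for (n = db->head; n; n = n->next)
		if (n->node_guid == guid)
			return n;
	return NULL;
}

/* Return the existing record for guid, or create a zeroed one. */
static sketch_db_node_t *sketch_db_get(sketch_db_t *db, uint64_t guid,
				       uint8_t num_ports)
{
	sketch_db_node_t *n = sketch_db_find(db, guid);

	if (n)
		return n;
	assert(num_ports > 0);
	n = calloc(1, sizeof(*n));
	if (!n)
		return NULL;
	n->ports = calloc(num_ports, sizeof(*n->ports));
	if (!n->ports) {
		free(n);
		return NULL;
	}
	n->node_guid = guid;
	n->num_ports = num_ports;
	n->next = db->head;
	db->head = n;
	return n;
}

/* Free every record; the SM's own structures are never involved. */
static void sketch_db_destroy(sketch_db_t *db)
{
	while (db->head) {
		sketch_db_node_t *n = db->head;

		db->head = n->next;
		free(n->ports);
		free(n);
	}
}
```

Since the only link back to the subnet is the GUID, the db could in
principle move out of the SM process along with the PerfMgr.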

Ira


