[ofa-general] [OpenSM][RFC] OpenSM Proposed Perf Manager

Mon May 14 09:55:53 PDT 2007

On Sun, 13 May 2007 22:55:39 +0300
Sasha Khapyorsky <sashak at voltaire.com> wrote:

> Hi Ira,
> 
> Thanks for the great work!
> 
> On 18:49 Tue 08 May     , Ira Weiny wrote:
> > I would like to submit to the list a performance manager which I have been
> > working on for OpenSM.
> > 
> > It is implemented as the first proposed architecture model set forth by Hal (As
> > an integrated thread to OpenSM.)  As such it works fine on our small test
> > cluster but there is some concern about its scalability.
> > 
> > I have extended this architecture with an idea of my own.  This idea is to have
> > a plug-able module for the "event database".  With this interface one could
> > write their own Data reduction, logging, and tracking methods.  Here at LLNL I
> > propose to use this to add counter and subnet events directly to our management
> > database which is used to show system status to our operators.  Other
> > installations might prefer other methods of logging, SNMP for example.  This
> > patch includes a "reference" implementation of this "event database" which
> > stores the information internally until the user requests a "dump".
> 
> I like this event db idea, but not sure this should not be integral part
> of the low level perfmgr stuff - as it is currently implemented without
> such plugin loaded PerfMgr just doesn't work - this unconditionally tries
> to pull all ports counters, but has nothing to do with it without plugin.
> 
> Instead I would purpose to have a builtin PerfMgr which will be able to
> pull and store performance related data and then to call "generic" event
> manager which can process such data. This also will help to have simpler
> generic API for such event db plugin so other parts of OpenSM will be
> able to report events using same method(s). What do you think?

This is a good idea.  I will think about how to make it work.

<snip>

> > +
> > +/**
> > + * group port counters for ports into the nodes
> > + */
> > +typedef struct _osm_pc_node {
> > +	cl_map_item_t  map_item; /* must be first */
> > +	uint64_t       node_guid;
> > +	osm_event_pc_t   *ports;
> > +	uint8_t        num_ports;
> > +} osm_pc_node_t;
> 
> Is it really needed to keep osm_pc_node_t nodes in separate db (qmap)?
> Why not to reuse already existed maps in osm_subn_t (we could add
> 'void *pm_data' or so field to osm_physp_t structure)?
> 

I did not want to complicate the SM data structures.  Also these structures
were part of the plugin.  This reference plugin used the compatibility lib qmap
structures to store the data.  But other plugins may use SQL or other data
stores.  I think I agree with Hal that we should keep these data structures
separate from the SM structures.

> > +
> > +/****s* OpenSM: PERFMGR/osm_perfmgr_state_t */
> > +typedef enum
> > +{
> > +  PERFMGR_STATE_DISABLE,
> > +  PERFMGR_STATE_ENABLED,
> > +  PERFMGR_STATE_NO_DB
> 
> Why PERFMGR_STATE_NO_DB is needed? Isn't is duplicated by
> (pm->db == NULL)?
> 
> As side effect of this duplication - now when DB was not found I can
> enable perfmgr with console command, but it obviously crashes during
> follow 'dump'.

Ah I did not catch that.

If we separate out the plugin to be generic with a perfmgr internal store, this
will go away.  I did add checks for NULL DB functions so that plugins could
decide to not receive some types of data, but this only makes sense with the
refactoring I did on the DB interface.

<snip>

> 
> > +} osm_perfmgr_state_t;
> > +
> > +/****s* OpenSM: PERFMGR/osm_perfmgr_t
> > +*  This object should be treated as opaque and should
> > +*  be manipulated only through the provided functions.
> > +*/
> > +typedef struct _osm_perfmgr
> > +{
> > +  osm_thread_state_t    thread_state;
> > +  cl_event_t            sig_sweep;
> > +  cl_thread_t           sweeper;
> > +  osm_subn_t           *subn;
> > +  osm_sm_t             *sm;
> > +  cl_plock_t           *lock;
> > +  osm_log_t            *log;
> > +  osm_mad_pool_t       *mad_pool;
> > +  atomic32_t            trans_id;
> 
> Do we need separate transaction id generator for PerfMgr? 

Probably not here but if we separate out the perfmgr we might.

<snip>

> > +
> > +/****f* OpenSM: PERFMGR/osm_perfmgr_init */
> > +ib_api_status_t
> > +osm_perfmgr_init(
> > +	osm_perfmgr_t* const perfmgr,
> > +	osm_subn_t* const subn,
> > +        osm_sm_t * const sm,
> > +	osm_log_t* const log,
> > +	osm_mad_pool_t * const mad_pool,
> > +	osm_vendor_t * const vendor,
> > +        cl_dispatcher_t* const disp,
> > +   	cl_plock_t* const lock,
> > +	const osm_subn_opt_t * const p_opt );
> 
> The identation is not unified (tab character is preferred) here and in
> another places, also there are lot of trailing white spaces in the patch.
> You can run 'git-diff --color' to see formatting issues.

Yes, sorry.  I have been trying to follow the new codeing standard but I have
not done a great job.  Thanks for the git tip.

<snip>

> >  
> > +#ifdef ENABLE_OSM_PERF_MGR
> > +    case 1:
> > +      opt.perfmgr = TRUE;
> > +      break;
> > +    case 2:
> > +      opt.perfmgr_sweep_time_s = atoi(optarg);
> 
> In case of user error we can get opt.perfmgr_sweep_time_s = 0 (or another
> strange value), I think at least minimal verification is needed here.

Yes, good catch.  I am actually going to remove these from the command line
options.  I think one can control these better from the opensm.opts file.
There seems to be too many options which must be set for this to work correctly
right now.  Also what would you guys think of having a separate perfmgr config
file?  I am not sure about that idea.  On one hand it helps to keep the
opensm.opts file clean but on the other hand it means the user has to deal with
another config file.  :-/

<snip>

> > +
> > +/**********************************************************************
> > + * Process errors from the MAD send.
> > + **********************************************************************/
> > +static void
> > +osm_perfmgr_mad_send_err_callback(void* bind_context, osm_madw_t *p_madw)
> > +{
> > +	osm_perfmgr_t *pm = (osm_perfmgr_t *)bind_context;
> > +	osm_madw_context_t *context = &(p_madw->context);
> > +	
> > +	OSM_LOG_ENTER( pm->log, osm_pm_mad_send_err_callback );
>                                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> Ditto (the same for another perfmgr functions)

Yep, I already caught these, thanks.

<snip>

> > +	
> > +	OSM_LOG_ENTER( pm->log, __osm_pm_query_counters );
> > +	
> > +	memcpy(node_desc, p_node->node_desc.description,
> > +			IB_NODE_DESCRIPTION_SIZE);
> > +	node_desc[IB_NODE_DESCRIPTION_SIZE-1] = '\0';
> 
> We have null terminated 'print_desc' field in osm_node_t structure

Yea, I put it there, I should have known that ;-)  I changed this already...

<snip>

Thanks for the comments.  I will get a framework done for the general event
plugin...  I do agree that would be better for other types of events.  My
original idea was to have these events reported to the "perfmgr".  But that is
somewhat invasive on the perfmgr object.

I am not sure what the best way to do this is at the moment.  I have cleaned up
the DB interface quite a bit, including making it more generic.  So I think
this might fit in nicely.  I can reissue a patch if you would like to see it.
Or I can just submit the header file to see the interface.

Thanks,
Ira