[ofa-general] Re: [PATCH] opensm & osm_console: modified console framework to support multiple connections

Mon Oct 29 10:48:47 PDT 2007

Hi Sasha,

I apologize for the style and submission issues - still adjusting...
Some of the design issues/objectives are discussed in-line below.

Sasha Khapyorsky wrote:
> Hi Tim,
>
> Sorry about very long delay with reviewing this.
>
> On 16:52 Mon 15 Oct     , Timothy A. Meier wrote:
>   
>>   This patch is setting up for adding Remote/Secure Console capability using 
>>  SSL/TSL (we need at LLNL).
>>     
>
> Thanks for doing this - it is great thing to secure OpenSM console.
>
>   
>>   Its a big patch because I changed to an abstract server model, instead of 
>>  the original
>>  single connection and synchronous model.  There is no significant functional 
>>  difference (yet).
>>     
>
> It is hard to understand how such abstraction model serves us without
> seeing the rest of SSL/TSL code. Probably it is better idea to issue
> whole patch series? Anyway some initial comments are below.
>
>   
I understand. This patch is fundamentally about changing the architecture to
support new features and capabilities (without actually providing 
anything new).

Adding SSL/TSL was driving the requirements for most of these changes, 
but once
changed, has wider applications. I wanted to keep it abstract.

The first "new" feature will be SSL/TSL for a secure remote console.
>>  ========
>>  From cb69c1e2c8ea526bcb1e81d079bfa787eda09ba8 Mon Sep 17 00:00:00 2001
>>  From: Tim Meier <meier3 at llnl.gov>
>>  Date: Mon, 15 Oct 2007 16:08:10 -0700
>>  Subject: [PATCH] opensm & osm_console: modified console framework to support 
>>  multiple connections
>>
>>  Provided an abstract console service that supports the current connection 
>>  types
>>  (local, loopback, socket) as well as supporting the addition of a secure
>>  connection type.
>>
>>  * A server implementation supports multiple connections, and reduces the
>>  posibility of an inadvertant denial of service (currently vulnerable).
>>
>>  * An IO abstraction (CIO) is employed to facilitate the future 
>>  implementation
>>  of a secure socket (SSL / TSL) connection, while maintaining backward
>>  compatibility.
>>     
>
> Would be nice to not mix two things in one patch - "one patch per
> thought" makes it easier to review and submit.
>
>   
I was troubled with breaking this into pieces. The patch is really about
providing an abstract OSM Server that supports local/remote connections.

I can break them up, but in my mind, they were tightly coupled.
>>  Signed-off-by: Tim Meier <meier3 at llnl.gov>
>>  ---
>>  opensm/include/opensm/osm_console.h |   35 +-
>>  opensm/opensm/main.c                |   77 ++-
>>  opensm/opensm/osm_console.c         | 1500 
>>  +++++++++++++++++++++++++----------
>>  3 files changed, 1177 insertions(+), 435 deletions(-)
>>
>>  diff --git a/opensm/include/opensm/osm_console.h 
>>  b/opensm/include/opensm/osm_console.h
>>  index 33e41e7..75111a4 100644
>>  --- a/opensm/include/opensm/osm_console.h
>>  +++ b/opensm/include/opensm/osm_console.h
>>  @@ -49,6 +49,14 @@
>>  #define OSM_DEFAULT_CONSOLE      OSM_DISABLE_CONSOLE
>>  #define OSM_DEFAULT_CONSOLE_PORT 10000
>>  #define OSM_DAEMON_NAME          "opensm"
>>  +#define OSM_QUIT_CMD             "quit"
>>  +#define OSM_LOOP_PERIOD_SEC      2
>>  +
>>  +#define CIO_BUFSIZE          1024
>>  +#define CIO_INFO_SIZE         128
>>  +#define CIO_NOTE_SIZE          64
>>  +#define CIO_MAX_CONNECTS        5
>>  +#define CIO_CONNECTION_PORT 10000
>>  #ifdef __cplusplus
>>  #  define BEGIN_C_DECLS extern "C" {
>>  @@ -59,10 +67,29 @@
>>  #endif                /* __cplusplus */
>>  BEGIN_C_DECLS
>>  -void osm_console_init(osm_subn_opt_t * opt, osm_opensm_t * p_osm);
>>  -void osm_console(osm_opensm_t * p_osm);
>>  -void osm_console_prompt(FILE * out);
>>  -void osm_console_close_socket(osm_opensm_t * p_osm);
>>  +
>>  +/* TODO move when fully implemented */
>>  +typedef struct _CIO_t
>>  +{
>>  +  int  fd;  // file descriptor (socket)
>>  +  FILE *out;
>>  +  FILE *err;
>>  +  FILE *in;
>>  +  struct pollfd *pfd;
>>  +} CIO_t;
>>  +
>>  +int osm_console_server(osm_subn_opt_t *p_opt, osm_opensm_t *p_osm);
>>  +void osm_console_server_init(osm_subn_opt_t *opt, osm_opensm_t *p_osm);
>>  +void osm_console_server_destroy(osm_opensm_t *p_osm);
>>  +int is_console_enabled(osm_subn_opt_t *p_opt);
>>  +
>>  +/* TODO move along with other IO abstraction code */
>>  +int cio_printf( CIO_t *cio, const char *format, ...);
>>  +int cio_flush( CIO_t *cio);
>>  +int cio_getline( char **lineptr, size_t *n, CIO_t *cio);
>>  +int cio_open( CIO_t *cio);
>>  +int cio_close( CIO_t *cio);
>>  +int cio_poll(CIO_t *cio, int timeout);
>>     
>
> Later I see that all cio_* and CIO_* stuff is used only in
> osm_console.c, then I think this all should be moved to this file,
> local function should be static, etc..
>
>   
The intent of the CIO abstraction is to support connections to the OSM
server. Currently, the only thing "planned" to use this connection is
the interactive Console. That might not always be the case.
> Another thing, please try to not break existing coding style (it is
> described in opensm/doc/opensm-coding-style.txt), in many cases you can
> use opensm/opensm/osm_indent script to format the code.
>
>   
Sorry. I wasn't aware of the indent script until recently. Is this
universally used, or just on new code?
>>  #include <opensm/osm_perfmgr.h>
>>  +typedef struct _LoopCmd
>>  +{
>>  +  int    on;
>>  +  int    running;
>>  +  int    delay_s;
>>  +  void (*loop_function)(osm_opensm_t *p_osm, CIO_t *out);
>>  +  cl_thread_t  loopThread; // a specific thread for each looping cmd
>>  +} LoopCmd;
>>  +
>>  +// unique attributes for each connection
>>  +typedef struct _osm_console_thread_t
>>  +{
>>  +  int used;
>>  +  unsigned short int port;
>>  +  int authorized;
>>  +  int state;
>>  +  char name[CIO_INFO_SIZE];
>>  +  char in_buff[CIO_BUFSIZE];
>>  +  char out_buff[CIO_BUFSIZE];
>>  +  char client_type[CIO_NOTE_SIZE];  // maps to option->console 
>>  (off|local|socket)
>>  +  char client_ip[CIO_NOTE_SIZE];
>>  +  char client_hn[CIO_INFO_SIZE];
>>  +  unsigned int thread_num;  // a unique ever increasing number +  
>>  osm_opensm_t *p_osm;     // the global opensm singleton (protect with lock)
>>  +  CIO_t io;                // the io streams for the connection
>>  +  LoopCmd loop_command;
>>  +  cl_thread_t  consoleThread; // a specific thread each console connection
>>  +  struct timeval connect_time;
>>  +} osm_console_thread_t;
>>     
>
> I think this introduces CIO_MAX_CONNECTS new threads + for loop commands.
> What about to do all in one thread - to use select() or poll() with
> timeout on multiple file descriptors? This will "reserve" another CPUs
> for running another OpenSM things. Another potential problem is multi
> thread synchronizations - we had (and still have) a lot of issues in this
> area.
>
>   
I wasn't aware of thread synchronization issues....

You are correct, this potentially introduces 2*CIO_MAX_CONNECTS new threads.
(Worst case, all connections are used, all running a loop command.)

Currently, the only loop command is for printing status, but the software
was designed to support any command you may want to put in a
loop. If no additional commands will be "looped", then I agree its overkill
to put this in its own thread.

I think each connection/session should be in its own thread.
>>  +
>>  struct command {
>>  -    char *name;
>>  -    void (*help_function) (FILE * out, int detail);
>>  -    void (*parse_function) (char **p_last, osm_opensm_t * p_osm,
>>  -                FILE * out);
>>  +  char *name;
>>  +  void (*help_function)(CIO_t *out, int detail);
>>  +  void (*parse_function)(char **p_last, osm_console_thread_t *p_oct, CIO_t 
>>  *out);
>>  };
>>  -struct {
>>  -    int on;
>>  -    int delay_s;
>>  -    time_t previous;
>>  -    void (*loop_function) (osm_opensm_t * p_osm, FILE * out);
>>  -} loop_command = {
>>  -on: 0, delay_s: 2, loop_function:NULL};
>>  +/* connection pool for remote clients - currently only consoles */
>>  +static osm_console_thread_t ConsoleThreadPool[CIO_MAX_CONNECTS];
>>  +static cl_plock_t ThreadLock;
>>  +static volatile unsigned int cio_thread_counter = 0;
>>  +static struct timeval ServerTime;
>>     
>
> Would be nice to avoid using non-constant static/global variables.
> Instead we could keep needed per OpenSM session info in allocated
> structure.
>
>   
I agree. I was following the existing code and limiting the # of connections
to a small number. I didn't think it was different than current practice.

Allocating the structure would be a better long term solution.
>>  +
>>  +/**********************************************************************
>>  + * convenience function
>>  + **********************************************************************/
>>  +CIO_t* getCIO(osm_console_thread_t *oct)
>>     
>
> This function should be static?
>
>   
Yep, I'll fix those.
>>  +{
>>  +  return &oct->io;
>>  +}
>>  +
>>  +/**********************************************************************
>>  + * thread pool primitive: counts the number currently in use
>>  + **********************************************************************/
>>  +int num_console_threads(void)
>>     
>>  -static void loglevel_parse(char **p_last, osm_opensm_t * p_osm, FILE * out)
>>  +static void loglevel_parse(char **p_last, osm_console_thread_t *p_oct, 
>>  CIO_t *out)
>>  {
>>  +  osm_opensm_t *p_osm = p_oct->p_osm;
>>      char *p_cmd;
>>      int level;
>>      p_cmd = next_token(p_last);
>>      if (!p_cmd)
>>  -        fprintf(out, "Current log level is 0x%x\n",
>>  -            osm_log_get_level(&p_osm->log));
>>  +        cio_printf(out, "Current log level is 0x%x\n", 
>>  osm_log_get_level(&p_osm->log));
>>     
>
> At least here your mailer wraps the line :(
>
>   
I see, sorry - I thought I turned that off. I will switch to a different 
mailer for sending patches.
>>  +
>>  +int print_console_thread_pool(osm_console_thread_t* p_oct, osm_opensm_t 
>>  *p_osm, CIO_t *out)
>>     
>
> This function is not used.
>
>   
Whoops! I removed the "new" command that uses this (didn't want to introduce
too many new things) but missed this code.
>>     
>
> It is not clear for me why most of those wrapper functions are needed
> at all. And how really so big comment about *_printf() usage is helpful.
>
> Sasha
>
>   
Currently those wrapper functions only provide a single implementation, but
I intend to extend them with additional functionality when I add SSL/TSL.
The new protocol will depend on new libraries/headers. We (LLNL)
discussed this, and thought conditionally compiling this feature in would
satisfy those folks who did not want to add this dependency if they did
not want the feature.

So the wrapper functions (in this Patch) were just a way of introducing
the IO abstraction.

Regarding the comments, sorry if it seems verbose. I tend to put all of
my documentation in the code, and sometimes I get carried away.

Thanks for reviewing all of this. How would you like me to move forward?
Would you rather me (re)submit this Patch as a series of 2? I want to
establish this as a working baseline (no new functionality, just more
extensible) before adding the SSL/TSL code.

-- 
Timothy A. Meier
Computer Scientist
ICCD/High Performance Computing
925.422.3341
meier3 at llnl.gov