[ofa-general] Re: [PATCH] opensm & osm_console: modified console framework to support multiple connections

Thu Nov 1 10:53:22 PDT 2007

Sasha,

I have some in-line comments, but the main "threading issue" discussion is
near the end. Please advise.

Sasha Khapyorsky wrote:
> On 10:48 Mon 29 Oct     , Timothy A. Meier wrote:
>   
>>  I apologize for the style and submission issues - still adjusting...
>>     
>
> No need to apologize :)
>
>   
>>  I was troubled with breaking this into pieces. The patch is really about
>>  providing an abstract OSM Server that supports local/remote connections.
>>
>>  I can break them up, but in my mind, they were tightly coupled.
>>     
>
> I think it could be broken at least to multiconnection support and the
> rest abstractions. No need to split it now only for "split", just try
> to make it in smaller patches in the next version of this.
>
>   
>>>>  +/* TODO move along with other IO abstraction code */
>>>>  +int cio_printf( CIO_t *cio, const char *format, ...);
>>>>  +int cio_flush( CIO_t *cio);
>>>>  +int cio_getline( char **lineptr, size_t *n, CIO_t *cio);
>>>>  +int cio_open( CIO_t *cio);
>>>>  +int cio_close( CIO_t *cio);
>>>>  +int cio_poll(CIO_t *cio, int timeout);
>>>>     
>>>>         
>>> Later I see that all cio_* and CIO_* stuff is used only in
>>> osm_console.c, then I think this all should be moved to this file,
>>> local function should be static, etc..
>>>
>>>   
>>>       
>>  The intent of the CIO abstraction is to support connections to the OSM
>>  server. Currently, the only thing "planned" to use this connection is
>>  the interactive Console. That might not always be the case.
>>     
>
> Now it is the case. And if there are no concrete plans to use this APIs
> externally I prefer to keep it local.
>
>   
Okay, well I quoted the "planned" because I/we (LLNL) have some ideas (not
really plans) we would like to try that will use this abstraction.

Keeping it local, until needed elsewhere is fine.
>>>>  +typedef struct _osm_console_thread_t
>>>>  +{
>>>>  +  int used;
>>>>  +  unsigned short int port;
>>>>  +  int authorized;
>>>>  +  int state;
>>>>  +  char name[CIO_INFO_SIZE];
>>>>  +  char in_buff[CIO_BUFSIZE];
>>>>  +  char out_buff[CIO_BUFSIZE];
>>>>  +  char client_type[CIO_NOTE_SIZE];  // maps to option->console  
>>>> (off|local|socket)
>>>>  +  char client_ip[CIO_NOTE_SIZE];
>>>>  +  char client_hn[CIO_INFO_SIZE];
>>>>  +  unsigned int thread_num;  // a unique ever increasing number +   
>>>> osm_opensm_t *p_osm;     // the global opensm singleton (protect with 
>>>> lock)
>>>>  +  CIO_t io;                // the io streams for the connection
>>>>  +  LoopCmd loop_command;
>>>>  +  cl_thread_t  consoleThread; // a specific thread each console 
>>>> connection
>>>>  +  struct timeval connect_time;
>>>>  +} osm_console_thread_t;
>>>>     
>>>>         
>>> I think this introduces CIO_MAX_CONNECTS new threads + for loop commands.
>>> What about to do all in one thread - to use select() or poll() with
>>> timeout on multiple file descriptors? This will "reserve" another CPUs
>>> for running another OpenSM things. Another potential problem is multi
>>> thread synchronizations - we had (and still have) a lot of issues in this
>>> area.
>>>
>>>   
>>>       
>>  I wasn't aware of thread synchronization issues....
>>
>>  You are correct, this potentially introduces 2*CIO_MAX_CONNECTS new threads.
>>  (Worst case, all connections are used, all running a loop command.)
>>
>>  Currently, the only loop command is for printing status, but the software
>>  was designed to support any command you may want to put in a
>>  loop. If no additional commands will be "looped", then I agree its overkill
>>  to put this in its own thread.
>>
>>  I think each connection/session should be in its own thread.
>>     
>
> Wouldn't poll() on multiple file descriptors (connected and listened
> sockets) be simpler and more robust approach here? Why?
>
>   
See the thread/poll discussion below..
>>  Currently those wrapper functions only provide a single implementation, but
>>  I intend to extend them with additional functionality when I add SSL/TSL.
>>     
>
> This is why I thought it would be clearer to see in a patch series..
>
>   
Understood. Abstractions are kind of.... abstract. Its hard to see the
justification for an abstraction layer without having at least two different
implementations. I provide one. The second one will be SSL/TSL. I'd like
to provide that after the new framework/abstractions are in place and 
working
just as before.
>>  The new protocol will depend on new libraries/headers. We (LLNL)
>>  discussed this, and thought conditionally compiling this feature in would
>>  satisfy those folks who did not want to add this dependency if they did
>>  not want the feature.
>>     
>
> That should be fine.
>
>   
>>  Thanks for reviewing all of this. How would you like me to move forward?
>>  Would you rather me (re)submit this Patch as a series of 2?
>>     
>
> I think we need to close threading issue first. Then patch series of 2
> looks fine for me.
>
>   
I really think the "thread-per-session" would be a more flexible and 
powerful
design. Setting up and maintaining threads might seem more complex at first,
but it makes servicing requests/commands much more simple because 
everything is
in its own context.

The previous Console used a polling mechanism, and I found an edge case 
condition
which allowed one connection to block the other. Thread-per-session (or 
thread
per connection) makes it difficult for one session to influence another.

The number of threads/connections would be limited. Other than the normal
multi-threading issues, are there other thread hazards in OFED/OpenSM that I
need to be aware of?

Your thoughts?
>> I want to
>>  establish this as a working baseline (no new functionality, just more
>>  extensible) before adding the SSL/TSL code.
>>     
>
> Understood. Thanks for doing this!
>
> Sasha
>
>   

-- 
Timothy A. Meier
Computer Scientist
ICCD/High Performance Computing
925.422.3341
meier3 at llnl.gov