[ofa-general] Re: [PATCH] opensm & osm_console: modified console framework to support multiple connections
Timothy A. Meier
meier3 at llnl.gov
Thu Nov 1 10:53:22 PDT 2007
Sasha,
I have some in-line comments, but the main "threading issue" discussion is
near the end. Please advise.
Sasha Khapyorsky wrote:
> On 10:48 Mon 29 Oct , Timothy A. Meier wrote:
>
>> I apologize for the style and submission issues - still adjusting...
>>
>
> No need to apologize :)
>
>
>> I was troubled with breaking this into pieces. The patch is really about
>> providing an abstract OSM Server that supports local/remote connections.
>>
>> I can break them up, but in my mind, they were tightly coupled.
>>
>
> I think it could be broken at least to multiconnection support and the
> rest abstractions. No need to split it now only for "split", just try
> to make it in smaller patches in the next version of this.
>
>
>>>> +/* TODO move along with other IO abstraction code */
>>>> +int cio_printf( CIO_t *cio, const char *format, ...);
>>>> +int cio_flush( CIO_t *cio);
>>>> +int cio_getline( char **lineptr, size_t *n, CIO_t *cio);
>>>> +int cio_open( CIO_t *cio);
>>>> +int cio_close( CIO_t *cio);
>>>> +int cio_poll(CIO_t *cio, int timeout);
>>>>
>>>>
>>> Later I see that all cio_* and CIO_* stuff is used only in
>>> osm_console.c, then I think this all should be moved to this file,
>>> local function should be static, etc..
>>>
>>>
>>>
>> The intent of the CIO abstraction is to support connections to the OSM
>> server. Currently, the only thing "planned" to use this connection is
>> the interactive Console. That might not always be the case.
>>
>
> Now it is the case. And if there are no concrete plans to use this APIs
> externally I prefer to keep it local.
>
>
Okay, well I quoted the "planned" because I/we (LLNL) have some ideas (not
really plans) we would like to try that will use this abstraction.
Keeping it local, until needed elsewhere is fine.
>>>> +typedef struct _osm_console_thread_t
>>>> +{
>>>> + int used;
>>>> + unsigned short int port;
>>>> + int authorized;
>>>> + int state;
>>>> + char name[CIO_INFO_SIZE];
>>>> + char in_buff[CIO_BUFSIZE];
>>>> + char out_buff[CIO_BUFSIZE];
>>>> + char client_type[CIO_NOTE_SIZE]; // maps to option->console
>>>> (off|local|socket)
>>>> + char client_ip[CIO_NOTE_SIZE];
>>>> + char client_hn[CIO_INFO_SIZE];
>>>> + unsigned int thread_num; // a unique ever increasing number +
>>>> osm_opensm_t *p_osm; // the global opensm singleton (protect with
>>>> lock)
>>>> + CIO_t io; // the io streams for the connection
>>>> + LoopCmd loop_command;
>>>> + cl_thread_t consoleThread; // a specific thread each console
>>>> connection
>>>> + struct timeval connect_time;
>>>> +} osm_console_thread_t;
>>>>
>>>>
>>> I think this introduces CIO_MAX_CONNECTS new threads + for loop commands.
>>> What about to do all in one thread - to use select() or poll() with
>>> timeout on multiple file descriptors? This will "reserve" another CPUs
>>> for running another OpenSM things. Another potential problem is multi
>>> thread synchronizations - we had (and still have) a lot of issues in this
>>> area.
>>>
>>>
>>>
>> I wasn't aware of thread synchronization issues....
>>
>> You are correct, this potentially introduces 2*CIO_MAX_CONNECTS new threads.
>> (Worst case, all connections are used, all running a loop command.)
>>
>> Currently, the only loop command is for printing status, but the software
>> was designed to support any command you may want to put in a
>> loop. If no additional commands will be "looped", then I agree its overkill
>> to put this in its own thread.
>>
>> I think each connection/session should be in its own thread.
>>
>
> Wouldn't poll() on multiple file descriptors (connected and listened
> sockets) be simpler and more robust approach here? Why?
>
>
See the thread/poll discussion below..
>> Currently those wrapper functions only provide a single implementation, but
>> I intend to extend them with additional functionality when I add SSL/TSL.
>>
>
> This is why I thought it would be clearer to see in a patch series..
>
>
Understood. Abstractions are kind of.... abstract. Its hard to see the
justification for an abstraction layer without having at least two different
implementations. I provide one. The second one will be SSL/TSL. I'd like
to provide that after the new framework/abstractions are in place and
working
just as before.
>> The new protocol will depend on new libraries/headers. We (LLNL)
>> discussed this, and thought conditionally compiling this feature in would
>> satisfy those folks who did not want to add this dependency if they did
>> not want the feature.
>>
>
> That should be fine.
>
>
>> Thanks for reviewing all of this. How would you like me to move forward?
>> Would you rather me (re)submit this Patch as a series of 2?
>>
>
> I think we need to close threading issue first. Then patch series of 2
> looks fine for me.
>
>
I really think the "thread-per-session" would be a more flexible and
powerful
design. Setting up and maintaining threads might seem more complex at first,
but it makes servicing requests/commands much more simple because
everything is
in its own context.
The previous Console used a polling mechanism, and I found an edge case
condition
which allowed one connection to block the other. Thread-per-session (or
thread
per connection) makes it difficult for one session to influence another.
The number of threads/connections would be limited. Other than the normal
multi-threading issues, are there other thread hazards in OFED/OpenSM that I
need to be aware of?
Your thoughts?
>> I want to
>> establish this as a working baseline (no new functionality, just more
>> extensible) before adding the SSL/TSL code.
>>
>
> Understood. Thanks for doing this!
>
> Sasha
>
>
--
Timothy A. Meier
Computer Scientist
ICCD/High Performance Computing
925.422.3341
meier3 at llnl.gov
More information about the general
mailing list