[ofa-general] Re: [PATCH] Opensm: main exit codes
Timothy A. Meier
meier3 at llnl.gov
Mon Nov 24 14:03:38 PST 2008
Hi Sasha,
I guess I viewed this patch as just cleaning up the interface between the program and the system.
Sasha Khapyorsky wrote:
> On 10:34 Mon 24 Nov , Timothy A. Meier wrote:
>> Hi Sasha,
>>
>> Sasha Khapyorsky wrote:
>>> Hi Tim,
>>>
>>> On 17:10 Tue 18 Nov , Timothy A. Meier wrote:
>>>> I thought it would be useful to define a set of exit codes for opensm. A quick examination of main.c
>>>> showed a few different ways to terminate. How about this patch? Obviously this doesn't catch every
>>>> possible exit scenario, but it's a start that can be built upon.
>>> Personally I read 'exit(0)' faster than 'exit(OSM_EXIT_TYPE_NORMAL)',
>>> but maybe it is just me :).
>> Me too :^) Not much confusion over a return code of 0.
>>
>> The audience for this change wouldn't be the people writing the software,
>
> Somehow we need to care about ourselves too :)
>
>> but admins, scripts, and tools that
>> start/stop/monitor opensm. At least that is our use case.
>>
>>> Maybe error codes could be formalized, but I'm not sure that it would be
>>> beneficial without any practical use (and a clear understanding of the
>>> requirements). Eventually we could find ourselves in the middle of a total mess,
>>> similar to how OSM_LOG_* is used today.
>>>
>>> Sasha
>>>
>> So the uses/requirements would be to formalize how opensm handles non-ideal termination conditions,
>> for the purpose of providing quick, convenient, and consistent information for other system level tools
>> that are responsible for starting/stopping/monitoring/reporting opensm.
>
> And are there any of such tools? Or any *real* use?
>
Chicken/Egg? Currently, we rely only on zero versus non-zero. Although OpenSM returns "other" values
on exit, they aren't really formalized or documented. Hence the patch. ;^)
Personally, I have (and create) several different versions of opensm with small customizations,
and test them on our cluster testbeds. I often will start/stop them in a variety of configurations
(with and without plugins, more than one SM on a node, etc.), and if and when opensm doesn't
start up normally, it would be nice to have a meaningful exit code.
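To illustrate the testbed use case, here is a minimal C sketch (not part of the patch) of a launcher that starts a program, waits for it, and reports its exit code, so a script or tool can tell a clean exit from a specific failure. It assumes a POSIX system; the function name `run_and_get_exit_code` is hypothetical, and the 127 and 128 + signal conventions are borrowed from POSIX shells rather than from opensm.

```c
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper: run argv[0] with the NULL-terminated argument
 * list argv, wait for it, and return its exit code.
 * Returns -1 if the child could not be forked or waited on,
 * 127 if exec failed, and 128 + signal number if the child was
 * killed by a signal (the convention POSIX shells use for $?).
 */
int run_and_get_exit_code(char *const argv[])
{
	pid_t pid = fork();
	if (pid < 0) {
		perror("fork");
		return -1;
	}
	if (pid == 0) {
		/* Child: replace ourselves with the requested program. */
		execvp(argv[0], argv);
		perror("execvp");	/* only reached if exec failed */
		_exit(127);
	}

	/* Parent: block until the child terminates. */
	int status;
	if (waitpid(pid, &status, 0) < 0) {
		perror("waitpid");
		return -1;
	}
	if (WIFEXITED(status))
		return WEXITSTATUS(status);	/* normal exit: report its code */
	if (WIFSIGNALED(status))
		return 128 + WTERMSIG(status);	/* killed by a signal */
	return -1;
}
```

For example, with `char *args[] = {"opensm", NULL};`, calling `run_and_get_exit_code(args)` would return whatever code opensm exits with; the non-zero values only become meaningful to such a tool once they are formalized and documented.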
Perhaps others would find it useful as well, now or in the future.
But again, I originally considered this more of a code cleanup: converting the exits, returns, and aborts
to provide a more consistent interface to the system.
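A sketch of what such a set of codes might look like follows. Only `OSM_EXIT_TYPE_NORMAL` comes from this thread; the other names, their values, and the `osm_exit_type_str()` and `osm_exit()` helpers are hypothetical illustrations, not the contents of the patch.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical exit codes; only OSM_EXIT_TYPE_NORMAL appears in the thread. */
enum osm_exit_type {
	OSM_EXIT_TYPE_NORMAL = 0,	/* clean shutdown */
	OSM_EXIT_TYPE_BAD_OPTION = 1,	/* invalid command-line option */
	OSM_EXIT_TYPE_CONFIG = 2,	/* configuration could not be loaded */
	OSM_EXIT_TYPE_INIT = 3,		/* subnet manager initialization failed */
	OSM_EXIT_TYPE_BIND = 4,		/* could not bind to a port */
	OSM_EXIT_TYPE_INTERNAL = 5	/* internal error (formerly an abort()) */
};

/* Human-readable description, e.g. for a log line or a monitoring tool. */
const char *osm_exit_type_str(int code)
{
	switch (code) {
	case OSM_EXIT_TYPE_NORMAL:	return "normal shutdown";
	case OSM_EXIT_TYPE_BAD_OPTION:	return "bad command-line option";
	case OSM_EXIT_TYPE_CONFIG:	return "configuration error";
	case OSM_EXIT_TYPE_INIT:	return "initialization failure";
	case OSM_EXIT_TYPE_BIND:	return "port bind failure";
	case OSM_EXIT_TYPE_INTERNAL:	return "internal error";
	default:			return "unknown exit code";
	}
}

/* Single exit point: log the reason for abnormal exits, then terminate. */
void osm_exit(enum osm_exit_type type)
{
	if (type != OSM_EXIT_TYPE_NORMAL)
		fprintf(stderr, "opensm: exiting: %s (code %d)\n",
			osm_exit_type_str(type), (int)type);
	exit((int)type);
}
```

Each bare `exit()`, early `return` from main(), or `abort()` would then become a call such as `osm_exit(OSM_EXIT_TYPE_CONFIG)`, and the table of values could be documented for admins and scripts. Note that replacing `abort()` this way trades away the core dump for a documented exit code.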
--
Timothy A. Meier
Computer Scientist
ICCD/High Performance Computing
meier3 at llnl.gov