[ofa-general] Re: [PATCH] Opensm: main exit codes

Timothy A. Meier meier3 at llnl.gov
Mon Nov 24 14:03:38 PST 2008


Hi Sasha,

I guess I viewed this patch as just cleaning up the interface between the program and the system.

Sasha Khapyorsky wrote:
> On 10:34 Mon 24 Nov     , Timothy A. Meier wrote:
>> Hi Sasha,
>>
>> Sasha Khapyorsky wrote:
>>> Hi Tim,
>>>
>>> On 17:10 Tue 18 Nov     , Timothy A. Meier wrote:
>>>>   I thought it would be useful to define a set of exit codes for opensm.  A quick examination of main.c
>>>> showed a few different ways to terminate.  How about this patch?  Obviously this doesn't catch every
>>>> possible exit scenario, but it's a start that can be built upon.
>>> Personally I read 'exit(0)' faster than 'exit(OSM_EXIT_TYPE_NORMAL)',
>>> but maybe it is just me :).
>> Me too :^)  Not much confusion over a return code of 0.
>>
>> The audience for this change wouldn't be the people writing the software,
> 
> Somehow we need to care about ourselves too :)
> 
>> but admins, scripts, and tools that
>> start/stop/monitor opensm.  At least that is our use case.
>>
>>> Maybe error codes could be formalized, but I'm not sure that it would be
>>> beneficial without any practical uses (and a clear understanding of the
>>> requirements). Eventually we could find ourselves in the middle of a total
>>> mess, similar to how OSM_LOG_* is used today.
>>>
>>> Sasha
>>>
>> So the uses/requirements would be to formalize how opensm handles the non-ideal termination condition,
>> for the purpose of providing quick, convenient, and consistent information for other system level tools
>> that are responsible for starting/stopping/monitoring/reporting opensm.
> 
> And are there any such tools? Or any *real* use?
>

Chicken/Egg?  Currently, we rely only on ZERO versus non-zero.  Although OpenSM returns "other" values
on exit, they aren't really formalized or documented.  Hence the patch. ;^)

Personally, I have (and create) several different versions of opensm with small customizations,
and test them on our cluster testbeds.  I often start/stop them in a variety of configurations
(with and without plugins, more than one SM on a node, etc.), and if and when opensm doesn't
start up normally, it would be nice to have a meaningful exit code.

Perhaps others might find it useful as well, or for some future use.

But again, I originally considered this more as code cleanup: converting the exits, returns, and aborts
to provide a more consistent interface to the system.
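
For illustration only, here is a minimal C11 sketch of what a formalized set of exit codes might look like. Only OSM_EXIT_TYPE_NORMAL is named in this thread; every other enum name, value, and the helper osm_exit_str() are hypothetical and not part of opensm. The two static_asserts capture the constraints discussed above: 0 stays "success" so existing zero/non-zero checks keep working, and the codes stay below 126, which shells reserve for exec failures and signal terminations.

```c
#include <assert.h>
#include <stdlib.h>

/*
 * Hypothetical opensm exit codes. Only OSM_EXIT_TYPE_NORMAL appears in
 * the thread; all other names and values are illustrative assumptions.
 */
enum osm_exit_type {
	OSM_EXIT_TYPE_NORMAL = 0,	/* clean shutdown */
	OSM_EXIT_TYPE_BAD_ARGS,		/* invalid command-line options */
	OSM_EXIT_TYPE_CONFIG_ERROR,	/* config file could not be parsed */
	OSM_EXIT_TYPE_INIT_FAILED,	/* subsystem or plugin init failed */
	OSM_EXIT_TYPE_BIND_FAILED,	/* could not bind to the requested port */
	OSM_EXIT_TYPE_MAX		/* sentinel, never returned */
};

/* Keep 0 as success so scripts checking only zero/non-zero still work. */
static_assert(OSM_EXIT_TYPE_NORMAL == EXIT_SUCCESS, "normal exit must be 0");
/* Shells use 126, 127 and 128+N (signal N), so stay below that range. */
static_assert(OSM_EXIT_TYPE_MAX < 126, "exit codes must stay below 126");

/* Short, stable description of an exit code for log messages. */
static const char *osm_exit_str(int code)
{
	switch (code) {
	case OSM_EXIT_TYPE_NORMAL:
		return "normal";
	case OSM_EXIT_TYPE_BAD_ARGS:
		return "bad arguments";
	case OSM_EXIT_TYPE_CONFIG_ERROR:
		return "config error";
	case OSM_EXIT_TYPE_INIT_FAILED:
		return "initialization failed";
	case OSM_EXIT_TYPE_BIND_FAILED:
		return "bind failed";
	default:
		return "unknown";
	}
}
```

Each termination path in main.c would then call, e.g., exit(OSM_EXIT_TYPE_BAD_ARGS), and a start/stop/monitor script could switch on $? instead of only testing for zero. An alternative design would be to reuse the BSD <sysexits.h> values (EX_USAGE, EX_CONFIG, and so on) rather than defining a new numbering.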

-- 
Timothy A. Meier
Computer Scientist
ICCD/High Performance Computing
meier3 at llnl.gov


