[Users] Event handling/notification in opensm

Lloyd Brown lloyd_brown at byu.edu
Wed Aug 15 12:34:59 PDT 2012


Hi, Ira.

Actually, it's much more likely that I misheard than that you misspoke.
My understanding of the specs is fairly limited; I've slogged my way
through a few small sections, and that's about it.

What I'm really trying to do is to capture and report on instances where
a hardware failure is likely coming.  For example, I've been told in the
past that a SymbolError counter increasing by more than just a little
(for some definition of "a little") probably indicates a failing cable.

Right now I have something I hacked together that wraps around
ibqueryerrors, and I run it from cron.  Mostly I'm just trying to see if
there's a better, more asynchronous way to get notified of these types of
events.
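
Editor's note: a minimal sketch of the threshold check such a cron wrapper might perform, written in Python. It assumes you have already parsed two consecutive ibqueryerrors runs into dictionaries keyed by (node GUID, port number) holding the SymbolErrorCounter value. That parsing step, the function name suspect_ports, and the threshold of 10 are hypothetical, since what counts as "a little" is site-specific.

```python
from typing import Dict, Tuple

# (node GUID, port number), as produced by your own parsing of
# ibqueryerrors output (hypothetical; parsing is not shown here).
PortKey = Tuple[str, int]

# Hypothetical, site-specific definition of "more than just a little".
SYMBOL_ERROR_DELTA_THRESHOLD = 10


def suspect_ports(previous: Dict[PortKey, int],
                  current: Dict[PortKey, int],
                  threshold: int = SYMBOL_ERROR_DELTA_THRESHOLD) -> Dict[PortKey, int]:
    """Return {port: delta} for ports whose SymbolErrorCounter grew by more
    than `threshold` between two snapshots.

    Ports absent from `previous` are compared against 0. If a counter went
    down (it was cleared in between), its current value is used as the delta.
    """
    flagged: Dict[PortKey, int] = {}
    for port, now in current.items():
        before = previous.get(port, 0)
        delta = now - before if now >= before else now
        if delta > threshold:
            flagged[port] = delta
    return flagged


if __name__ == "__main__":
    prev = {("0x0002c90300001234", 1): 3, ("0x0002c90300005678", 7): 0}
    curr = {("0x0002c90300001234", 1): 4, ("0x0002c90300005678", 7): 250}
    print(suspect_ports(prev, curr))  # {('0x0002c90300005678', 7): 250}
```

Treating a counter that went down as a fresh start keeps a reset between runs (by hand or otherwise) from producing a negative delta that would hide new errors.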

Lloyd Brown
Systems Administrator
Fulton Supercomputing Lab
Brigham Young University
http://marylou.byu.edu

On 08/15/2012 01:27 PM, Ira Weiny wrote:
> On Wed, 15 Aug 2012 12:41:44 -0600
> Lloyd Brown <lloyd_brown at byu.edu> wrote:
> 
>> Since nobody seems to be starting any conversations on our newly-minted
>> OFA users list, I guess I'll try.
>>
>> Is there any documentation somewhere that describes how to integrate
>> trap-style events in opensm into some external system?  For example, at
>> the OFA User Day last week, Ira mentioned the new performance manager
>> code in opensm, which would clear error counters when they reached 75% of
>> the maximum, and would then send a trap about the event.
> 
> Sorry if I misspoke; the perfmgr does not send a trap about the event.  According to my interpretation of the spec, the PM does not support InformInfo.  What it will do is log the "out of band" clears it detects, as well as all non-zero error counters, to opensm.log.
> 
>>
>> So far, I can see some trap related events in the opensm.log, but I have
>> no idea how to do anything with them.  For example, I might want to
>> execute a script, or send an SNMP trap to something else, etc.  Is there
>> any way to integrate this, short of periodically parsing the logfile?
>> Any equivalent to snmptrapd, to execute specific actions when specific
>> traps are received?
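
Editor's note: short of an InformInfo subscription, the log-parsing fallback described here can at least be made event-driven instead of cron-driven. The Python sketch below follows a log file roughly like `tail -f` and runs an action for every line that matches a rule, in the spirit of snmptrapd's trap handlers. The log path /var/log/opensm.log, the case-insensitive "trap" pattern, and the /usr/local/sbin/ib-alert script are all assumptions; check what your opensm actually writes before relying on any pattern.

```python
import re
import subprocess
import time
from typing import Callable, Iterable, Iterator, List, Tuple

# A rule pairs a compiled regular expression with an action on the matching line.
Rule = Tuple[re.Pattern, Callable[[str], None]]


def follow(path: str, poll_seconds: float = 1.0) -> Iterator[str]:
    """Yield lines appended to `path`, roughly like `tail -f`.
    Log rotation is not handled in this sketch."""
    with open(path) as f:
        f.seek(0, 2)  # start at the current end of the file
        while True:
            line = f.readline()
            if line:
                yield line.rstrip("\n")
            else:
                time.sleep(poll_seconds)


def dispatch(lines: Iterable[str], rules: List[Rule]) -> int:
    """Run the action of every rule whose pattern matches a line.
    Returns the number of actions run."""
    fired = 0
    for line in lines:
        for pattern, action in rules:
            if pattern.search(line):
                action(line)
                fired += 1
    return fired


def run_alert_script(line: str) -> None:
    # Hypothetical site script; it receives the matching log line as argv[1].
    subprocess.run(["/usr/local/sbin/ib-alert", line], check=False)


# Assumed pattern: any line mentioning a trap, case-insensitively.
RULES: List[Rule] = [(re.compile(r"trap", re.IGNORECASE), run_alert_script)]

# Usage (runs until interrupted; log path is an assumption):
#   dispatch(follow("/var/log/opensm.log"), RULES)
```

Keeping the matching (dispatch) separate from the file following (follow) lets the rules be tested against canned lines without a running opensm.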
> 
> Are you speaking of traps as defined in the spec?  The proper way to do this is to send an InformInfo "subscribe" to the SM(SA) or other class manager.  See 13.4.11 of the spec.
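
Editor's note: to make the InformInfo "subscribe" more concrete, here is a hedged Python sketch that packs the 36-byte InformInfo attribute payload. The field layout and the 0xFFFF / 0xFFFFFF "match all" values reflect one reading of the spec rather than a tested implementation, so verify them against the InformInfo attribute definition before use. Actually sending the payload (wrapping it in an SA Set(InformInfo) request, for example via libibumad) is not shown.

```python
import struct

# InformInfo attribute payload, big-endian (network byte order), 36 bytes:
#   GID (16) | LIDRangeBegin (2) | LIDRangeEnd (2) | Reserved (2) |
#   IsGeneric (1) | Subscribe (1) | Type (2) | TrapNumber/DeviceID (2) |
#   QPN (24 bits) + Reserved (3 bits) + RespTimeValue (5 bits) (4) |
#   Reserved (1) | ProducerType/VendorID (3)
# This layout is an assumption to be checked against the spec.
INFORM_INFO = struct.Struct(">16sHHHBBHHIBBH")
assert INFORM_INFO.size == 36

WILDCARD16 = 0xFFFF    # assumed "match all" value for 16-bit fields
WILDCARD24 = 0xFFFFFF  # assumed "match all" value for ProducerType


def pack_inform_info(subscribe: bool, *,
                     gid: bytes = bytes(16),
                     lid_range_begin: int = WILDCARD16,
                     lid_range_end: int = 0,
                     is_generic: bool = True,
                     trap_type: int = WILDCARD16,
                     trap_number: int = WILDCARD16,
                     qpn: int = 0,
                     resp_time_value: int = 0,
                     producer_type: int = WILDCARD24) -> bytes:
    """Pack an InformInfo payload. subscribe=True asks to receive matching
    notices; subscribe=False cancels a previous subscription."""
    if len(gid) != 16:
        raise ValueError("gid must be exactly 16 bytes")
    # QPN in the upper 24 bits, 3 reserved bits, RespTimeValue in the low 5.
    qpn_resp = ((qpn & 0xFFFFFF) << 8) | (resp_time_value & 0x1F)
    return INFORM_INFO.pack(
        gid,
        lid_range_begin,
        lid_range_end,
        0,                             # reserved
        int(is_generic),
        int(subscribe),
        trap_type,
        trap_number,
        qpn_resp,
        0,                             # reserved
        (producer_type >> 16) & 0xFF,  # high byte of 24-bit ProducerType
        producer_type & 0xFFFF,        # low 16 bits of ProducerType
    )
```

Under the wildcard assumptions above, pack_inform_info(True) would describe a subscription to every generic notice from every LID; the QPN and RespTimeValue you need depend on the application that will receive the reports.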
> 
> Unfortunately, right now I don't know of any software which allows for generic subscribing to the SA for traps/notices.  Nor do I know of any manager other than the SM which supports it.[*]
> 
> The traps you see in OpenSM are generated by hardware and software for various events that really help the SM manage the fabric effectively; port state change traps sent by switches are one example.  Other events that are less critical but still very important, such as node description changes, have been added over time.
> 
> To play with this a bit, you could check out ibsendtrap, a test utility in infiniband-diags (build it with ./configure --enable-test-utils)[$].  However, it only sends the few traps it was coded to send and is not considered "ready for prime time", so it is not included in the default build.
> 
> Finally, this part of the spec is pretty confusing to me, so I encourage others to correct me if I have said something wrong.
> 
> Sorry,
> Ira
> 
> [*] And frankly, I am not sure of the level of OpenSM's support either.
> [$] git://beany.openfabrics.org/~iraweiny/infiniband-diags.git
> 
>>
>> Thanks,
>> -- 
>> Lloyd Brown
>> Systems Administrator
>> Fulton Supercomputing Lab
>> Brigham Young University
>> http://marylou.byu.edu
>> _______________________________________________
>> Users mailing list
>> Users at lists.openfabrics.org
>> http://lists.openfabrics.org/cgi-bin/mailman/listinfo/users
> 
> 


