[openib-general] OpenSM not coming out of standby state..

Troy Benjegerdes hozer at hozed.org
Wed Nov 30 17:54:45 PST 2005


On Wed, Nov 30, 2005 at 07:33:44PM -0600, Troy Benjegerdes wrote:
> A couple of days ago I started up two instances of opensm on my network,
> and set one with priority 11, the other with the default 10.
> 
> I could kill one and the other would become master a few minutes later.
> 
> Well, today, I found that there are no active links anywhere in the
> network. But both SMs still appeared to be running.
> 
> Then I killed them both and restarted one with 'opensm -V -p 11'.
> 
> It is still staying in STANDBY state, and it produced the 4MB log available at
> 
> http://scl.ameslab.gov/~troy/osm.log-nomaster
> 
> (Hal, if you want access to this system, let me know)

And the rest of the story..

This happened after I cross-connected two networks and had opensm
running on two nodes that had back-to-back connections (with no switch).
I didn't think anything of it at the time, since the 'active' lights were
off on the cards in the machines that were connected (they had physical
link, but no logical link).

I've since killed opensm on the 'new' nodes, but there is still some
state somewhere that prevents opensm from 'nicely' becoming the master.

If I run with 'opensm -d 0 -p 11', it becomes master just fine. How does
one go about tracking down a broken rogue SM that isn't bringing up the
network?
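For reference, here is a minimal sketch of the master-election rule that the '-p' priority setting feeds into. It assumes the InfiniBand Architecture rule that, among discovering SMs, the one with the highest SM priority (a 4-bit value, 0..15) wins and a tie goes to the lowest port GUID. The node names and GUIDs are hypothetical, and this models only the comparison, not opensm's actual state machine or handover logic.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SmInfo:
    name: str      # hypothetical label for the node running the SM
    priority: int  # SM priority, 0..15 (4-bit field)
    guid: int      # port GUID of the SM's port (hypothetical values below)


def elect_master(sms):
    """Return the SM that should become master.

    Assumed rule: highest priority wins; on equal priority,
    the SM with the lower port GUID wins.
    """
    for sm in sms:
        if not 0 <= sm.priority <= 15:
            raise ValueError(f"priority out of range for {sm.name}")
    # Sorting key: negate priority so higher priority sorts first,
    # then lower GUID breaks ties.
    return min(sms, key=lambda sm: (-sm.priority, sm.guid))


sms = [
    SmInfo("node-a", 11, 0x0002C90200001234),
    SmInfo("node-b", 10, 0x0002C90200000001),
]
print(elect_master(sms).name)  # node-a: priority 11 beats 10
```

With equal priorities the lower GUID decides, so two SMs both left at the same priority would settle on whichever has the smaller port GUID.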

