[openib-general] question on opensm error
Ronald G. Minnich
rminnich at lanl.gov
Wed Feb 16 08:45:17 PST 2005
On Wed, 16 Feb 2005, Hal Rosenstock wrote:
> On Tue, 2005-02-15 at 22:22, Ronald G. Minnich wrote:
> > On Tue, 15 Feb 2005, Hal Rosenstock wrote:
> >
> > > I presume your subnet has 179 HCAs ? Do you know ?
> >
> > no errors. It's just that opensm won't run.
>
> Won't run or won't do anything on the subnet ?
>
> Not sure what you mean by won't run ?
ok, just found it.
There is a sys fail red light on the CPU of the 96-port switch that the
opensm host attaches to.
What's weird is that none of the IB admin tools flagged anything.
ibnetdiscover happily walked the whole subnet. The only symptom was that
opensm would not run, and its errors were unclear. So many things appeared
to be working that it did not occur to me to walk over and look at the
switch. Stupid of me.
Now that I've turned that switch off I get this:
[1108572233:000155763][40BFF970] -> __osm_state_mgr_sm_port_down_msg:
******************************************************************
************************** SM PORT DOWN **************************
******************************************************************
[1108572233:000155778][40BFF970] -> __osm_sm_state_mgr_signal_error: ERR 3207: Invalid signal OSM_SM_SIGNAL_DISCOVER in state IB_SMINFO_STATE_DISCOVERING.
which I assume is its way of telling me that the switch port is down.
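For anyone picking through similar opensm output, here is a rough Python
sketch that splits one of these log lines into its parts. The field
meanings are my guess from the lines above, not something taken from the
opensm source: I am assuming the first bracket is epoch seconds followed by
a sub-second counter, and that the second bracket is some hex identifier
(possibly a thread id). The names LOG_PREFIX and parse_line are made up for
this example.

```python
import re
from datetime import datetime, timezone

# Assumed layout, inferred only from the log lines quoted in this message:
#   [<epoch seconds>:<sub-second digits>][<hex id>] -> <function>: <message>
LOG_PREFIX = re.compile(
    r"^\[(?P<secs>\d+):(?P<frac>\d+)\]\[(?P<ident>[0-9A-Fa-f]+)\] -> "
    r"(?P<func>\w+):\s*(?P<msg>.*)$"
)

def parse_line(line):
    """Return a dict of fields, or None if the line has no such prefix."""
    m = LOG_PREFIX.match(line.strip())
    if m is None:
        return None
    return {
        # Interpreting the first number as seconds since the Unix epoch
        # is an assumption.
        "time": datetime.fromtimestamp(int(m.group("secs")), tz=timezone.utc),
        "ident": m.group("ident"),
        "func": m.group("func"),
        "msg": m.group("msg"),
    }

line = (
    "[1108572233:000155778][40BFF970] -> __osm_sm_state_mgr_signal_error: "
    "ERR 3207: Invalid signal OSM_SM_SIGNAL_DISCOVER in state "
    "IB_SMINFO_STATE_DISCOVERING."
)
rec = parse_line(line)
print(rec["time"].isoformat(), rec["func"])
print(rec["msg"])
```

Read that way, the timestamp comes out as 16:43:53 UTC on 16 Feb 2005,
which is a couple of minutes before this message was sent, so the epoch
reading at least looks plausible.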
ron