[openib-general] question on opensm error

Ronald G. Minnich rminnich at lanl.gov
Wed Feb 16 08:45:17 PST 2005



On Tue, 16 Feb 2005, Hal Rosenstock wrote:

> On Tue, 2005-02-15 at 22:22, Ronald G. Minnich wrote:
> > On Tue, 15 Feb 2005, Hal Rosenstock wrote:
> > 
> > > I presume your subnet has 179 HCAs ? Do you know ?
> > 
> > no errors. It's just that opensm won't run. 
> 
> Won't run or won't do anything on the subnet ?
> 
> Not sure what you mean by won't run ?

OK, just found it.

There is a red "sys fail" light on the CPU of the 96-port switch that the
opensm host attaches to.

What's weird is that none of the IB admin tools found anything. ibnetdiscover
happily walked the whole subnet. The only problem was that opensm would
not run, and its errors were unclear. So many things appeared to be
working that it did not occur to me to walk over and look at the switch.
Stupid of me.

Now that I've turned that switch off I get this:
[1108572233:000155763][40BFF970] -> __osm_state_mgr_sm_port_down_msg: 


******************************************************************
************************** SM PORT DOWN **************************
******************************************************************


[1108572233:000155778][40BFF970] -> __osm_sm_state_mgr_signal_error: ERR 3207: Invalid signal OSM_SM_SIGNAL_DISCOVER in state IB_SMINFO_STATE_DISCOVERING.

which I assume is its way of telling me that the switch port is down. 
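
For anyone who wants to pull the pieces out of log lines like the ones
above, here is a minimal sketch. It assumes the layout is
[seconds:fraction][thread id] -> function: message, which is inferred only
from the lines quoted in this message, and that the first number is Unix
epoch seconds. The names parse_line and LOG_RE are made up for
illustration and are not part of opensm.

```python
import re
from datetime import datetime, timezone

# Assumed layout, inferred only from the log lines quoted in this message:
#   [<seconds>:<fraction>][<thread id>] -> <function>: <message>
LOG_RE = re.compile(
    r"\[(?P<sec>\d+):(?P<frac>\d+)\]"
    r"\[(?P<thread>[0-9A-Fa-f]+)\]\s*->\s*"
    r"(?P<func>\w+):\s*(?P<msg>.*)",
    re.DOTALL,
)

# Error codes appear as "ERR <code>:" at the start of the message text.
ERR_RE = re.compile(r"ERR\s+(\w+):")


def parse_line(line):
    """Split one log entry into its fields, or return None if it does not match."""
    m = LOG_RE.match(line.strip())
    if m is None:
        return None
    # Mail clients wrap long lines, so collapse any whitespace runs.
    msg = " ".join(m["msg"].split())
    err = ERR_RE.search(msg)
    return {
        # Assumption: the first field is seconds since the Unix epoch.
        "when": datetime.fromtimestamp(int(m["sec"]), tz=timezone.utc),
        # The unit of the second field is not stated here, so keep it raw.
        "frac": int(m["frac"]),
        "thread": m["thread"],
        "func": m["func"],
        "err": err.group(1) if err else None,
        "msg": msg,
    }


if __name__ == "__main__":
    sample = (
        "[1108572233:000155778][40BFF970] -> "
        "__osm_sm_state_mgr_signal_error: ERR \n"
        "3207: Invalid signal OSM_SM_SIGNAL_DISCOVER in state \n"
        "IB_SMINFO_STATE_DISCOVERING."
    )
    rec = parse_line(sample)
    print(rec["when"].isoformat(), rec["func"], rec["err"])
```

Read as epoch seconds, the timestamp lands a minute or two before this
message was sent, which fits the assumption.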

ron


