[openib-general] [ANNOUCEv2] OpenIB OpenSM 1.1.0: trunk now supports 1.8.0 features

Hal Rosenstock halr at voltaire.com
Wed Sep 14 14:36:12 PDT 2005


Hi Brett,

On Wed, 2005-09-14 at 15:13, Brett Bode wrote:
>      I have found out a bit more information. I think you are correct 
> that the switch was getting messed up. I had tried resetting the switch 
> with the old opensm code we had been running and found that fixed 
> things up until the bad node was plugged in. We had not reset the 
> switch since upgrading the opensm code. Upon doing that all seems to 
> work again. Opensm throws some error below due to the bad node, but it 
> appears to continue to correctly configure the remaining network.

And the switch continues to work? (That's with the new (1.1.0) OpenSM,
right?)

>  So I 
> am currently thinking the latest opensm more or less correctly deals 
> with the failed node. I also suspect the older opensm not only handled 
> the error badly but somehow caused the switch to get into a confused 
> state that the new opensm couldn't fix without a reset.
> 
> Here are the repeated errors thrown:

Right, that looks similar to yesterday's log except that the DR
(directed route) is a little different. Did the misbehaving HCA node
perhaps get plugged into a different switch port?
> 
> ______________________________________________________________________
> Here is the output of the other commands you suggested with everything 
> working:

I'm not sure which HCA port the SM ran on but...

The multicast tree appears to be set up only on the one switch. Were the
other nodes off the other switch not involved?

Also, port 8 off the switch does not appear in the multicast tree,
although I see it in the topology file. I'm not sure why that would be.

-- Hal



