[openib-general] [ANNOUCEv2] OpenIB OpenSM 1.1.0: trunk now supports 1.8.0 features
Brett Bode
brett at scl.ameslab.gov
Wed Sep 14 15:03:05 PDT 2005
On Sep 14, 2005, at 4:36 PM, Hal Rosenstock wrote:
> Hi Brett,
>
> On Wed, 2005-09-14 at 15:13, Brett Bode wrote:
>> I have found out a bit more information. I think you are correct
>> that the switch was getting messed up. I had tried resetting the
>> switch with the old opensm code we had been running and found that
>> fixed things up until the bad node was plugged in. We had not reset
>> the switch since upgrading the opensm code. Upon doing that all seems
>> to work again. Opensm throws some errors below due to the bad node,
>> but it appears to continue to correctly configure the remaining
>> network.
>
> and the switch continues to work ? (That's with the new (1.1.0) OpenSM,
> right ?)
Yes
>
>> So I
>> am currently thinking the latest opensm more or less correctly deals
>> with the failed node. I also suspect the older opensm not only handled
>> the error badly but somehow caused the switch to get into a confused
>> state that the new opensm couldn't fix without a reset.
>>
>> Here are the repeated errors thrown:
>
> Right, that looks similar to yesterday's log except that the DR is a
> little different. Did the misbehaving HCA node get plugged into a
> different switch port perhaps ?
That is possible.
>>
>> ______________________________________________________________________
>> Here is the output of the other commands you suggested with everything
>> working:
>
> I'm not sure which HCA port the SM ran on but...
>
> The multicast tree appears only set up on the one switch. Were the
> other nodes off the other switch not involved ?
>
> Also, port 8 off the switch appears not in the multicast tree although
> I see it in the topology file. Not sure why that would be.
>
I think we only have the IPoIB modules loaded on the systems on the one
switch. The system connected to port 8 also does not have the IP module
loaded. Originally we did not have the two switches linked together,
but we had a system on the second switch that had more up-to-date
software, so we loaded the new opensm onto it and connected the switches
together. We are just getting the stuff on the second switch installed
and are still waiting on some parts as well...
Thanks,
Brett