[openib-general] [ANNOUCEv2] OpenIB OpenSM 1.1.0: trunk now supports 1.8.0 features

Brett Bode brett at scl.ameslab.gov
Wed Sep 14 12:13:32 PDT 2005


Hal,
     I have found out a bit more. I think you are correct that the
switch was getting messed up. Earlier, while still running the old
opensm code, I had reset the switch and found that fixed things until
the bad node was plugged back in. We had not reset the switch since
upgrading opensm. After resetting it now, everything seems to work
again. Opensm still logs some errors (below) because of the bad node,
but it appears to go on and correctly configure the rest of the
network. So my current thinking is that the latest opensm deals with
the failed node more or less correctly. I also suspect the older
opensm not only handled the error badly but somehow left the switch in
a confused state that the new opensm couldn't fix without a reset.

Here are the repeated errors it throws:
-------------- next part --------------
A non-text attachment was scrubbed...
Name: osm-error.log
Type: application/octet-stream
Size: 24279 bytes
Desc: not available
URL: <http://lists.openfabrics.org/pipermail/general/attachments/20050914/eb201f6d/attachment.obj>
-------------- next part --------------

Here is the output of the other commands you suggested with everything 
working:
-------------- next part --------------
A non-text attachment was scrubbed...
Name: ib-works
Type: application/octet-stream
Size: 7629 bytes
Desc: not available
URL: <http://lists.openfabrics.org/pipermail/general/attachments/20050914/eb201f6d/attachment-0001.obj>
-------------- next part --------------


Brett
On Sep 14, 2005, at 10:14 AM, Hal Rosenstock wrote:
>
> What OpenIB svn version are you running ?
>
>> What is the procedure for determining if the multicast setup on the
>> switch is trashed?
>
> When the failure occurs:
>
> Please run ibnetdiscover and send the output.
> Also run ibchecknet to see what this shows
>
> ibroute - display unicast and multicast forwarding tables of switches
>
> So determine the LIDs of the switches (ibswitches can help with this)
>
> So it's something like:
> ibnetdiscover top1
> ibswitches top1
> Switch  : 0x005442ba00003080 ports 24 "MT47396 Infiniscale-III Mellanox Technologies" port 0 lid 2
> Switch  : 0x0008f10400410015 ports 8 "SW-6IB4 Voltaire" port 0 lid 5
>
> ibroute -M 2
> Multicast mlids [0xc000-0xc3ff] of switch Lid 0x2 guid 0x005442ba00003080 (MT47396 Infiniscale-III Mellanox Technologies):
>             0                   1                   2
>      Ports: 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4
>  MLid
> 0xc000                              x
> 0xc001                              x
> 0xc002                              x
> 0xc003                              x
> 4 valid mlids dumped
>
> ibroute -M 5
> Multicast mlids [0xc000-0xc3ff] of switch Lid 0x5 guid 0x0008f10400410015 (SW-6IB4 Voltaire):
>      Ports: 0 1 2 3 4 5 6 7 8
>  MLid
> 0xc000        x   x     x
> 0xc001        x   x     x
> 0xc003        x   x     x
> 0xc004            x     x
> 0xc005            x
> 0xc006                  x
> 6 valid mlids dumped
>
> The LIDs to use are configuration dependent and depend on what the 
> OpenSM hands out.
>
> There is also ibtracert
> ibtracert - display unicast or multicast route from source to destination
>
>> I suspect that if it is, the crashed node is causing it as I had power
>> cycled the switch yesterday which seemed to get things working up until
>> I plugged the crashed node in again.
>
> But without recycling the switch things don't work, right ? With just
> unplugging this node, it doesn't work ? It sounds like the switch has
> some issue. Can you tell if it forwards any packets ?
>
> -- Hal
>
>
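
For comparing the multicast tables of several switches, the "ibroute -M"
dumps quoted above can be read by a small script. What follows is only a
rough sketch, not part of the diagnostics tools: the function names
parse_ibroute_mcast and mlid_differences are made up for illustration,
and the parser assumes the column layout seen in the quoted output (a
"Ports:" header with port digits two columns apart, and an "x" under each
member port). A differing set of MLIDs between two switches is not
necessarily an error, so treat the result as a hint for where to look.

```python
import re


def parse_ibroute_mcast(text):
    """Parse one `ibroute -M <lid>` dump into {mlid: [port, ...]}.

    Assumption: the layout matches the output quoted in this thread.
    Leading "> " quote markers are stripped before parsing.
    """
    table = {}
    first_col = None  # column of port 0, taken from the "Ports:" header
    for raw in text.splitlines():
        line = re.sub(r"^(> ?)+", "", raw)
        if "Ports:" in line:
            first_col = line.index("Ports:") + len("Ports: ")
            continue
        m = re.match(r"(0x[0-9a-fA-F]+)\b", line)
        # Rows before the header (e.g. a wrapped guid line) are ignored.
        if m and first_col is not None:
            ports = [(i - first_col) // 2
                     for i, ch in enumerate(line)
                     if ch == "x" and i >= first_col]
            table[int(m.group(1), 16)] = ports
    return table


def mlid_differences(tables):
    """For each switch, list MLIDs seen on another switch but not on it."""
    all_mlids = set()
    for t in tables.values():
        all_mlids.update(t)
    return {name: sorted(all_mlids - set(t)) for name, t in tables.items()}


# The two dumps from the quoted message, with the quote markers removed.
SW2_DUMP = """\
Multicast mlids [0xc000-0xc3ff] of switch Lid 0x2 guid
0x005442ba00003080 (MT47396 Infiniscale-III Mellanox Technologies):
            0                   1                   2
     Ports: 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4
 MLid
0xc000                              x
0xc001                              x
0xc002                              x
0xc003                              x
4 valid mlids dumped
"""

SW5_DUMP = """\
Multicast mlids [0xc000-0xc3ff] of switch Lid 0x5 guid
0x0008f10400410015 (SW-6IB4 Voltaire):
     Ports: 0 1 2 3 4 5 6 7 8
 MLid
0xc000        x   x     x
0xc001        x   x     x
0xc003        x   x     x
0xc004            x     x
0xc005            x
0xc006                  x
6 valid mlids dumped
"""

tables = {
    "lid 2": parse_ibroute_mcast(SW2_DUMP),
    "lid 5": parse_ibroute_mcast(SW5_DUMP),
}
for name, missing in mlid_differences(tables).items():
    print(name, "lacks", [hex(m) for m in missing])
```

On the two example dumps this points out that 0xc002 appears only on
the LID 2 switch and 0xc004 through 0xc006 only on the LID 5 switch.
The port numbers depend on the whitespace surviving intact, so check
them against the raw ibroute output before relying on them.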


