[openib-general] Re: Some More Operational Issues with OpenSM 1.1.0

Eitan Zahavi eitan at mellanox.co.il
Tue Sep 13 07:30:10 PDT 2005


Hal Rosenstock wrote:
> Hi,
> 
> Here are some additional operational issues with OpenSM 1.1.0:
> 
> 1. The following warning now appears when OpenSM is started up:
> opensm: /usr/local/lib/libopensm.so.1: no version information available (required by opensm)
> 
> 2. Not sure what the LID manager doesn't like about the old settings
> (from OpenSM 1.1.0).
> 
> Sep 13 09:34:59 330140 [B7F144A0] -> __osm_lid_mgr_validate_db: [
> Sep 13 09:34:59 330260 [B7F144A0] -> __osm_lid_mgr_validate_db: ERR 0312: Ilegal LID range [0x4:0x0] for guid:0x0008f10403961355.
> Sep 13 09:34:59 330289 [B7F144A0] -> osm_db_delete: [
> Sep 13 09:34:59 330313 [B7F144A0] -> osm_db_delete: ]
> Sep 13 09:34:59 330337 [B7F144A0] -> __osm_lid_mgr_validate_db: ERR 0312: Ilegal LID range [0x3:0x0] for guid:0x0008f10403960559.
> Sep 13 09:34:59 330360 [B7F144A0] -> osm_db_delete: [
> Sep 13 09:34:59 330379 [B7F144A0] -> osm_db_delete: ]
> Sep 13 09:34:59 330402 [B7F144A0] -> __osm_lid_mgr_validate_db: ERR 0312: Ilegal LID range [0x5:0x0] for guid:0x005442ba00003080.
> Sep 13 09:34:59 330424 [B7F144A0] -> osm_db_delete: [
> Sep 13 09:34:59 330443 [B7F144A0] -> osm_db_delete: ]
> Sep 13 09:34:59 330466 [B7F144A0] -> __osm_lid_mgr_validate_db: ERR 0312: Ilegal LID range [0x7:0x0] for guid:0x0008f1040396055a.
> Sep 13 09:34:59 330535 [B7F144A0] -> osm_db_delete: [
> Sep 13 09:34:59 330556 [B7F144A0] -> osm_db_delete: ]
The cache file should have lines of the form <GUID> <min LID> <max LID>, for example:
0x0008f1040396055a 0x7 0x7

I wonder whether that is what your file actually contains. From the complaint, the line looks like either:
0x0008f1040396055a 0x7 0x0
or
0x0008f1040396055a 0x7

Can you tell us what it really contains?

There might also be a bug in parsing that file. If so, it is a new bug
introduced by the merges; I tested this feature very thoroughly on 1.8.0.
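To show why both suspected lines produce the same complaint, here is a rough C sketch of a line parser plus range check. It is not the OpenSM code: the function name parse_lid_cache_line, the all-hex three-field layout, and the exact checks (nonzero min LID, max >= min, max within the unicast range ending at 0xBFFF) are assumptions for illustration only.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical sketch, not OpenSM's parser. Assumes each cache line is
 * "<GUID> <min LID> <max LID>", all written in hex (0x prefix optional). */

#define LID_UCAST_END 0xBFFFu /* highest unicast LID */

/* Returns 0 if the line is well formed and the LID range is legal,
 * -1 otherwise. min_lid and max_lid start at 0, so a missing third
 * field reads back as [min:0x0], just like the ERR 0312 messages. */
static int parse_lid_cache_line(const char *line, uint64_t *guid,
                                unsigned *min_lid, unsigned *max_lid)
{
    unsigned long long g = 0;

    *min_lid = 0;
    *max_lid = 0;
    int n = sscanf(line, "%llx %x %x", &g, min_lid, max_lid);
    *guid = (uint64_t)g;

    if (n != 3)
        return -1; /* e.g. "0x0008f1040396055a 0x7" (max LID missing) */
    if (*min_lid == 0 || *max_lid < *min_lid || *max_lid > LID_UCAST_END)
        return -1; /* e.g. "0x0008f1040396055a 0x7 0x0" (max below min) */
    return 0;
}
```

Both "0x0008f1040396055a 0x7 0x0" and "0x0008f1040396055a 0x7" come back rejected with min=0x7, max=0x0, so the log alone cannot tell a truncated line from a bad value; only the file contents can.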


> 
> 
> 3. LinearFDBTop is being detected as corrupted. This is bad.
> Sep 13 09:34:59 732496 [B7713C40] -> osm_si_rcv_process: [
> Sep 13 09:34:59 732514 [B7713C40] -> osm_si_rcv_process: Switch GUID =
> 0x0008f10400410015, TID = 0x1273.
> Sep 13 09:34:59 732535 [B7713C40] -> osm_si_rcv_process: ERR 3610:
>                                 Bad LinearFDBTop value = 0xC000 on
> switch 0x8f10400410015.
>                                 Forcing correction to 0x0.
This is an old message, caused by the way the Anafa firmware reports
LinearFDBTop after a reboot. The SM forces the value to 0x0, which clears
the issue until the switch next reboots. We should downgrade this message to a warning.
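A minimal sketch of the proposed behaviour, assuming the sanity check is simply "LinearFDBTop must not exceed the last unicast LID, 0xBFFF" (so 0xC000, the first multicast LID, fails it). The function name sanitize_lin_fdb_top and the message text are hypothetical; a real check might also compare against the switch's LinearFDBCap.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical sketch, not OpenSM's code: keep a sane LinearFDBTop,
 * otherwise force it to 0x0 and log at warning rather than error level. */

#define LID_UCAST_END 0xBFFFu /* highest unicast LID */

static uint16_t sanitize_lin_fdb_top(uint64_t switch_guid, uint16_t lin_top)
{
    if (lin_top <= LID_UCAST_END)
        return lin_top; /* plausible value: leave it alone */

    /* Known post-reboot firmware quirk: warn and correct. */
    fprintf(stderr,
            "WRN: Bad LinearFDBTop value = 0x%X on switch 0x%016llx, "
            "forcing correction to 0x0\n",
            (unsigned)lin_top, (unsigned long long)switch_guid);
    return 0;
}
```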

> 
> 4. SM Set PortInfo being rejected with status 7. Not sure why that would
> be. Also, in this case (and probably others which are similar), OpenSM
> continues as if things succeeded. Is that right ?
Yes, it continues, but it should report "Errors in Initialization" and retry.
We should be able to reproduce this here, and we will.
The key is to understand which field in the PortInfo caused the "illegal value" error.
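For reference, a small C sketch of how the low bits of the 16-bit MAD Status field decode (bit 0 busy, bit 1 redirect required, bits 2..4 the invalid-field code). It assumes that "status 7" means the 3-bit invalid-field code 7 (raw Status word 0x001C), which is the "invalid value in the attribute or attribute modifier" case; a raw word of 0x0007 would instead decode as busy plus redirect with code 1. The helper names are hypothetical.

```c
#include <stdint.h>

/* Sketch: decode the low bits of the MAD Status field.
 * Assumes 'status' is already in host byte order. */
#define MAD_STATUS_BUSY        0x0001u /* bit 0 */
#define MAD_STATUS_REDIRECT    0x0002u /* bit 1 */
#define MAD_STATUS_FIELD_SHIFT 2       /* bits 2..4: invalid-field code */
#define MAD_STATUS_FIELD_MASK  0x7u

static unsigned mad_status_invalid_field(uint16_t status)
{
    return (status >> MAD_STATUS_FIELD_SHIFT) & MAD_STATUS_FIELD_MASK;
}

static const char *mad_invalid_field_str(unsigned code)
{
    switch (code) {
    case 0: return "no invalid fields";
    case 1: return "bad version or class not supported";
    case 2: return "method not supported";
    case 3: return "method/attribute combination not supported";
    case 7: return "invalid value in attribute or attribute modifier";
    default: return "reserved"; /* codes 4..6 */
    }
}
```

If code 7 is confirmed, the next step is to compare each PortInfo field the SM sets against what the port advertises as supported.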


