[openib-general] Re: Some More Operational Issues with OpenSM 1.1.0

Hal Rosenstock halr at voltaire.com
Tue Sep 13 08:30:46 PDT 2005


Hi Eitan,

On Tue, 2005-09-13 at 10:30, Eitan Zahavi wrote:
> Hal Rosenstock wrote:
> > 2. Not sure what the LID manager doesn't like about the old settings
> > (from OpenSM 1.1.0).
> > 
> > Sep 13 09:34:59 330140 [B7F144A0] -> __osm_lid_mgr_validate_db: [
> > Sep 13 09:34:59 330260 [B7F144A0] -> __osm_lid_mgr_validate_db: ERR
> > 0312: Ilegal LID range [0x4:0x0] for guid:0x0008f10403961355.
> > Sep 13 09:34:59 330289 [B7F144A0] -> osm_db_delete: [
> > Sep 13 09:34:59 330313 [B7F144A0] -> osm_db_delete: ]
> > Sep 13 09:34:59 330337 [B7F144A0] -> __osm_lid_mgr_validate_db: ERR
> > 0312: Ilegal LID range [0x3:0x0] for guid:0x0008f10403960559.
> > Sep 13 09:34:59 330360 [B7F144A0] -> osm_db_delete: [
> > Sep 13 09:34:59 330379 [B7F144A0] -> osm_db_delete: ]
> > Sep 13 09:34:59 330402 [B7F144A0] -> __osm_lid_mgr_validate_db: ERR
> > 0312: Ilegal LID range [0x5:0x0] for guid:0x005442ba00003080.
> > Sep 13 09:34:59 330424 [B7F144A0] -> osm_db_delete: [
> > Sep 13 09:34:59 330443 [B7F144A0] -> osm_db_delete: ]
> > Sep 13 09:34:59 330466 [B7F144A0] -> __osm_lid_mgr_validate_db: ERR
> > 0312: Ilegal LID range [0x7:0x0] for guid:0x0008f1040396055a.
> > Sep 13 09:34:59 330535 [B7F144A0] -> osm_db_delete: [
> > Sep 13 09:34:59 330556 [B7F144A0] -> osm_db_delete: ]
> The cache file should have the format:
> 0x0008f1040396055a 0x7 0x7
> 
> I wonder whether that is what it actually looks like. From the complaint it
> looks like the line is:
> 0x0008f1040396055a 0x7 0x0
> or
> 0x0008f1040396055a 0x7
> 
> Can you tell us what it really is?

/var/cache/osm/guid2lid:
0x0008f10403960985 0x0007 0x0007
0x0008f10400410015 0x0003 0x0003
0x005442ba00003080 0x0005 0x0005
0x0008f1040396055a 0x0006 0x0006
0x005442b100004901 0x0002 0x0002
0x0008f10403961355 0x0004 0x0004
0x0008f10403960559 0x0001 0x0001
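To make the expected cache format concrete, here is a minimal C sketch of a guid2lid line parser and range check. It assumes each line is "<guid> <min_lid> <max_lid>" in hex, as Eitan describes, and that a legal range satisfies 0x0001 <= min <= max <= 0xBFFF (the unicast LID space). The names guid2lid_entry, parse_guid2lid_line() and lid_range_is_legal() are hypothetical; this is not the code behind __osm_lid_mgr_validate_db().

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical record for one guid2lid line (not an OpenSM type). */
typedef struct {
    uint64_t guid;
    unsigned min_lid;
    unsigned max_lid;
} guid2lid_entry;

/* Parse a line of the assumed form "<guid> <min_lid> <max_lid>".
 * %llx and %x accept an optional 0x prefix, as strtoul() does with base 16.
 * Returns 1 if all three fields were read, 0 otherwise (e is left untouched). */
int parse_guid2lid_line(const char *line, guid2lid_entry *e)
{
    unsigned long long guid;
    unsigned min_lid, max_lid;

    if (sscanf(line, "%llx %x %x", &guid, &min_lid, &max_lid) != 3)
        return 0;
    e->guid = (uint64_t)guid;
    e->min_lid = min_lid;
    e->max_lid = max_lid;
    return 1;
}

/* Assumed legality rule: a non-empty range inside the unicast LID space. */
int lid_range_is_legal(unsigned min_lid, unsigned max_lid)
{
    return min_lid >= 0x0001 && min_lid <= max_lid && max_lid <= 0xBFFF;
}
```

Under these assumptions every entry in the file above parses and passes the check. The two-field form "0x0008f1040396055a 0x7" fails to parse at all, while "0x0008f1040396055a 0x7 0x0" parses but fails the range check, which is the shape of the ERR 0312 complaints in the log.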

> There might also be a bug in parsing that file. If so, it is a new bug
> introduced by the merges; I tested this feature very thoroughly on 1.8.0.
> 
> > 3. LinearFDBTop is being detected as corrupted. This is bad.
> > Sep 13 09:34:59 732496 [B7713C40] -> osm_si_rcv_process: [
> > Sep 13 09:34:59 732514 [B7713C40] -> osm_si_rcv_process: Switch GUID =
> > 0x0008f10400410015, TID = 0x1273.
> > Sep 13 09:34:59 732535 [B7713C40] -> osm_si_rcv_process: ERR 3610:
> >                                 Bad LinearFDBTop value = 0xC000 on
> > switch 0x8f10400410015.
> >                                 Forcing correction to 0x0.
> This is an old message, caused by the way the Anafa firmware reports
> LinearFDBTop after a reboot. The SM forces the value to 0x0, which clears
> the issue until the switch next reboots. We should downgrade it to a warning.

Is this still an outstanding Anafa firmware bug?

> > 4. SM Set PortInfo is being rejected with status 7. Not sure why that
> > would be. Also, in this case (and probably in similar ones), OpenSM
> > continues as if things succeeded. Is that right?
> Yes, it continues, but it should report "Errors in Initialization" and retry.

I don't see this. It might depend on when it occurs.

> We should be able to reproduce it here, and will.

Good. Thanks.

> The key is to understand what in the PortInfo caused the "illegal value" error.

OK.
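As a starting point for finding what caused the "illegal value" error, here is a small C sketch that decodes the 16-bit MAD common Status field using the InfiniBand layout (bit 0 busy, bit 1 redirect required, bits 2-4 invalid-field code, where code 7 means a field in the attribute or attribute modifier holds an invalid value). It assumes the "status 7" in the report is that 3-bit code rather than the raw status word; the helper names are hypothetical and not part of OpenSM.

```c
#include <stdint.h>

/* Hypothetical helpers over the MAD common Status field.
 * `status` must already be in host byte order (e.g. after ntohs()). */
int mad_status_is_busy(uint16_t status)
{
    return status & 0x1;              /* bit 0 */
}

int mad_status_redirect_required(uint16_t status)
{
    return (status >> 1) & 0x1;       /* bit 1 */
}

unsigned mad_invalid_field_code(uint16_t status)
{
    return (status >> 2) & 0x7;       /* bits 2-4 */
}

const char *mad_invalid_field_str(unsigned code)
{
    switch (code) {
    case 0: return "no invalid fields";
    case 1: return "bad version or class version";
    case 2: return "method not supported";
    case 3: return "method/attribute combination not supported";
    case 7: return "one or more fields in the attribute or attribute modifier contain an invalid value";
    default: return "reserved";
    }
}
```

Under this reading, a raw status word of 0x001C decodes to code 7. If "status 7" were instead the raw word 0x0007, it would decode as busy plus redirect plus bad version, which seems unlikely for a PortInfo Set; that is why the sketch assumes the code-7 interpretation.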

-- Hal
