[openib-general] Re: Some OpenSM 1.8.0 Anomalies

Hal Rosenstock halr at voltaire.com
Thu Sep 8 05:21:00 PDT 2005


On Thu, 2005-09-08 at 02:13, Eitan Zahavi wrote:
> Hal Rosenstock wrote:
> > On Wed, 2005-09-07 at 13:00, Eitan Zahavi wrote:
> > 
> >>>Also, I see the following messages although these should not be the case:
> >>>Sep 06 15:41:48 725691 [B76A4C40] -> __match_notice_to_inf_rec: ERR 0207: Cannot find destination port with LID:0x0007
> >>
> >>This means that the LID of the port registered as the source for this inform info is not recognized as a valid LID.
> >>
> >>>...
> >>>Sep 06 15:41:48 726186 [B76A4C40] -> __match_notice_to_inf_rec: ERR 0207: Cannot find source port with GUID:0x0008f10403960559
> >>
> >>This means that the incoming trap source is not a recognized GUID (that is, it is not in the SM database).
> > 
> > 
> > Both this LID and GUID were discovered so I don't understand how this is
> > the case. I can send the log separately if interested (it is over 1M
> > gzipped).
> I guess this trap report and inform info flow are new.
> Which ULP is using it?

Solaris 10.

> Can you describe the flow?

It looks like it occurs on SM port down which seems OK. Here's an
extract of that portion of the log:

Sep 06 15:41:48 724961 [B76A4C40] -> __osm_state_mgr_is_sm_port_down: ]
Sep 06 15:41:48 724980 [0000] -> SM port is down.
Sep 06 15:41:48 724980 [B76A4C40] -> SM port is down.
Sep 06 15:41:48 725261 [B76A4C40] -> __osm_state_mgr_sm_port_down_msg:


******************************************************************
************************** SM PORT DOWN **************************
******************************************************************


Sep 06 15:41:48 725283 [B76A4C40] -> osm_drop_mgr_process: [
Sep 06 15:41:48 725303 [B76A4C40] -> osm_drop_mgr_process: Checking node 0x0008f1040396040c.
Sep 06 15:41:48 725324 [B76A4C40] -> __osm_drop_mgr_process_node: [
Sep 06 15:41:48 725342 [B76A4C40] -> __osm_drop_mgr_process_node: Unreachable node 0x0008f1040396040c.
Sep 06 15:41:48 725364 [B76A4C40] -> __osm_drop_mgr_remove_port: [
Sep 06 15:41:48 725383 [B76A4C40] -> __osm_drop_mgr_remove_port: Unreachable port 0x0008f1040396040e.
Sep 06 15:41:48 725417 [B76A4C40] -> __osm_drop_mgr_remove_port: Clearing abandoned LID range [0x7,0x7].
Sep 06 15:41:48 725480 [B76A4C40] -> __osm_drop_mgr_remove_port: Unlinking local node 0x0008f1040396040c, port 0x2
                                and remote node 0x0008f10403960558, port 0x1.
Sep 06 15:41:48 725504 [B76A4C40] -> __osm_drop_mgr_remove_port: resetting discovery count of node: 0x0008f10403960558 port num:1.
Sep 06 15:41:48 725525 [B76A4C40] -> __osm_drop_mgr_remove_port: Clearing physical port number 2.
Sep 06 15:41:48 725563 [B76A4C40] -> osm_report_notice: [
Sep 06 15:41:48 725583 [B76A4C40] -> osm_report_notice: Reporting Generic Notice type:3 num:65 from LID:0x0003 GID:0xfe80000000000000,0x0008f10403960559
Sep 06 15:41:48 725612 [B76A4C40] -> __match_notice_to_inf_rec: [
Sep 06 15:41:48 725632 [B76A4C40] -> __match_notice_to_inf_rec: Mismatch by Node Type: II=0x000003 Trap=0x000004
Sep 06 15:41:48 725653 [B76A4C40] -> __match_notice_to_inf_rec: ]
Sep 06 15:41:48 725671 [B76A4C40] -> __match_notice_to_inf_rec: [
Sep 06 15:41:48 725691 [B76A4C40] -> __match_notice_to_inf_rec: ERR 0207: Cannot find destination port with LID:0x0007
Sep 06 15:41:48 725710 [B76A4C40] -> __match_notice_to_inf_rec: ]
Sep 06 15:41:48 725728 [B76A4C40] -> __match_notice_to_inf_rec: [
Sep 06 15:41:48 725747 [B76A4C40] -> __match_notice_to_inf_rec: Mismatch by Node Type: II=0x000001 Trap=0x000004
Sep 06 15:41:48 725767 [B76A4C40] -> __match_notice_to_inf_rec: ]
Sep 06 15:41:48 725785 [B76A4C40] -> __match_notice_to_inf_rec: [
Sep 06 15:41:48 725804 [B76A4C40] -> __match_notice_to_inf_rec: Mismatch by Node Type: II=0x000002 Trap=0x000004
Sep 06 15:41:48 725823 [B76A4C40] -> __match_notice_to_inf_rec: ]
Sep 06 15:41:48 725843 [B76A4C40] -> osm_report_notice: ]
Sep 06 15:41:48 725862 [B76A4C40] -> Removed port with GUID:0x0008f1040396040e LID range [0x7,0x7] of node:Voltaire HCA400
Sep 06 15:41:48 725883 [B76A4C40] -> __osm_drop_mgr_remove_port: ]
Sep 06 15:41:48 725904 [B76A4C40] -> __osm_drop_mgr_process_node: ]
Sep 06 15:41:48 725923 [B76A4C40] -> osm_drop_mgr_process: Checking node 0x0008f10403960558.
Sep 06 15:41:48 725943 [B76A4C40] -> osm_drop_mgr_process: Checking full discovery of node 0x0008f10403960558.
Sep 06 15:41:48 725964 [B76A4C40] -> osm_drop_mgr_process: Checking port 0x0008f10403960559.
Sep 06 15:41:48 725984 [B76A4C40] -> __osm_drop_mgr_remove_port: [
Sep 06 15:41:48 726002 [B76A4C40] -> __osm_drop_mgr_remove_port: Unreachable port 0x0008f10403960559.
Sep 06 15:41:48 726023 [B76A4C40] -> __osm_drop_mgr_remove_port: Clearing abandoned LID range [0x3,0x3].
Sep 06 15:41:48 726043 [B76A4C40] -> __osm_drop_mgr_remove_port: Clearing physical port number 1.
Sep 06 15:41:48 726067 [B76A4C40] -> osm_report_notice: [
Sep 06 15:41:48 726086 [B76A4C40] -> osm_report_notice: Reporting Generic Notice type:3 num:65 from LID:0x0003 GID:0xfe80000000000000,0x0008f10403960559
Sep 06 15:41:48 726110 [B76A4C40] -> __match_notice_to_inf_rec: [
Sep 06 15:41:48 726129 [B76A4C40] -> __match_notice_to_inf_rec: Mismatch by Node Type: II=0x000003 Trap=0x000004
Sep 06 15:41:48 726149 [B76A4C40] -> __match_notice_to_inf_rec: ]
Sep 06 15:41:48 726167 [B76A4C40] -> __match_notice_to_inf_rec: [
Sep 06 15:41:48 726186 [B76A4C40] -> __match_notice_to_inf_rec: ERR 0207: Cannot find source port with GUID:0x0008f10403960559
Sep 06 15:41:48 726206 [B76A4C40] -> __match_notice_to_inf_rec: ]
Sep 06 15:41:48 726225 [B76A4C40] -> __match_notice_to_inf_rec: [
Sep 06 15:41:48 726243 [B76A4C40] -> __match_notice_to_inf_rec: Mismatch by Node Type: II=0x000001 Trap=0x000004
Sep 06 15:41:48 726263 [B76A4C40] -> __match_notice_to_inf_rec: ]
Sep 06 15:41:48 726281 [B76A4C40] -> __match_notice_to_inf_rec: [
Sep 06 15:41:48 726300 [B76A4C40] -> __match_notice_to_inf_rec: Mismatch by Node Type: II=0x000002 Trap=0x000004
Sep 06 15:41:48 726319 [B76A4C40] -> __match_notice_to_inf_rec: ]
Sep 06 15:41:48 726339 [B76A4C40] -> osm_report_notice: ]
Sep 06 15:41:48 726357 [B76A4C40] -> Removed port with GUID:0x0008f10403960559 LID range [0x3,0x3] of node:MT23108 InfiniHost Mellanox Technologies
Sep 06 15:41:48 726378 [B76A4C40] -> __osm_drop_mgr_remove_port: ]
Sep 06 15:41:48 726426 [B76A4C40] -> osm_drop_mgr_process: ]
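
One way to check whether each ERR 0207 simply follows the drop manager's removal of the same port is to correlate the lines mechanically. The following is a minimal sketch in Python, assuming the line layout seen in the extract above; the function name find_errors_after_removal and the regular expressions are illustrative only and are not part of OpenSM.

```python
import re

# Assumed layout, inferred from the extract above:
#   "Sep 06 15:41:48 725691 [B76A4C40] -> function: message"
LINE_RE = re.compile(
    r"^(?P<ts>\w{3} \d{2} \d{2}:\d{2}:\d{2} \d+) "
    r"\[(?P<tid>[0-9A-Fa-f]+)\] -> (?P<msg>.*)$"
)
HEX = r"(0x[0-9a-fA-F]+)"
REMOVED_PORT_RE = re.compile(r"Unreachable port " + HEX)
CLEARED_LID_RE = re.compile(
    r"Clearing abandoned LID range \[" + HEX + "," + HEX + r"\]"
)
ERR_LID_RE = re.compile(r"ERR 0207: Cannot find \w+ port with LID:" + HEX)
ERR_GUID_RE = re.compile(r"ERR 0207: Cannot find \w+ port with GUID:" + HEX)


def find_errors_after_removal(lines):
    """Return (timestamp, kind, value) for every ERR 0207 whose LID or
    GUID was already dropped by an earlier line in the same log."""
    cleared_lids = []      # (low, high) ranges cleared so far
    removed_guids = set()  # port GUIDs reported unreachable so far
    hits = []
    for line in lines:
        m = LINE_RE.match(line.strip())
        if not m:
            continue  # banner, blank, or continuation line
        ts, msg = m.group("ts"), m.group("msg")

        port = REMOVED_PORT_RE.search(msg)
        if port:
            removed_guids.add(int(port.group(1), 16))
        rng = CLEARED_LID_RE.search(msg)
        if rng:
            cleared_lids.append((int(rng.group(1), 16), int(rng.group(2), 16)))

        err_lid = ERR_LID_RE.search(msg)
        if err_lid:
            lid = int(err_lid.group(1), 16)
            if any(low <= lid <= high for low, high in cleared_lids):
                hits.append((ts, "LID", lid))
        err_guid = ERR_GUID_RE.search(msg)
        if err_guid:
            guid = int(err_guid.group(1), 16)
            if guid in removed_guids:
                hits.append((ts, "GUID", guid))
    return hits


# Four lines copied from the extract above.
SAMPLE = [
    "Sep 06 15:41:48 725417 [B76A4C40] -> __osm_drop_mgr_remove_port: "
    "Clearing abandoned LID range [0x7,0x7].",
    "Sep 06 15:41:48 725691 [B76A4C40] -> __match_notice_to_inf_rec: "
    "ERR 0207: Cannot find destination port with LID:0x0007",
    "Sep 06 15:41:48 726002 [B76A4C40] -> __osm_drop_mgr_remove_port: "
    "Unreachable port 0x0008f10403960559.",
    "Sep 06 15:41:48 726186 [B76A4C40] -> __match_notice_to_inf_rec: "
    "ERR 0207: Cannot find source port with GUID:0x0008f10403960559",
]

for ts, kind, value in find_errors_after_removal(SAMPLE):
    print(ts, kind, hex(value))
```

If both errors are reported as hits, that is consistent with the notice matching running after the ports were already removed; it does not by itself show whether that ordering is intended.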

> Then if you can send us the log file it will help.

I'll send you the whole log offline if you still want it.

-- Hal

> We should be able to at least track the inform info registration and
> the trap received from the log (actually this is mostly what we need).
> It is very possible this is a new untested flow.
> We did not validate all the possible filtering of
> Inform Info based reports. So there might be a simple bug in there
> (a missing NTOH byte-order conversion, maybe).
> 
> > 
> > -- Hal
> 
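
To illustrate the kind of byte-order bug suggested above: a 16-bit LID arrives in network (big-endian) order, and skipping the network-to-host conversion on a little-endian host yields a different number, so a lookup by LID would miss. This is only a sketch of the general failure mode in Python; the helper names and the known_lids set are hypothetical, and nothing here is taken from the OpenSM sources or shown to be the cause of the errors in this log.

```python
import struct


def lid_from_wire(raw: bytes) -> int:
    """Decode a 16-bit LID from network (big-endian) byte order."""
    (lid,) = struct.unpack("!H", raw)
    return lid


def lid_without_conversion(raw: bytes) -> int:
    """Read the same two bytes as little-endian, imitating a missed
    network-to-host conversion on a little-endian host."""
    (lid,) = struct.unpack("<H", raw)
    return lid


# Hypothetical table of LIDs the SM knows about (values from the log).
known_lids = {0x0003, 0x0007}

wire = struct.pack("!H", 0x0007)   # LID 7 as it would travel on the wire
good = lid_from_wire(wire)
bad = lid_without_conversion(wire)

print(hex(good), good in known_lids)
print(hex(bad), bad in known_lids)
```

With the conversion applied the value is found in the table; without it the bytes are read as 0x700, which is not in the table.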



