[openib-general] OpenSM (again)

Hal Rosenstock halr at voltaire.com
Tue Apr 12 10:00:17 PDT 2005


On Tue, 2005-04-12 at 12:46, Roland Fehrenbacher wrote: 
>     Hal> SM election occurs per high priority low GUID. So if you
>     Hal> don't care which SM is the master then you don't need to do
>     Hal> anything. If you want a specific order (and it is not in GUID
>     Hal> order) then you need to specify priority.
> 
> Ok. I tried this, specifying priority 0 on one server, and priority 15
> on another one. I assume priority 15 will be the master.
> If I first start the priority 0 opensm, and then the priority 15 one,
> things look normal: Log excerpts
> 
> priority 0 server
> 
> Apr 12 18:41:06 [4000] -> OpenSM Rev:openib-1.0.0
> Apr 12 18:41:06 [4000] -> osm_opensm_init: Forcing single threaded dispatcher.
> Apr 12 18:41:06 [4000] -> osm_report_notice: Reporting Generic Notice type:3 num:66 from LID:0x0000 GID:0xfe80000000000000,0x0000000000000000
> Apr 12 18:41:06 [4000] -> osm_report_notice: Reporting Generic Notice type:3 num:66 from LID:0x0000 GID:0xfe80000000000000,0x0000000000000000
> Apr 12 18:41:06 [4000] -> osm_vendor_bind: Binding to port 0x2c902004013c2.
> Apr 12 18:41:06 [4000] -> osm_vendor_bind: Binding to port 0x2c902004013c2.
> Apr 12 18:41:06 [18007] -> osm_mcmr_rcv_leave_mgrp: ERR 1B25:Received an Invalid Delete Request.
> Apr 12 18:41:06 [18007] -> osm_mcmr_rcv_leave_mgrp: ERR 1B25:Received an Invalid Delete Request.
> Apr 12 18:41:06 [18007] -> osm_mcmr_rcv_leave_mgrp: ERR 1B25:Received an Invalid Delete Request.
> Apr 12 18:41:06 [18007] -> osm_mcmr_rcv_leave_mgrp: ERR 1B25:Received an Invalid Delete Request.
> Apr 12 18:41:06 [18007] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x04 num:144 Producer:1 from LID:0x0001 TID:0x0000000000000011
> Apr 12 18:41:06 [18007] -> osm_report_notice: Reporting Generic Notice type:4 num:144 from LID:0x0001 GID:0xfe80000000000000,0x0002c902004013c2
> Apr 12 18:41:06 [18007] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x04 num:144 Producer:1 from LID:0x0002 TID:0x000000000000000d
> Apr 12 18:41:06 [18007] -> osm_report_notice: Reporting Generic Notice type:4 num:144 from LID:0x0002 GID:0xfe80000000000000,0x0002c9020040133a
> Apr 12 18:42:25 [18007] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x04 num:144 Producer:1 from LID:0x0002 TID:0x000000000000000e
> Apr 12 18:42:25 [18007] -> osm_report_notice: Reporting Generic Notice type:4 num:144 from LID:0x0002 GID:0xfe80000000000000,0x0002c9020040133a
> 
> priority 15 server
> 
> Apr 12 18:42:25 [4000] -> OpenSM Rev:openib-1.0.0
> Apr 12 18:42:25 [4000] -> osm_opensm_init: Forcing single threaded dispatcher.
> Apr 12 18:42:25 [4000] -> osm_report_notice: Reporting Generic Notice type:3 num:66 from LID:0x0000 GID:0xfe80000000000000,0x0000000000000000
> Apr 12 18:42:25 [4000] -> osm_report_notice: Reporting Generic Notice type:3 num:66 from LID:0x0000 GID:0xfe80000000000000,0x0000000000000000
> Apr 12 18:42:25 [4000] -> osm_vendor_bind: Binding to port 0x2c9020040133a.
> Apr 12 18:42:25 [4000] -> osm_vendor_bind: Binding to port 0x2c9020040133a.
> Apr 12 18:42:25 [18007] -> osm_mcmr_rcv_leave_mgrp: ERR 1B25:Received an Invalid Delete Request.
> Apr 12 18:42:25 [18007] -> osm_mcmr_rcv_leave_mgrp: ERR 1B25:Received an Invalid Delete Request.
> Apr 12 18:42:25 [18007] -> osm_mcmr_rcv_leave_mgrp: ERR 1B25:Received an Invalid Delete Request.
> Apr 12 18:42:25 [18007] -> osm_mcmr_rcv_leave_mgrp: ERR 1B25:Received an Invalid Delete Request.
> 
> When I kill the priority 15 server however, the priority 0 server runs
> amok with continuous log messages like:
> 
> Apr 12 18:44:28 [2400A] -> umad_receiver: send completed with error(method=1 attr=20) -- dropping.
> Apr 12 18:44:28 [2400A] -> umad_receiver: send completed with error(method=1 attr=20) -- dropping.

Attribute 0x20 is SMInfo and method 1 is Get, so this is just the
SubnGet(SMInfo) from the priority 0 server failing (no matching
SubnGetResp received). That is "normal" once you have killed the
priority 15 server.
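
For anyone decoding these umad_receiver lines by hand, here is a rough
Python sketch of the lookup described above. It assumes the log prints
both fields in hex (which is how attr=20 is read here as 0x20), and the
two tables are only a small subset of the IBA method and SMP attribute
IDs. The function name decode_umad_error is made up for illustration;
it is not part of OpenSM.

```python
import re

# Small, incomplete subset of IBA management methods and SMP attribute IDs.
METHODS = {0x01: "Get", 0x02: "Set", 0x81: "GetResp"}
ATTRIBUTES = {0x11: "NodeInfo", 0x15: "PortInfo", 0x20: "SMInfo"}

# Assumption: both numbers in the log line are hexadecimal without a 0x prefix.
LINE_RE = re.compile(r"method=([0-9a-fA-F]+) attr=([0-9a-fA-F]+)")

def decode_umad_error(line):
    """Return (method name, attribute name) for a umad_receiver error line,
    or None if the line does not carry method=/attr= fields."""
    m = LINE_RE.search(line)
    if m is None:
        return None
    method = int(m.group(1), 16)
    attr = int(m.group(2), 16)
    # Fall back to the raw hex value for IDs missing from the tables above.
    return (METHODS.get(method, hex(method)), ATTRIBUTES.get(attr, hex(attr)))

line = ("Apr 12 18:44:28 [2400A] -> umad_receiver: send completed with "
        "error(method=1 attr=20) -- dropping.")
print(decode_umad_error(line))  # ('Get', 'SMInfo')
```
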

Do the messages ever subside?

> I assume that the handover to the priority 0 opensm hasn't worked
> then.

This isn't really handover, but that is another matter.
You should be able to use the sminfo diag to see whether this SM has
assumed the MASTER role.
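
As a reminder of which SM should win when more than one is running, the
"high priority, low GUID" rule quoted at the top can be sketched in
Python as below. This is only an illustration of the ordering, not the
OpenSM code: it ignores SM state and everything else the real election
takes into account. The SM tuple, elect_master, and the server names
are hypothetical; the GUIDs are the two port GUIDs from the logs above.

```python
from typing import NamedTuple

class SM(NamedTuple):
    name: str       # hypothetical label, for printing only
    priority: int   # 0 (lowest) .. 15 (highest)
    guid: int       # port GUID the SM is bound to

def elect_master(sms):
    """Pick the SM with the highest priority; break ties with the lowest GUID."""
    return min(sms, key=lambda sm: (-sm.priority, sm.guid))

sms = [
    SM("prio0-server", 0, 0x2c902004013c2),
    SM("prio15-server", 15, 0x2c9020040133a),
]
print(elect_master(sms).name)      # prio15-server
print(elect_master(sms[:1]).name)  # prio0-server, once the other SM is gone
```

With only the priority 0 SM left, it is the sole candidate, so it is
the one expected to end up as MASTER.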

-- Hal




