[openib-general] OpenSM (again)

Eitan Zahavi eitan at mellanox.co.il
Tue Apr 12 22:50:25 PDT 2005


FYI: OpenSM implements master handover in a "lazy" or "less intrusive"
manner:

OpenSM will only hand off a subnet to the new master during a heavy sweep.
So if you start an SM and then start one with a higher priority, the handover
will not happen unless there was some change in the subnet (a trap or a switch
"change bit").

The main reason for this behavior is the "light sweep" concept, which reduces
discovery to checking "change bits" and, now, also "unresponsive ports". So
the new SM is not even discovered by the active SM.

The benefit is that as long as there is no change in the subnet, the active
SM does not transfer ownership to the new one, since such a transfer imposes
an overhead on the entire subnet (client re-registration or even LID changes).
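The lazy policy above can be modeled as a small decision function. This is a hypothetical Python sketch, not OpenSM code: the names SweepObservation, needs_heavy_sweep and hands_over are invented for illustration, and the three change indications (trap, switch change bit, unresponsive port) are assumed from the description above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SweepObservation:
    """Hypothetical summary of what a light sweep noticed (illustrative only)."""
    trap_received: bool = False
    switch_change_bit: bool = False
    unresponsive_port: bool = False


def needs_heavy_sweep(obs: SweepObservation) -> bool:
    """A light sweep only looks at change indications; any one of them escalates."""
    return obs.trap_received or obs.switch_change_bit or obs.unresponsive_port


def hands_over(obs: SweepObservation, higher_priority_sm_present: bool) -> bool:
    """Handover is only possible after a heavy sweep rediscovers the fabric,
    because that is the only time a newly started SM can be noticed."""
    return needs_heavy_sweep(obs) and higher_priority_sm_present


# Quiet subnet: the new higher-priority SM is never discovered, so no handover.
print(hands_over(SweepObservation(), higher_priority_sm_present=True))  # False
# A switch "change bit" forces a heavy sweep, which exposes the new SM.
print(hands_over(SweepObservation(switch_change_bit=True), True))  # True
```

The point of the model is that the presence of a better SM alone is not enough; some subnet change has to trigger the heavy sweep first.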

This behavior is compliant, as the spec says:
C14-60.2.1: If a Master SM finds another Master SM with lower priority (or
same priority and higher GUID) it shall ensure that it is the highest
priority (or same priority and lower GUID) on the subnet, and if so it shall
wait for the other Master (or Masters) to relinquish control of its portion
of the subnet.
C14-61.2.2: If a Master SM determines that a lower priority Master SM
has not performed a handover within a vendor-specific time period, then
it shall not change the state of the subnet.
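The ordering these rules rely on (higher priority wins, lower GUID breaks a tie) can be sketched as a sort key. This is a hypothetical Python illustration, not OpenSM code; SmCandidate, election_key and expected_master are invented names. The port GUIDs are taken from the osm_vendor_bind lines in the logs quoted below, assuming each server bound to the port it logged.

```python
from typing import NamedTuple


class SmCandidate(NamedTuple):
    """Hypothetical record for one SM port: its SMInfo priority and port GUID."""
    name: str
    priority: int  # 0..15, higher wins
    guid: int      # 64-bit port GUID, lower wins on a priority tie


def election_key(sm: SmCandidate) -> tuple[int, int]:
    # Smallest key = highest priority first, then lowest GUID.
    return (-sm.priority, sm.guid)


def expected_master(candidates: list[SmCandidate]) -> SmCandidate:
    """Return the SM that should end up as master under the priority/GUID rule."""
    return min(candidates, key=election_key)


server_p0 = SmCandidate("priority 0 server", 0, 0x2C902004013C2)
server_p15 = SmCandidate("priority 15 server", 15, 0x2C9020040133A)
print(expected_master([server_p0, server_p15]).name)  # priority 15 server

# With equal priorities, the lower port GUID wins.
tie = [server_p0._replace(priority=15), server_p15]
print(hex(expected_master(tie).guid))  # 0x2c9020040133a
```

This only says which SM should eventually be master; when the handover actually happens is governed by the lazy sweep behavior described above.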
 
Eitan Zahavi
Design Technology Director
Mellanox Technologies LTD
Tel:+972-4-9097208
Fax:+972-4-9593245
P.O. Box 586 Yokneam 20692 ISRAEL


> -----Original Message-----
> From: Hal Rosenstock [mailto:halr at voltaire.com]
> Sent: Tuesday, April 12, 2005 8:00 PM
> To: rf at q-leap.de
> Cc: openib-general at openib.org
> Subject: Re: [openib-general] OpenSM (again)
> 
> On Tue, 2005-04-12 at 12:46, Roland Fehrenbacher wrote:
> >     Hal> SM election occurs per high priority low GUID. So if you
> >     Hal> don't care which SM is the master then you don't need to do
> >     Hal> anything. If you want a specific order (and it is not in GUID
> >     Hal> order) then you need to specify priority.
> >
> > Ok. I tried this, specifying priority 0 on one server, and priority 15
> > on another one. I assume the priority 15 one will be the master.
> > If I first start the priority 0 opensm, and then the priority 15 one,
> > things look normal. Log excerpts:
> >
> > priority 0 server
> >
> > Apr 12 18:41:06 [4000] -> OpenSM Rev:openib-1.0.0
> > Apr 12 18:41:06 [4000] -> osm_opensm_init: Forcing single threaded dispatcher.
> > Apr 12 18:41:06 [4000] -> osm_report_notice: Reporting Generic Notice type:3 num:66 from LID:0x0000 GID:0xfe80000000000000,0x0000000000000000
> > Apr 12 18:41:06 [4000] -> osm_report_notice: Reporting Generic Notice type:3 num:66 from LID:0x0000 GID:0xfe80000000000000,0x0000000000000000
> > Apr 12 18:41:06 [4000] -> osm_vendor_bind: Binding to port 0x2c902004013c2.
> > Apr 12 18:41:06 [4000] -> osm_vendor_bind: Binding to port 0x2c902004013c2.
> > Apr 12 18:41:06 [18007] -> osm_mcmr_rcv_leave_mgrp: ERR 1B25:Received an Invalid Delete Request.
> > Apr 12 18:41:06 [18007] -> osm_mcmr_rcv_leave_mgrp: ERR 1B25:Received an Invalid Delete Request.
> > Apr 12 18:41:06 [18007] -> osm_mcmr_rcv_leave_mgrp: ERR 1B25:Received an Invalid Delete Request.
> > Apr 12 18:41:06 [18007] -> osm_mcmr_rcv_leave_mgrp: ERR 1B25:Received an Invalid Delete Request.
> > Apr 12 18:41:06 [18007] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x04 num:144 Producer:1 from LID:0x0001 TID:0x0000000000000011
> > Apr 12 18:41:06 [18007] -> osm_report_notice: Reporting Generic Notice type:4 num:144 from LID:0x0001 GID:0xfe80000000000000,0x0002c902004013c2
> > Apr 12 18:41:06 [18007] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x04 num:144 Producer:1 from LID:0x0002 TID:0x000000000000000d
> > Apr 12 18:41:06 [18007] -> osm_report_notice: Reporting Generic Notice type:4 num:144 from LID:0x0002 GID:0xfe80000000000000,0x0002c9020040133a
> > Apr 12 18:42:25 [18007] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x04 num:144 Producer:1 from LID:0x0002 TID:0x000000000000000e
> > Apr 12 18:42:25 [18007] -> osm_report_notice: Reporting Generic Notice type:4 num:144 from LID:0x0002 GID:0xfe80000000000000,0x0002c9020040133a
> >
> > priority 15 server
> >
> > Apr 12 18:42:25 [4000] -> OpenSM Rev:openib-1.0.0
> > Apr 12 18:42:25 [4000] -> osm_opensm_init: Forcing single threaded dispatcher.
> > Apr 12 18:42:25 [4000] -> osm_report_notice: Reporting Generic Notice type:3 num:66 from LID:0x0000 GID:0xfe80000000000000,0x0000000000000000
> > Apr 12 18:42:25 [4000] -> osm_report_notice: Reporting Generic Notice type:3 num:66 from LID:0x0000 GID:0xfe80000000000000,0x0000000000000000
> > Apr 12 18:42:25 [4000] -> osm_vendor_bind: Binding to port 0x2c9020040133a.
> > Apr 12 18:42:25 [4000] -> osm_vendor_bind: Binding to port 0x2c9020040133a.
> > Apr 12 18:42:25 [18007] -> osm_mcmr_rcv_leave_mgrp: ERR 1B25:Received an Invalid Delete Request.
> > Apr 12 18:42:25 [18007] -> osm_mcmr_rcv_leave_mgrp: ERR 1B25:Received an Invalid Delete Request.
> > Apr 12 18:42:25 [18007] -> osm_mcmr_rcv_leave_mgrp: ERR 1B25:Received an Invalid Delete Request.
> > Apr 12 18:42:25 [18007] -> osm_mcmr_rcv_leave_mgrp: ERR 1B25:Received an Invalid Delete Request.
> >
> > When I kill the priority 15 server, however, the priority 0 server runs
> > amok with continuous log messages like:
> >
> > Apr 12 18:44:28 [2400A] -> umad_receiver: send completed with error(method=1 attr=20) -- dropping.
> > Apr 12 18:44:28 [2400A] -> umad_receiver: send completed with error(method=1 attr=20) -- dropping.
> 
> Attribute 0x20 is SMInfo. This is just the SubnGet(SMInfo) from the
> priority 0 server failing (no matching SubnGetResp received) which is
> "normal" if you killed the priority 15 server.
> 
> Do the messages ever subside?
> 
> > I assume that the handover to the priority 0 opensm hasn't worked
> > then.
> 
> This isn't really handover but that is another matter.
> You should be able to use the sminfo diag to see whether this SM has
> assumed the MASTER role.
> 
> -- Hal
> 
> 
> _______________________________________________
> openib-general mailing list
> openib-general at openib.org
> http://openib.org/mailman/listinfo/openib-general
> 
> To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general