<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2//EN">
<HTML>
<HEAD>
<META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=US-ASCII">
<META NAME="Generator" CONTENT="MS Exchange Server version 5.5.2654.45">
<TITLE>RE: [openib-general] OpenSM (again)</TITLE>
</HEAD>
<BODY>
<P><FONT SIZE=2>FYI: OpenSM implements master handover in a "lazy" or "less intrusive" manner:</FONT>
</P>
<P><FONT SIZE=2>OpenSM hands off a subnet to a new master only during a heavy sweep. </FONT>
<BR><FONT SIZE=2>So if you start an SM and then start a second one with higher priority, the handoff does not happen until something changes in the subnet (a trap is received or a switch "change bit" is set).</FONT></P>
<P><FONT SIZE=2>The main reason for this behavior is the "light sweep", which reduces discovery to checking the switch "change bits" and, now, "unresponsive ports". As a result, the running master SM does not even discover the new SM. </FONT></P>
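<P><FONT SIZE=2>As a rough illustration of the light sweep idea, here is a small Python sketch. It is not the OpenSM code: the SubnetView class, its field names and the sweep() helper are all made up for the example. It only shows why a newly started SM stays invisible until one of the light-sweep checks fires.</FONT></P>
<PRE>
```python
from dataclasses import dataclass


@dataclass
class SubnetView:
    """What a light sweep looks at (hypothetical field names)."""
    trap_received: bool = False
    switch_change_bit_set: bool = False
    unresponsive_port_seen: bool = False


def needs_heavy_sweep(view):
    """Return True when any light-sweep check calls for full rediscovery."""
    return (view.trap_received
            or view.switch_change_bit_set
            or view.unresponsive_port_seen)


def sweep(view, other_sms_on_subnet):
    """Return the other SMs the master knows about after one sweep.

    A light sweep never queries for other SMs, so the result is empty
    unless a heavy sweep was triggered.
    """
    if needs_heavy_sweep(view):
        return list(other_sms_on_subnet)
    return []
```
</PRE>
<P><FONT SIZE=2>With a quiet subnet, sweep(SubnetView(), ["new-sm"]) returns an empty list; once a change bit is set, the same call with SubnetView(switch_change_bit_set=True) returns the new SM.</FONT></P>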
<P><FONT SIZE=2>The benefit is that, as long as nothing changes in the subnet, the active SM does not transfer ownership to the new one. Such a transfer imposes overhead on the entire subnet (client re-registration or even LID changes).</FONT>
</P>
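<P><FONT SIZE=2>The ordering used when SMs are compared (higher priority wins, and the lower GUID breaks a tie) can be sketched in Python as follows. The SM record and both function names are hypothetical and chosen for illustration only. The point is that a master yields only to a better SM that it has actually discovered.</FONT></P>
<PRE>
```python
from collections import namedtuple

# Hypothetical record for an SM: priority is 0..15, guid is the port GUID.
SM = namedtuple("SM", ["priority", "guid"])


def elect_master(sms):
    """Pick the highest priority SM; break ties with the lowest GUID."""
    return max(sms, key=lambda sm: (sm.priority, -sm.guid))


def should_hand_over(current_master, discovered_sms):
    """A master yields only to a better SM it has actually discovered."""
    winner = elect_master([current_master] + list(discovered_sms))
    return winner != current_master
```
</PRE>
<P><FONT SIZE=2>For the two servers in this thread, SM(0, 0x2c902004013c2) keeps the subnet while its discovered list is empty, and hands over once SM(15, 0x2c9020040133a) appears in that list.</FONT></P>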
<P><FONT SIZE=2>This behavior is compliant with the spec, which says:</FONT>
<BR><FONT SIZE=2>C14-60.2.1: If a Master SM finds another Master SM with lower priority (or</FONT>
<BR><FONT SIZE=2>same priority and higher GUID) it shall ensure that it is the highest priority</FONT>
<BR><FONT SIZE=2>(or same priority and lower GUID) on the subnet, and if so it shall wait for</FONT>
<BR><FONT SIZE=2>the other Master (or Masters) to relinquish control of its portion of the</FONT>
<BR><FONT SIZE=2>subnet.</FONT>
<BR><FONT SIZE=2>C14-61.2.2: If a Master SM determines that a lower priority Master SM</FONT>
<BR><FONT SIZE=2>has not performed a handover within a vendor-specific time period, then</FONT>
<BR><FONT SIZE=2>it shall not change the state of the subnet.</FONT>
<BR><FONT SIZE=2> </FONT>
<BR><FONT SIZE=2>Eitan Zahavi</FONT>
<BR><FONT SIZE=2>Design Technology Director</FONT>
<BR><FONT SIZE=2>Mellanox Technologies LTD</FONT>
<BR><FONT SIZE=2>Tel:+972-4-9097208</FONT>
<BR><FONT SIZE=2>Fax:+972-4-9593245</FONT>
<BR><FONT SIZE=2>P.O. Box 586 Yokneam 20692 ISRAEL</FONT>
</P>
<BR>
<P><FONT SIZE=2>> -----Original Message-----</FONT>
<BR><FONT SIZE=2>> From: Hal Rosenstock [<A HREF="mailto:halr@voltaire.com">mailto:halr@voltaire.com</A>]</FONT>
<BR><FONT SIZE=2>> Sent: Tuesday, April 12, 2005 8:00 PM</FONT>
<BR><FONT SIZE=2>> To: rf@q-leap.de</FONT>
<BR><FONT SIZE=2>> Cc: openib-general@openib.org</FONT>
<BR><FONT SIZE=2>> Subject: Re: [openib-general] OpenSM (again)</FONT>
<BR><FONT SIZE=2>> </FONT>
<BR><FONT SIZE=2>> On Tue, 2005-04-12 at 12:46, Roland Fehrenbacher wrote:</FONT>
<BR><FONT SIZE=2>> > Hal> SM election occurs per high priority low GUID. So if you</FONT>
<BR><FONT SIZE=2>> > Hal> don't care which SM is the master then you don't need to do</FONT>
<BR><FONT SIZE=2>> > Hal> anything. If you want a specific order (and it is not in GUID</FONT>
<BR><FONT SIZE=2>> > Hal> order) then you need to specify priority.</FONT>
<BR><FONT SIZE=2>> ></FONT>
<BR><FONT SIZE=2>> > Ok. I tried this, specifying priority 0 on one server, and priority 15</FONT>
<BR><FONT SIZE=2>> > on another one. I assume priority 15, will be the master.</FONT>
<BR><FONT SIZE=2>> > If I first start the priority 0 opensm, and then the priority 15 one,</FONT>
<BR><FONT SIZE=2>> > things look normal: Log excerpts</FONT>
<BR><FONT SIZE=2>> ></FONT>
<BR><FONT SIZE=2>> > priority 0 server</FONT>
<BR><FONT SIZE=2>> ></FONT>
<BR><FONT SIZE=2>> > Apr 12 18:41:06 [4000] -> OpenSM Rev:openib-1.0.0</FONT>
<BR><FONT SIZE=2>> > Apr 12 18:41:06 [4000] -> osm_opensm_init: Forcing single threaded dispatcher.</FONT>
<BR><FONT SIZE=2>> > Apr 12 18:41:06 [4000] -> osm_report_notice: Reporting Generic Notice type:3</FONT>
<BR><FONT SIZE=2>> num:66 from LID:0x0000 GID:0xfe80000000000000,0x0000000000000000</FONT>
<BR><FONT SIZE=2>> > Apr 12 18:41:06 [4000] -> osm_report_notice: Reporting Generic Notice type:3</FONT>
<BR><FONT SIZE=2>> num:66 from LID:0x0000 GID:0xfe80000000000000,0x0000000000000000</FONT>
<BR><FONT SIZE=2>> > Apr 12 18:41:06 [4000] -> osm_vendor_bind: Binding to port 0x2c902004013c2.</FONT>
<BR><FONT SIZE=2>> > Apr 12 18:41:06 [4000] -> osm_vendor_bind: Binding to port 0x2c902004013c2.</FONT>
<BR><FONT SIZE=2>> > Apr 12 18:41:06 [18007] -> osm_mcmr_rcv_leave_mgrp: ERR 1B25:Received an</FONT>
<BR><FONT SIZE=2>> Invalid Delete Request.</FONT>
<BR><FONT SIZE=2>> > Apr 12 18:41:06 [18007] -> osm_mcmr_rcv_leave_mgrp: ERR 1B25:Received an</FONT>
<BR><FONT SIZE=2>> Invalid Delete Request.</FONT>
<BR><FONT SIZE=2>> > Apr 12 18:41:06 [18007] -> osm_mcmr_rcv_leave_mgrp: ERR 1B25:Received an</FONT>
<BR><FONT SIZE=2>> Invalid Delete Request.</FONT>
<BR><FONT SIZE=2>> > Apr 12 18:41:06 [18007] -> osm_mcmr_rcv_leave_mgrp: ERR 1B25:Received an</FONT>
<BR><FONT SIZE=2>> Invalid Delete Request.</FONT>
<BR><FONT SIZE=2>> > Apr 12 18:41:06 [18007] -> __osm_trap_rcv_process_request: Received Generic</FONT>
<BR><FONT SIZE=2>> Notice type:0x04 num:144 Producer:1 from LID:0x0001 TID:0x0000000000000011</FONT>
<BR><FONT SIZE=2>> > Apr 12 18:41:06 [18007] -> osm_report_notice: Reporting Generic Notice type:4</FONT>
<BR><FONT SIZE=2>> num:144 from LID:0x0001 GID:0xfe80000000000000,0x0002c902004013c2</FONT>
<BR><FONT SIZE=2>> > Apr 12 18:41:06 [18007] -> __osm_trap_rcv_process_request: Received Generic</FONT>
<BR><FONT SIZE=2>> Notice type:0x04 num:144 Producer:1 from LID:0x0002 TID:0x000000000000000d</FONT>
<BR><FONT SIZE=2>> > Apr 12 18:41:06 [18007] -> osm_report_notice: Reporting Generic Notice type:4</FONT>
<BR><FONT SIZE=2>> num:144 from LID:0x0002 GID:0xfe80000000000000,0x0002c9020040133a</FONT>
<BR><FONT SIZE=2>> > Apr 12 18:42:25 [18007] -> __osm_trap_rcv_process_request: Received Generic</FONT>
<BR><FONT SIZE=2>> Notice type:0x04 num:144 Producer:1 from LID:0x0002 TID:0x000000000000000e</FONT>
<BR><FONT SIZE=2>> > Apr 12 18:42:25 [18007] -> osm_report_notice: Reporting Generic Notice type:4</FONT>
<BR><FONT SIZE=2>> num:144 from LID:0x0002 GID:0xfe80000000000000,0x0002c9020040133a</FONT>
<BR><FONT SIZE=2>> ></FONT>
<BR><FONT SIZE=2>> > priority 15 server</FONT>
<BR><FONT SIZE=2>> ></FONT>
<BR><FONT SIZE=2>> > Apr 12 18:42:25 [4000] -> OpenSM Rev:openib-1.0.0</FONT>
<BR><FONT SIZE=2>> > Apr 12 18:42:25 [4000] -> osm_opensm_init: Forcing single threaded dispatcher.</FONT>
<BR><FONT SIZE=2>> > Apr 12 18:42:25 [4000] -> osm_report_notice: Reporting Generic Notice type:3</FONT>
<BR><FONT SIZE=2>> num:66 from LID:0x0000 GID:0xfe80000000000000,0x0000000000000000</FONT>
<BR><FONT SIZE=2>> > Apr 12 18:42:25 [4000] -> osm_report_notice: Reporting Generic Notice type:3</FONT>
<BR><FONT SIZE=2>> num:66 from LID:0x0000 GID:0xfe80000000000000,0x0000000000000000</FONT>
<BR><FONT SIZE=2>> > Apr 12 18:42:25 [4000] -> osm_vendor_bind: Binding to port 0x2c9020040133a.</FONT>
<BR><FONT SIZE=2>> > Apr 12 18:42:25 [4000] -> osm_vendor_bind: Binding to port 0x2c9020040133a.</FONT>
<BR><FONT SIZE=2>> > Apr 12 18:42:25 [18007] -> osm_mcmr_rcv_leave_mgrp: ERR 1B25:Received an</FONT>
<BR><FONT SIZE=2>> Invalid Delete Request.</FONT>
<BR><FONT SIZE=2>> > Apr 12 18:42:25 [18007] -> osm_mcmr_rcv_leave_mgrp: ERR 1B25:Received an</FONT>
<BR><FONT SIZE=2>> Invalid Delete Request.</FONT>
<BR><FONT SIZE=2>> > Apr 12 18:42:25 [18007] -> osm_mcmr_rcv_leave_mgrp: ERR 1B25:Received an</FONT>
<BR><FONT SIZE=2>> Invalid Delete Request.</FONT>
<BR><FONT SIZE=2>> > Apr 12 18:42:25 [18007] -> osm_mcmr_rcv_leave_mgrp: ERR 1B25:Received an</FONT>
<BR><FONT SIZE=2>> Invalid Delete Request.</FONT>
<BR><FONT SIZE=2>> ></FONT>
<BR><FONT SIZE=2>> > When I kill the priority 15 server however, the priority 0 server runs</FONT>
<BR><FONT SIZE=2>> > amok with continuous log messages like:</FONT>
<BR><FONT SIZE=2>> ></FONT>
<BR><FONT SIZE=2>> > Apr 12 18:44:28 [2400A] -> umad_receiver: send completed with error(method=1</FONT>
<BR><FONT SIZE=2>> attr=20) -- dropping.</FONT>
<BR><FONT SIZE=2>> > Apr 12 18:44:28 [2400A] -> umad_receiver: send completed with error(method=1</FONT>
<BR><FONT SIZE=2>> attr=20) -- dropping.</FONT>
<BR><FONT SIZE=2>> </FONT>
<BR><FONT SIZE=2>> Attribute 0x20 is SMInfo. This is just the SubnGet(SMInfo) from the</FONT>
<BR><FONT SIZE=2>> priority 0 server failing (no matching SubnGetResp received) which is</FONT>
<BR><FONT SIZE=2>> "normal" if you killed the priority 15 server.</FONT>
<BR><FONT SIZE=2>> </FONT>
<BR><FONT SIZE=2>> Do the messages ever subside?</FONT>
<BR><FONT SIZE=2>> </FONT>
<BR><FONT SIZE=2>> > I assume that the handover to the priority 0 opensm hasn't worked</FONT>
<BR><FONT SIZE=2>> > then.</FONT>
<BR><FONT SIZE=2>> </FONT>
<BR><FONT SIZE=2>> This isn't really handover but that is another matter.</FONT>
<BR><FONT SIZE=2>> You should be able to use the sminfo diag to see whether this SM has</FONT>
<BR><FONT SIZE=2>> assumed the MASTER role.</FONT>
<BR><FONT SIZE=2>> </FONT>
<BR><FONT SIZE=2>> -- Hal</FONT>
</P>
</BODY>
</HTML>