Hi Sasha,<br>
<br>
Yes, the problem seems to appear only when there is an SM migration. I
receive in-service notices for other ports, as long as there is no SM
migration occurring. <br>
<br>
Thanks,<br>
Lan<br><br><div><span class="gmail_quote">On 7/26/07, <b class="gmail_sendername">Sasha Khapyorsky</b> <<a href="mailto:sashak@voltaire.com">sashak@voltaire.com</a>> wrote:</span><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">
On 12:37 Thu 26 Jul , lbt wrote:<br>> Thanks for the suggestion Sasha!<br>><br>> Our host stack does receive a reregistration notice and does resubscribe<br>> all handlers at<br>> that point in time. At the time of the SM migration, our stack prints out
<br>> some informational messages to<br>> confirm this:<br>> Jul 18 14:31:09 localhost kernel: Event IB_EVENT_CLIENT_REREGISTER occurred<br>> on port 1<br>> Jul 18 14:31:09 localhost kernel: OpemSM migrated, old SM LID=1 new SM LID=8
<br>><br>> And also confirmed in the SM logs that after the migration, the higher<br>> priority SM is getting a subscription request for in-service trap:<br>> Jul 18 14:32:13 103550 [41E02960] -> osm_infr_rcv_process_set_method:
<br>> Subscribe Request with QPN: 0x000001<br>> Jul 18 14:32:13 103554 [41E02960] -> osm_infr_get_by_rec: [<br>> Jul 18 14:32:13 103558 [41E02960] -> __dump_all_informs: [<br>> Jul 18 14:32:13 103562 [41E02960] -> InformInfo dump:
<br>>
gid.....................0x0000000000000000 :<br>> 0x0000000000000000<br>>
lid_range_begin.........0xFFFF<br>>
lid_range_end...........0x0<br>>
is_generic..............0x1<br>>
subscribe...............0x0<br>>
trap_type...............0x3<br>>
trap_num................64<br>>
qpn.....................0x000001<br>>
resp_time_val...........0x0<br>>
node_type...............0x000004<br>> Jul 18 14:32:13 103569 [41E02960] -> __dump_all_informs: ]<br>><br>> It may be a problem if the resubscription of the in-service handler occurs<br>> after the in-service notice was forwarded, but I think the problem is that
<br>> there is never a notice that is forwarded for the higher priority SM port<br>> that is restored.<br><br>And after OpenSM migration, did you receive in-service notices for<br>other ports? Does the problem happen only at migration time?
<br><br>> Perhaps, neither SM (the lower priority and higher<br>> priority one), generates an in-service trap because of the timing gap<br>> between when the restored port is detected and "marked" (
i.e. added to<br>> new_ports_list) and when in-service traps are generated for new ports.<br>> During SM migration, the lower priority SM detects the new port, but the<br>> higher priority SM does the trap generation (but it doesn't realize that
<br>> its own port is a new port and thus doesn't generate a trap for it).<br>><br>> Our host stack executes some functions when a port is restored (in our<br>> in-service subscription handler).<br>
> Am I not supposed to receive an in-service trap for a restored port that<br>> happens to be the Master SM,<br><br>Yes, I guess you are.<br><br>> and instead execute these actions with a<br>> client reregistration event?
<br><br>A client reregistration request is not suitable here - the SM can ask for<br>client reregistration at any time (in practice OpenSM now does it only<br>when it enters MASTER state, but that is also optional).<br><br>Sasha<br>
<br>><br>> Thanks again for your help!<br>> Lan<br>><br>><br>><br>> On 7/25/07, Sasha Khapyorsky <<a href="mailto:sashak@voltaire.com">sashak@voltaire.com</a>> wrote:<br>> ><br>> > Hi Lan,
<br>> ><br>> > On 09:57 Wed 25 Jul , lbt wrote:<br>> > > Hello,<br>> > ><br>> > > I have been seeing a problem where a subscriber for in-service traps is<br>> > not<br>> > > getting informed when the port of the master OpenSM is restored (
i.e.<br>> > causing an<br>> > > SM migration).<br>> > ><br>> > > I have an IB subnet with 2 nodes running OpenSM, with different priorities<br>> > of<br>> > > course (OpenSM Rev:
openib-2.0.5). I also have another node on the<br>> > subnet<br>> > > that has subscribed for the forwarding of any<br>> > IB_SA_GENERIC_TRAP_NUM_IN_SVC<br>> > > trap events. I've been doing cable pull tests on the IB ports, to check
<br>> > if<br>> > > the in-service handler I have subscribed gets invoked when I restore<br>> > the<br>> > > cable. I've noticed that everything works as expected (i.e. my<br>> > in-service
<br>> > > handler is invoked) whenever I restore the cable on the lower priority<br>> > SM IB<br>> > > port without ever touching the master SM port. But if I cause an SM<br>> > > migration, by restoring the port of the higher priority SM, the
<br>> > in-service<br>> > > trap does not get generated as expected on a cable restore.<br>> > ><br>> > > Steps to Reproduce:<br>> > > 1) Start with port to higher priority SM disconnected.
<br>> > > 2) Restore the port cable on the higher priority SM<br>> > > --> This causes an SM migration as expected, and the SM migration happens<br>> > okay<br>> > > --> I expected the restoration of the higher priority SM port to also
<br>> > > trigger an in-service trap and notify subscribers, but that<br>> > doesn't<br>> > > occur<br>> > ><br>> > > I have collected debug message logs for both OpenSM instances, and it appears
<br>> > that<br>> > > the reason is because:<br>> > > 1) in-service traps are generated based on what ports are added on the<br>> > > Master SM's new_ports_list, but these traps are generated only after
<br>> > LID<br>> > > assignment<br>> > > 2) when the higher priority SM port is restored, the restored port gets<br>> > > added to the lower priority SM's new_ports_list (since it's still the
<br>> > Master<br>> > > SM at that point in time)<br>> > > 3) the handover of Master SM from lower priority to higher priority<br>> > SM<br>> > > occurs (before LID assignment and thus before traps get a chance to be generated
<br>> > for<br>> > > those ports on new_ports_list)<br>> > > 4) the higher priority SM is now Master SM, but it has an empty<br>> > > new_ports_list, so no trap generated either<br>> > >
<br>> > > Does this look like a legitimate OpenSM bug? Any feedback would be<br>> > much<br>> > > appreciated, and if I can help further in any way please let me know.<br>> ><br>> > As far as I know when OpenSM (even an old one like
2.0.5) becomes master it<br>> > requests clients to reregister SA related stuff (by setting the client<br>> > reregistration bit in PortInfo).<br>> ><br>> > Probably your port does not support this (you could verify by checking
<br>> > PortInfo:CapabilityMask - use 'smpquery portinfo <client-port-lid>') or<br>> > maybe your host stack doesn't do reregistration?<br>> ><br>> > Anyway you could track this in the OpenSM code in osm_lid_mgr.c
<br>> > __osm_lid_mgr_set_physp_pi() whether the client reregistration bit is set<br>> > (with ib_port_info_set_client_rereg()) or not. Then we will know more<br>> > about this problem.<br>> ><br>> > Sasha
<br>> ><br>> > ><br>> > ><br>> > > Subset of logs from lower priority SM during the cable restore of<br>> > higher<br>> > > priority SM port:<br>> > > ### Jul 18 14:31:56 614522 [41401960] ->
<br>> > __osm_trap_rcv_process_request:<br>> > > Received Generic Notice type:0x03 num:128 Producer:2 from LID:0x000A<br>> > > TID:0x00000016000012e1<br>> > > ### Jul 18 14:31:56 614823 [41401960] -> osm_state_mgr_process:
<br>> > Received<br>> > > signal OSM_SIGNAL_SWEEP in state OSM_SM_STATE_IDLE<br>> > > ### 14:31:56 ******************** INITIATING HEAVY SWEEP<br>> > > **********************<br>> > > ### Jul 18 14:31:56 616887 [42803960] -> osm_state_mgr_process:
<br>> > Received<br>> > > signal OSM_SIGNAL_NO_PENDING_TRANSACTIONS in state<br>> > > OSM_SM_STATE_SWEEP_HEAVY_SELF<br>> > > Jul 18 14:31:56 626078 [42803960] -> __osm_ni_rcv_process_new: Adding
<br>> > port<br>> > > GUID:0x00504501483e0000 to new_ports_list<br>> > > Jul 18 14:31:56 626524 [42803960] -> osm_state_mgr_process: Received<br>> > signal<br>> > > OSM_SIGNAL_CHANGE_DETECTED in state OSM_SM_STATE_SWEEP_HEAVY_SUBNET
<br>> > > Jul 18 14:31:56 632630 [41E02960] -> osm_state_mgr_process: Received<br>> > signal<br>> > > OSM_SIGNAL_NO_PENDING_TRANSACTIONS in state<br>> > OSM_SM_STATE_SWEEP_HEAVY_SUBNET<br>
> > > 14:31:56 ********************* HEAVY SWEEP COMPLETE<br>> > ***********************<br>> > > Jul 18 14:31:56 632773 [41E02960] -> osm_sm_state_mgr_process: Received<br>> > > signal OSM_SM_SIGNAL_HANDOVER_SENT in state IB_SMINFO_STATE_MASTER###
<br>> > > 14:31:56 ******************** ENTERING SM STANDBY STATE<br>> > *******************<br>> > ><br>> > > Subset of logs from higher priority SM during the cable restore of<br>> > higher
<br>> > > priority SM port:<br>> > ><br>> > > Jul 18 14:32:02 995600 [41401960] -> osm_sm_state_mgr_process: [<br>> > > Jul 18 14:32:02 995605 [41401960] -> osm_sm_state_mgr_process: Received
<br>> > > signal OSM_SM_SIGNAL_DISCOVERY_COMPLETED in state<br>> > > IB_SMINFO_STATE_DISCOVERING<br>> > > Jul 18 14:32:02 995609 [41401960] -> Entering MASTER state<br>> > > Jul 18 14:32:02 995888 [41401960] -> __osm_sm_state_mgr_master_msg:
<br>> > > ******************** ENTERING SM MASTER STATE ********************<br>> > > Jul 18 14:32:03 009014 [41401960] -><br>> > __osm_state_mgr_set_sm_lid_done_msg:<br>> > > **** SM LID ASSIGNMENT COMPLETE - STARTING SUBNET LID CONFIG *****
<br>> > > Jul 18 14:32:03 024047 [41E02960] -> __osm_state_mgr_lid_assign_msg<br>> > > ***** LID ASSIGNMENT COMPLETE - STARTING SWITCH TABLE CONFIG *****<br>> > > Jul 18 14:32:03 024052 [41E02960] -> __osm_state_mgr_report_new_ports:
<br>> > [<br>> > > ----> no in-service traps are generated and notices forwarded because<br>> > there<br>> > > are no ports on this list<br>> > > Jul 18 14:32:03 024057 [41E02960] -> __osm_state_mgr_report_new_ports:
<br>> > ]<br>> > ><br>> > ><br>> > > Thanks!<br>> > > Lan<br>> ><br></blockquote>
</div><br>
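
For anyone following the thread, here is a toy model of the hand-over race described in steps 1) to 4) of the original report. It is only a sketch and not OpenSM code: the `ToySM` class, its methods, and the `carry_pending` switch are all hypothetical names made up for illustration. Only `new_ports_list` and the port GUID are taken from the logs. It assumes, as the logs suggest, that in-service traps are reported only by the master and only after LID assignment.

```python
class ToySM:
    """Hypothetical stand-in for one SM instance (not an OpenSM structure)."""

    def __init__(self, name, priority):
        self.name = name
        self.priority = priority
        self.is_master = False
        self.new_ports_list = []  # ports seen during the current sweep
        self.traps_sent = []      # GUIDs reported via in-service trap (64)

    def sweep_discovers(self, port_guid):
        # Discovery step: the SM doing the sweep records the new port.
        self.new_ports_list.append(port_guid)

    def lid_assignment_done(self):
        # Reporting step: only the master reports, and only after LID assignment.
        if not self.is_master:
            return
        self.traps_sent.extend(self.new_ports_list)
        self.new_ports_list = []


def handover(old_master, new_master, carry_pending=False):
    # carry_pending is a hypothetical option, used here only to compare
    # the outcome when the pending list would survive the hand-over.
    old_master.is_master = False
    new_master.is_master = True
    if carry_pending:
        new_master.new_ports_list.extend(old_master.new_ports_list)


def run(carry_pending):
    low = ToySM("low", priority=1)
    high = ToySM("high", priority=2)
    low.is_master = True                  # low priority SM is master at first
    guid = 0x00504501483E0000             # restored port GUID from the logs
    low.sweep_discovers(guid)             # step 2: added to low's list
    handover(low, high, carry_pending)    # step 3: hand-over before LID assignment
    low.lid_assignment_done()             # low is standby now, reports nothing
    high.lid_assignment_done()            # step 4: high reports its own list
    return low.traps_sent + high.traps_sent


if __name__ == "__main__":
    print("traps without carrying the list:", run(False))
    print("traps when the list is carried: ", run(True))
```

In this model the restored port is never reported unless the pending list somehow survives the hand-over, which matches the empty `__osm_state_mgr_report_new_ports` section in the higher priority SM log. Whether carrying the list over (or having the new master treat its own port as new) is the right fix in OpenSM itself is a separate question.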