Thanks for the suggestion, Sasha!<br>
<br>
Our host stack does receive the client reregistration event and resubscribes all handlers at<br>
that point. During the SM migration, our stack prints the following informational messages, which<br>
confirm this:<br>
Jul 18 14:31:09 localhost kernel: Event IB_EVENT_CLIENT_REREGISTER occurred on port 1<br>
Jul 18 14:31:09 localhost kernel: OpemSM migrated, old SM LID=1 new SM LID=8<br>
<br>
I also confirmed in the SM logs that, after the migration, the higher
priority SM receives a subscription request for the in-service trap:<br>
Jul 18 14:32:13 103550 [41E02960] -> osm_infr_rcv_process_set_method: Subscribe Request with QPN: 0x000001<br>
Jul 18 14:32:13 103554 [41E02960] -> osm_infr_get_by_rec: [<br>
Jul 18 14:32:13 103558 [41E02960] -> __dump_all_informs: [<br>
Jul 18 14:32:13 103562 [41E02960] -> InformInfo dump:<br>
gid.....................0x0000000000000000 : 0x0000000000000000<br>
lid_range_begin.........0xFFFF<br>
lid_range_end...........0x0<br>
is_generic..............0x1<br>
subscribe...............0x0<br>
trap_type...............0x3<br>
trap_num................64<br>
qpn.....................0x000001<br>
resp_time_val...........0x0<br>
node_type...............0x000004<br>
Jul 18 14:32:13 103569 [41E02960] -> __dump_all_informs: ]<br>
<br>
It may be a problem if the resubscription of the in-service handler
occurs after the in-service notice was forwarded, but I think the
real problem is that no notice is ever forwarded for the restored port
of the higher priority SM. Perhaps neither SM (the lower priority nor
the higher priority one) generates an in-service trap because of the
timing gap between when the restored port is detected and "marked"
(i.e. added to new_ports_list) and when in-service traps are generated
for new ports. During the SM migration, the lower priority SM detects
the new port, but the higher priority SM does the trap generation, and
it doesn't realize that its own port is a new port, so it doesn't
generate a trap for it.<br>
<br>
Our host stack performs some actions when a port is restored (in our in-service subscription handler).<br>
Should I not expect an in-service trap for a restored port that
belongs to the master SM, and instead perform these actions when
the client reregistration event arrives?<br>
<br>
Thanks again for your help!<br>
Lan <br>
<br>
<br>
<br><div><span class="gmail_quote">On 7/25/07, <b class="gmail_sendername">Sasha Khapyorsky</b> <<a href="mailto:sashak@voltaire.com">sashak@voltaire.com</a>> wrote:</span><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">
Hi Lan,<br><br>On 09:57 Wed 25 Jul , lbt wrote:<br>> Hello,<br>><br>> I have been seeing a problem where a subscriber for in-service traps is not<br>> getting informed when the port of master openSM is restored (
i.e. causing an<br>> SM migration).<br>><br>> I have an IB subnet with 2 nodes running OpenSM, with different priorities of<br>> course (OpenSM Rev:openib-2.0.5). I also have another node on the subnet<br>> that has subscribed for the forwarding of any IB_SA_GENERIC_TRAP_NUM_IN_SVC
<br>> trap events. I've been doing cable pull tests on the IB ports, to check if<br>> the in-service handler I have subscribed gets invoked when I restore the<br>> cable. I've noticed that everything works as expected (
i.e. my in-service<br>> handler is invoked) whenever I restore the cable on the lower priority SM IB<br>> port without ever touching the master SM port. But if I cause an SM<br>> migration, by restoring the port of the higher priority SM, the in-service
<br>> trap does not get generated as expected on a cable restore.<br>><br>> Steps to Reproduce:<br>> 1) Start with port to higher priority SM disconnected.<br>> 2) restore port cable on the higher priority SM
<br>> --> This causes an SM Migration as expected, SM's migration happens okay<br>> --> I expected the restoration of the higher priority SM port to also<br>> trigger an in-service trap and notify subscribers, but it doesn't
<br>> occur<br>><br>> I have collected debug messages log for both open SM's, and it appears that<br>> the reason is because:<br>> 1) in-service traps are generated based on what ports are added on the
<br>> Master SM's new_ports_list, but these traps are generated only after LID<br>> assignment<br>> 2) when the higher priority SM port is restored, the restored port gets<br>> added to the lower priority SM's new_ports_list (since it's still the Master
<br>> SM at that point in time)<br>> 3) the handover of Master SM from lower priority to higher priority SM<br>> occurs (before LID assignment, and thus before traps can be generated for<br>> those ports on new_ports_list)
<br>> 4) the higher priority SM is now Master SM, but it has an empty<br>> new_ports_list, so no trap is generated either<br>><br>> Does this look like a legitimate Open SM bug? Any feedback would be much<br>> appreciated, and if I can help further in any way please let me know.
<br><br>As far as I know, when OpenSM (even an old one like 2.0.5) becomes master it<br>requests clients to reregister SA related stuff (by setting this bit in<br>PortInfo).<br><br>Probably your port doesn't support this (you could verify by seeing
<br>PortInfo:CapabilityMask - use 'smpquery portinfo <client-port-lid>') or<br>maybe your host stack doesn't do reregistration?<br><br>Anyway you could track this in the OpenSM code in osm_lid_mgr.c<br>__osm_lid_mgr_set_physp_pi() whenever client reregistration bit is set
<br>(with ib_port_info_set_client_rereg()) or not. Then we will know more<br>about this problem.<br><br>Sasha<br><br>><br>><br>> Subset of logs from lower priority SM during the cable restore of higher<br>> priority SM port:
<br>> ### Jul 18 14:31:56 614522 [41401960] -> __osm_trap_rcv_process_request:<br>> Received Generic Notice type:0x03 num:128 Producer:2 from LID:0x000A<br>> TID:0x00000016000012e1<br>> ### Jul 18 14:31:56 614823 [41401960] -> osm_state_mgr_process: Received
<br>> signal OSM_SIGNAL_SWEEP in state OSM_SM_STATE_IDLE<br>> ### 14:31:56 ******************** INITIATING HEAVY SWEEP<br>> **********************<br>> ### Jul 18 14:31:56 616887 [42803960] -> osm_state_mgr_process: Received
<br>> signal OSM_SIGNAL_NO_PENDING_TRANSACTIONS in state<br>> OSM_SM_STATE_SWEEP_HEAVY_SELF<br>> Jul 18 14:31:56 626078 [42803960] -> __osm_ni_rcv_process_new: Adding port<br>> GUID:0x00504501483e0000 to new_ports_list
<br>> Jul 18 14:31:56 626524 [42803960] -> osm_state_mgr_process: Received signal<br>> OSM_SIGNAL_CHANGE_DETECTED in state OSM_SM_STATE_SWEEP_HEAVY_SUBNET<br>> Jul 18 14:31:56 632630 [41E02960] -> osm_state_mgr_process: Received signal
<br>> OSM_SIGNAL_NO_PENDING_TRANSACTIONS in state OSM_SM_STATE_SWEEP_HEAVY_SUBNET<br>> 14:31:56 ********************* HEAVY SWEEP COMPLETE ***********************<br>> Jul 18 14:31:56 632773 [41E02960] -> osm_sm_state_mgr_process: Received
<br>> signal OSM_SM_SIGNAL_HANDOVER_SENT in state IB_SMINFO_STATE_MASTER###<br>> 14:31:56 ******************** ENTERING SM STANDBY STATE *******************<br>><br>> Subset of logs from higher priority SM during the cable restore of higher
<br>> priority SM port:<br>><br>> Jul 18 14:32:02 995600 [41401960] -> osm_sm_state_mgr_process: [<br>> Jul 18 14:32:02 995605 [41401960] -> osm_sm_state_mgr_process: Received<br>> signal OSM_SM_SIGNAL_DISCOVERY_COMPLETED in state
<br>> IB_SMINFO_STATE_DISCOVERING<br>> Jul 18 14:32:02 995609 [41401960] -> Entering MASTER state<br>> Jul 18 14:32:02 995888 [41401960] -> __osm_sm_state_mgr_master_msg:<br>> ******************** ENTERING SM MASTER STATE ********************
<br>> Jul 18 14:32:03 009014 [41401960] -> __osm_state_mgr_set_sm_lid_done_msg:<br>> **** SM LID ASSIGNMENT COMPLETE - STARTING SUBNET LID CONFIG *****<br>> Jul 18 14:32:03 024047 [41E02960] -> __osm_state_mgr_lid_assign_msg
<br>> ***** LID ASSIGNMENT COMPLETE - STARTING SWITCH TABLE CONFIG *****<br>> Jul 18 14:32:03 024052 [41E02960] -> __osm_state_mgr_report_new_ports: [<br>> ----> no in-service traps are generated and notices forwarded because there
<br>> are no ports on this list<br>> Jul 18 14:32:03 024057 [41E02960] -> __osm_state_mgr_report_new_ports: ]<br>><br>><br>> Thanks!<br>> Lan<br><br>> _______________________________________________
<br>> general mailing list<br>> <a href="mailto:general@lists.openfabrics.org">general@lists.openfabrics.org</a><br>> <a href="http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general">http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general
</a><br>><br>> To unsubscribe, please visit <a href="http://openib.org/mailman/listinfo/openib-general">http://openib.org/mailman/listinfo/openib-general</a><br></blockquote></div><br>