[ofa-general] Lost in-service traps during Open SM migration
lbt
transter at gmail.com
Wed Jul 25 09:57:50 PDT 2007
Hello,
I am seeing a problem where a subscriber for in-service traps is not
notified when the port of the master OpenSM is restored (i.e. when the
restore causes an SM migration).
I have an IB subnet with 2 nodes running OpenSM (OpenSM Rev:openib-2.0.5),
with different priorities of course. I also have another node on the subnet
that has subscribed for the forwarding of any IB_SA_GENERIC_TRAP_NUM_IN_SVC
trap events. I've been doing cable pull tests on the IB ports, to check if
the in-service handler I have subscribed gets invoked when I restore the
cable. Everything works as expected (i.e. my in-service handler is
invoked) whenever I restore the cable on the lower priority SM's IB port
without ever touching the master SM port. But if I cause an SM migration by
restoring the port of the higher priority SM, no in-service trap is
generated for that cable restore.
Steps to Reproduce:
1) Start with the port to the higher priority SM disconnected.
2) Restore the port cable on the higher priority SM.
--> This causes an SM migration as expected, and the migration itself
happens okay.
--> I expected the restoration of the higher priority SM port to also
trigger an in-service trap and notify subscribers, but that doesn't
occur.
I have collected debug logs from both OpenSMs, and the reason appears to
be the following:
1) in-service traps are generated based on what ports are added on the
Master SM's new_ports_list, but these traps are generated only after LID
assignment
2) when the higher priority SM port is restored, the restored port gets
added to the lower priority SM's new_ports_list (since it's still the Master
SM at that point in time)
3) the handover of Master SM from the lower priority to the higher priority
SM occurs (before LID assignment, and thus before traps have a chance to be
generated for the ports on new_ports_list)
4) the higher priority SM is now Master SM, but it has an empty
new_ports_list, so no trap is generated there either
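
The sequence in steps 1) to 4) above can be sketched as a toy model. This is
only an illustration of the suspected ordering problem, not OpenSM code: the
ToySM class, its methods, handover() and restore_cable() are hypothetical
names made up for this sketch. Only new_ports_list and the port GUID are
taken from the logs.

```python
from dataclasses import dataclass, field

RESTORED_PORT = "0x00504501483e0000"  # GUID taken from the log excerpt


@dataclass
class ToySM:
    """Hypothetical stand-in for one SM; models only the pending-port list."""
    name: str
    is_master: bool = False
    new_ports_list: list = field(default_factory=list)
    traps_sent: list = field(default_factory=list)

    def sweep_finds(self, guid):
        # Step 2: only the current master records newly discovered ports.
        if self.is_master:
            self.new_ports_list.append(guid)

    def lid_assignment_complete(self):
        # Step 1: traps are emitted only at this point, and only by the master.
        if not self.is_master:
            return
        self.traps_sent.extend(self.new_ports_list)
        self.new_ports_list.clear()


def handover(old, new):
    # Step 3: mastership moves, but the pending list is not carried over
    # (an assumption of this model, matching the behaviour seen in the logs).
    old.is_master, new.is_master = False, True


def restore_cable(migrate):
    low = ToySM("low priority", is_master=True)
    high = ToySM("high priority")
    low.sweep_finds(RESTORED_PORT)
    if migrate:
        handover(low, high)
    low.lid_assignment_complete()
    high.lid_assignment_complete()  # step 4: new master has nothing to report
    return low.traps_sent + high.traps_sent


print(restore_cable(migrate=False))  # ['0x00504501483e0000']
print(restore_cable(migrate=True))   # []
```

In this model the restored port stays stranded on the old master's list, which
is the behaviour I believe the logs below show.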
Does this look like a legitimate OpenSM bug? Any feedback would be much
appreciated, and if I can help further in any way please let me know.
Subset of logs from lower priority SM during the cable restore of higher
priority SM port:
### Jul 18 14:31:56 614522 [41401960] -> __osm_trap_rcv_process_request:
Received Generic Notice type:0x03 num:128 Producer:2 from LID:0x000A
TID:0x00000016000012e1
### Jul 18 14:31:56 614823 [41401960] -> osm_state_mgr_process: Received
signal OSM_SIGNAL_SWEEP in state OSM_SM_STATE_IDLE
### 14:31:56 ******************** INITIATING HEAVY SWEEP
**********************
### Jul 18 14:31:56 616887 [42803960] -> osm_state_mgr_process: Received
signal OSM_SIGNAL_NO_PENDING_TRANSACTIONS in state
OSM_SM_STATE_SWEEP_HEAVY_SELF
Jul 18 14:31:56 626078 [42803960] -> __osm_ni_rcv_process_new: Adding port
GUID:0x00504501483e0000 to new_ports_list
Jul 18 14:31:56 626524 [42803960] -> osm_state_mgr_process: Received signal
OSM_SIGNAL_CHANGE_DETECTED in state OSM_SM_STATE_SWEEP_HEAVY_SUBNET
Jul 18 14:31:56 632630 [41E02960] -> osm_state_mgr_process: Received signal
OSM_SIGNAL_NO_PENDING_TRANSACTIONS in state OSM_SM_STATE_SWEEP_HEAVY_SUBNET
14:31:56 ********************* HEAVY SWEEP COMPLETE ***********************
Jul 18 14:31:56 632773 [41E02960] -> osm_sm_state_mgr_process: Received
signal OSM_SM_SIGNAL_HANDOVER_SENT in state IB_SMINFO_STATE_MASTER
### 14:31:56 ******************** ENTERING SM STANDBY STATE *******************
Subset of logs from higher priority SM during the cable restore of higher
priority SM port:
Jul 18 14:32:02 995600 [41401960] -> osm_sm_state_mgr_process: [
Jul 18 14:32:02 995605 [41401960] -> osm_sm_state_mgr_process: Received
signal OSM_SM_SIGNAL_DISCOVERY_COMPLETED in state
IB_SMINFO_STATE_DISCOVERING
Jul 18 14:32:02 995609 [41401960] -> Entering MASTER state
Jul 18 14:32:02 995888 [41401960] -> __osm_sm_state_mgr_master_msg:
******************** ENTERING SM MASTER STATE ********************
Jul 18 14:32:03 009014 [41401960] -> __osm_state_mgr_set_sm_lid_done_msg:
**** SM LID ASSIGNMENT COMPLETE - STARTING SUBNET LID CONFIG *****
Jul 18 14:32:03 024047 [41E02960] -> __osm_state_mgr_lid_assign_msg
***** LID ASSIGNMENT COMPLETE - STARTING SWITCH TABLE CONFIG *****
Jul 18 14:32:03 024052 [41E02960] -> __osm_state_mgr_report_new_ports: [
----> no in-service traps are generated and notices forwarded because there
are no ports on this list
Jul 18 14:32:03 024057 [41E02960] -> __osm_state_mgr_report_new_ports: ]
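
For anyone who wants to check their own logs for the same pattern, here is a
rough sketch of a scanner. It assumes the message texts shown in the excerpts
above, and it assumes (my guess, not confirmed from the source) that a
__osm_state_mgr_report_new_ports line marks the point where pending ports are
reported. The function name stranded_ports is made up for this example.

```python
import re

# Message fragments copied from the log excerpts above.
ADD = r"Adding port GUID:(0x[0-9a-fA-F]+) to new_ports_list"
REPORT = "__osm_state_mgr_report_new_ports"
HANDOVER = "OSM_SM_SIGNAL_HANDOVER_SENT"

EVENT_RE = re.compile(f"{ADD}|({REPORT})|({HANDOVER})")


def stranded_ports(log_text):
    """Return GUIDs added to new_ports_list but still unreported at handover."""
    # Collapse whitespace so entries wrapped across lines match again.
    text = " ".join(log_text.split())
    pending = []
    for match in EVENT_RE.finditer(text):
        if match.group(1):        # a port was added to new_ports_list
            pending.append(match.group(1))
        elif match.group(2):      # assumed point where pending ports get reported
            pending.clear()
        else:                     # handover sent: whatever is pending is lost
            return pending
    return []                     # no handover seen in this log


sample = """
Jul 18 14:31:56 626078 [42803960] -> __osm_ni_rcv_process_new: Adding port
GUID:0x00504501483e0000 to new_ports_list
Jul 18 14:31:56 632773 [41E02960] -> osm_sm_state_mgr_process: Received
signal OSM_SM_SIGNAL_HANDOVER_SENT in state IB_SMINFO_STATE_MASTER
"""
print(stranded_ports(sample))  # ['0x00504501483e0000']
```

Run against the lower priority SM's log, a non-empty result would point at
ports that were discovered but never reported before the handover.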
Thanks!
Lan