[ofw] Doing queries on subnet every 30 seconds

Smith, Stan stan.smith at intel.com
Tue Jun 7 14:39:03 PDT 2011


Hello all,
  The 'simple' IOC sweep-on-demand solution tested out as valid against a Linux RHEL 5.3, OFED-1.5.1 SRP target (vdisks).
Specifically, an IOC sweep is performed only when

1) requested (QUERY_DEVICE_RELATIONS for device 'IB Bus') or
2) a PORT_ACTIVE pnp event occurs.

The registry key 'IocPollInterval' value definitions have been expanded:

0 == no IOC sweeping/rescan.
1 == IOC sweep on demand (QUERY_DEVICE_RELATIONS for device 'IB Bus', e.g. 'devcon rescan') or when a PORT_ACTIVE pnp event occurs.
> 1 == IOC sweep every 'IocPollInterval' milliseconds (current behavior).
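
As a minimal sketch of the three value ranges above (the enum and function names here are hypothetical and do not appear in the patch), the interpretation of 'IocPollInterval' could be modelled in stand-alone C like this:

```c
#include <stdint.h>

/* Hypothetical names for illustration only; not part of the driver. */
typedef enum {
    IOC_SWEEP_DISABLED,   /* IocPollInterval == 0: never sweep          */
    IOC_SWEEP_ON_DEMAND,  /* IocPollInterval == 1: sweep when requested */
    IOC_SWEEP_PERIODIC    /* IocPollInterval  > 1: value is milliseconds */
} ioc_sweep_mode_t;

/* Map a raw IocPollInterval value to one of the three modes. */
static ioc_sweep_mode_t ioc_sweep_mode(uint32_t ioc_poll_interval)
{
    if (ioc_poll_interval == 0)
        return IOC_SWEEP_DISABLED;
    if (ioc_poll_interval == 1)
        return IOC_SWEEP_ON_DEMAND;
    return IOC_SWEEP_PERIODIC;
}

/* Only the periodic mode re-arms the sweep timer after a sweep completes. */
static int ioc_rearm_after_sweep(uint32_t ioc_poll_interval)
{
    return ioc_sweep_mode(ioc_poll_interval) == IOC_SWEEP_PERIODIC;
}
```

With the previous default of 30000 this yields the periodic mode, so the timer is re-armed after each sweep; with the proposed default of 1 it is not.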

Testing consisted of loading the SRP & IOU drivers on each Windows system before the SRP target came online; as expected, systems without the SRP drivers loaded did not see the SRP targets.
Two Server 2008 R2 systems were used: one with the current IOC sweep every 30 seconds, the other with IocPollInterval == 1 (sweep on demand).

OpenSM 3.3.9 was running on a separate Server 2008 R2 (x86) system.

On each Windows system the 'Computer Management-->Storage Manager-->Disk Manager' view was opened.

The Linux SRP targets were then started; SRP targets (vdisks) exported: /dev/sdb1, /dev/sdb2 & /dev/sdb3.

The 30 sec sweeping system (SS) reported all 3 of the expected SRP targets within a few seconds; the IOC-sweep-on-demand (SOD) system did not register the 3 SRP targets until a device rescan was forced via 'devcon.exe rescan'.

SRP target functionality was verified using the SOD system.

The Linux SRP targets were taken offline at the Linux box.

The sweeping Windows system reported the SRP targets had been removed within a few seconds, while the SOD system continued to display the SRP targets.
Once the SOD system had been forced to do an IOC rescan (devcon.exe rescan), the SRP targets were no longer displayed.

Initial IOC sweep-on-demand experiments demonstrate the feasibility of the code changes.
More testing is needed with more and different fabric IOCs, which I do not have access to; I will continue the SRP target experiments.

At this juncture, I would recommend the code changes be committed, as the original periodic IOC sweep functionality is still available.
Furthermore, the default IocPollInterval should be set to 1 (sweep on demand).


--- core/al/kernel/al_ioc_pnp.c	Tue Jun 07 13:41:54 2011
+++ core/al/kernel/al_ioc_pnp.c	Tue Jun 07 13:35:54 2011
@@ -294,7 +294,6 @@
 	cl_async_proc_item_t	async_item;
 	sweep_state_t			state;
 	ioc_pnp_svc_t			*p_svc;
-	atomic32_t				query_cnt;
 	cl_fmap_t				iou_map;
 
 }	ioc_sweep_results_t;
@@ -313,8 +312,12 @@
 ioc_pnp_mgr_t	*gp_ioc_pnp = NULL;
 uint32_t		g_ioc_query_timeout = 250;
 uint32_t		g_ioc_query_retries = 4;
-uint32_t		g_ioc_poll_interval = 30000;
-
+uint32_t		g_ioc_poll_interval = 1;
+					/* 0 == no IOC rescan
+					 * 1 == IOC rescan on demand (IB_PNP_SM_CHANGE, IB_PNP_PORT_ACTIVE,
+					 *			QUERY_DEVICE_RELATIONS for device 'IB Bus')
+					 * > 1 == rescan interval in millisecond units.
+					 */
 
 
 /******************************************************************************
@@ -1204,6 +1207,24 @@
 }
 
 
+void
+ioc_pnp_request_ioc_rescan(void)
+{
+	ib_api_status_t	status;
+
+	AL_ENTER( AL_DBG_PNP );
+
+	CL_ASSERT( gp_ioc_pnp );
+	if ( g_ioc_poll_interval == 1 && !gp_ioc_pnp->query_cnt )
+	{
+		AL_PRINT( TRACE_LEVEL_ERROR, AL_DBG_ERROR, ("Requesting IOC rescan\n") );
+		status = cl_timer_start( &gp_ioc_pnp->sweep_timer, 50 );
+		CL_ASSERT( status == CL_SUCCESS );
+	}
+	AL_EXIT( AL_DBG_PNP );
+}
+
+
 /*
  * PnP callback for port event notifications.
  */
@@ -1213,12 +1234,14 @@
 {
 	ib_api_status_t	status = IB_SUCCESS;
 	cl_status_t		cl_status;
+#if DBG
+	const char		*evt = ib_get_pnp_event_str( p_pnp_rec->pnp_event );
+#endif
 
 	AL_ENTER( AL_DBG_PNP );
 
 	AL_PRINT( TRACE_LEVEL_INFORMATION, AL_DBG_PNP,
-		("p_pnp_rec->pnp_event = 0x%x (%s)\n",
-		p_pnp_rec->pnp_event, ib_get_pnp_event_str( p_pnp_rec->pnp_event )) );
+		("p_pnp_rec->pnp_event = 0x%x (%s)\n", p_pnp_rec->pnp_event, evt) );
 
 	switch( p_pnp_rec->pnp_event )
 	{
@@ -1257,8 +1280,19 @@
 		((ioc_pnp_svc_t*)p_pnp_rec->context)->obj.pfn_destroy(
 			&((ioc_pnp_svc_t*)p_pnp_rec->context)->obj, NULL );
 		p_pnp_rec->context = NULL;
+		break;
+
+	case IB_PNP_IOU_ADD:
+	case IB_PNP_IOU_REMOVE:
+	case IB_PNP_IOC_ADD:
+	case IB_PNP_IOC_REMOVE:
+	case IB_PNP_IOC_PATH_ADD:
+	case IB_PNP_IOC_PATH_REMOVE:
+		AL_PRINT( TRACE_LEVEL_ERROR, AL_DBG_PNP, ("!Handled PNP Event %s\n",evt));
+		break;
 
 	default:
+		AL_PRINT( TRACE_LEVEL_ERROR, AL_DBG_ERROR, ("Ignored PNP Event %s\n",evt));
 		break;	/* Ignore other PNP events. */
 	}
 
@@ -2630,11 +2664,14 @@
 	__remove_ious( &old_ious );
 	CL_ASSERT( !cl_fmap_count( &old_ious ) );
 
-	/* Reset the sweep timer. */
-	if( g_ioc_poll_interval )
+	/* Reset the sweep timer.
+	 * 0 == No IOC polling.
+	 * 1 == IOC poll on demand.
+	 * > 1 == IOC poll every g_ioc_poll_interval milliseconds.
+	 */
+	if( g_ioc_poll_interval > 1)
 	{
-		status = cl_timer_start(
-			&gp_ioc_pnp->sweep_timer, g_ioc_poll_interval );
+		status = cl_timer_start( &gp_ioc_pnp->sweep_timer, g_ioc_poll_interval );
 		CL_ASSERT( status == CL_SUCCESS );
 	}
 
@@ -3045,8 +3082,7 @@
 	else
 	{
 		/* Report the IOU to all clients registered for IOU events. */
-		cl_qlist_find_from_head( &gp_ioc_pnp->iou_reg_list,
-			__notify_users, &event );
+		cl_qlist_find_from_head( &gp_ioc_pnp->iou_reg_list, __notify_users, &event );
 
 		/* Report IOCs - this will in turn report the paths. */
 		__add_iocs( p_iou, &p_iou->ioc_map, NULL );


--- core/bus/kernel/bus_port_mgr.c	Tue Jun 07 13:44:20 2011
+++ core/bus/kernel/bus_port_mgr.c	Tue Jun 07 13:39:10 2011
@@ -75,6 +75,7 @@
 
 
 extern pkey_array_t  g_pkeys;
+static int pnp_port_active;
 
 /*
  * Function prototypes.
@@ -103,6 +104,9 @@
 port_mgr_port_remove(
 	IN				ib_pnp_port_rec_t*			p_pnp_rec );
 
+void
+ioc_pnp_request_ioc_rescan(void);
+
 static NTSTATUS
 port_start(
 	IN				DEVICE_OBJECT* const		p_dev_obj,
@@ -501,11 +505,17 @@
 		break;
 
 	case IB_PNP_PORT_REMOVE:
+		if (pnp_port_active > 0)
+			pnp_port_active--;
 		port_mgr_port_remove( (ib_pnp_port_rec_t*)p_pnp_rec );
 		break;
 
+	case IB_PNP_PORT_ACTIVE:
+		pnp_port_active++;
+		break;
+
 	default:
-		XBUS_PRINT( BUS_DBG_PNP, ("Unhandled PNP Event %s\n",
+		BUS_PRINT( BUS_DBG_PNP, ("Ignored PNP Event %s\n",
 					ib_get_pnp_event_str(p_pnp_rec->pnp_event) ));
 		break;
 	}
@@ -567,6 +577,15 @@
 							p_bfi->whoami, ca_guid, p_port_mgr) );
 	if (!p_port_mgr)
 		return STATUS_NO_SUCH_DEVICE;
+
+	if ( g_ioc_poll_interval == 1 && pnp_port_active
+		&& p_bfi->p_bus_ext->cl_ext.vfptr_pnp_po->identity
+		&& strcmp(p_bfi->p_bus_ext->cl_ext.vfptr_pnp_po->identity, "IB Bus") == 0 )
+	{
+		BUS_PRINT(BUS_DBG_PNP, ("***** device '%s' requesting IOC rescan\n",
+				p_bfi->p_bus_ext->cl_ext.vfptr_pnp_po->identity) );
+		ioc_pnp_request_ioc_rescan();
+	}
 
 	cl_mutex_acquire( &p_port_mgr->pdo_mutex );
 	status = bus_get_relations( &p_port_mgr->port_list, ca_guid, p_irp );

pnp_port_active is used to skip an on-demand IOC sweep request before any IB port has gone active, because in the current implementation the IOC sweep timer is not started until after the first IB port goes active.
Perhaps you could suggest a better solution?
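
To make the gating explicit, here is a stand-alone model of that logic (a sketch only; the struct, enum and function names are hypothetical and are not driver code). It counts active ports the same way the bus_port_mgr.c hunk does and allows a rescan request only when IocPollInterval == 1 and at least one port has gone active:

```c
#include <stdint.h>

/* Hypothetical user-space model; names are illustrative, not the driver's. */
typedef enum { EVT_PORT_ACTIVE, EVT_PORT_REMOVE, EVT_OTHER } port_event_t;

typedef struct {
    uint32_t poll_interval;      /* models IocPollInterval                  */
    int      active_ports;       /* models pnp_port_active                  */
    int      rescans_requested;  /* counts calls that would request a rescan */
} bus_model_t;

/* Track port state: count up on PORT_ACTIVE, down (never below 0) on PORT_REMOVE. */
static void on_port_event(bus_model_t *b, port_event_t evt)
{
    switch (evt) {
    case EVT_PORT_ACTIVE:
        b->active_ports++;
        break;
    case EVT_PORT_REMOVE:
        if (b->active_ports > 0)
            b->active_ports--;
        break;
    default:
        break;   /* other events do not affect the counter */
    }
}

/* Models QUERY_DEVICE_RELATIONS for 'IB Bus': returns 1 if a rescan is requested. */
static int on_query_device_relations(bus_model_t *b)
{
    if (b->poll_interval == 1 && b->active_ports > 0) {
        b->rescans_requested++;
        return 1;
    }
    return 0;
}
```

In this model a query that arrives before the first PORT_ACTIVE event is simply ignored, which is the behavior the counter is meant to provide.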

--- hw/mlx4/kernel/hca/mlx4_hca.inx	Tue Jun 07 14:29:06 2011
+++ hw/mlx4/kernel/hca/mlx4_hca.inx	Tue Jun 07 13:15:48 2011
@@ -296,7 +296,11 @@
 HKR,"Parameters","SmiPollInterval",%REG_DWORD_NO_CLOBBER%,20000
 HKR,"Parameters","IocQueryTimeout",%REG_DWORD_NO_CLOBBER%,250
 HKR,"Parameters","IocQueryRetries",%REG_DWORD_NO_CLOBBER%,4
-HKR,"Parameters","IocPollInterval",%REG_DWORD_NO_CLOBBER%,30000
+
+; IocPollInterval: 0 == no ioc poll, 1 == poll on demand (device rescan)
+;   (> 1) poll every x milliseconds, 30000 (30 secs) previous default.
+HKR,"Parameters","IocPollInterval",%REG_DWORD_NO_CLOBBER%,1
+
 HKR,"Parameters","DebugFlags",%REG_DWORD%,0x80000000
 HKR,"Parameters","ReportPortNIC",%REG_DWORD%,1
 
--- hw/mthca/kernel/mthca.inx	Tue Jun 07 14:31:42 2011
+++ hw/mthca/kernel/mthca.inx	Tue Jun 07 13:15:20 2011
@@ -297,7 +297,11 @@
 HKR,"Parameters","SmiPollInterval",%REG_DWORD_NO_CLOBBER%,20000
 HKR,"Parameters","IocQueryTimeout",%REG_DWORD_NO_CLOBBER%,250
 HKR,"Parameters","IocQueryRetries",%REG_DWORD_NO_CLOBBER%,4
-HKR,"Parameters","IocPollInterval",%REG_DWORD_NO_CLOBBER%,30000
+
+; IocPollInterval: 0 == no ioc poll, 1 == poll on demand (device rescan)
+;   (> 1) poll every x milliseconds, 30000 (30 secs) previous default.
+HKR,"Parameters","IocPollInterval",%REG_DWORD_NO_CLOBBER%,1
+
 HKR,"Parameters","DebugFlags",%REG_DWORD%,0x80000000
 HKR,"Parameters","ReportPortNIC",%REG_DWORD%,1
 
Should 'IocPollInterval' be changed from %REG_DWORD_NO_CLOBBER% to %REG_DWORD% to prevent possible install confusion?

Thanks,

Stan.


>-----Original Message-----
>From: ofw-bounces at lists.openfabrics.org [mailto:ofw-bounces at lists.openfabrics.org] On Behalf Of Smith, Stan
>Sent: Wednesday, June 01, 2011 5:36 PM
>To: Fab Tillier; Hefty, Sean; Tzachi Dar; ofw at lists.openfabrics.org
>Subject: Re: [ofw] Doing queries on subnet every 30 seconds
>
>Tzachi,
>  Upon further code review, there seems to be a rather simple solution which covers most concerns.
>
>The bus driver (ioc_manager) is coded such that when the IOC rescan routine is finished it restarts the IOC rescan timer if IocPollInterval > 0
>using IocPollInterval as the timer expiration value.
>Your solution (IocPollInterval = 0) prohibits starting the IOC rescan timer for all events, thus a new IOC/U will not be recognized; OK for
>most installations.
>
>To prohibit IOC scanning every 30 seconds and yet recognize a new IOC/IOU...... upon completion of an IOC rescan operation, the IOC
>rescan timer is not restarted?
>Currently IB_PNP_SM_CHANGE and IB_PNP_PORT_ACTIVE cause the IOC rescan timer to start and expire after 250 ms; no code change.
>Upon recognition of QUERY_DEVICE_RELATIONS for device 'IB Bus' the IOC rescan timer is started;  this would cover the 'devcon.exe
>rescan' case.
>
>BTW, the IOC rescan timer callback function is coded such that only a single instance of the IOC rescan function will run.
>
>To summarize:
>Do not automatically restart the IOC rescan timer (IocPollInterval) after completing an IOC rescan.
>Restart the IOC rescan timer upon recognition of QUERY_DEVICE_RELATIONS for device 'IB Bus'.
>
>Simple, minor code changes?
>
>What have I missed?
>
>Thanks,
>
>Stan.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: al_ioc_pnp.c.patch
Type: application/octet-stream
Size: 3376 bytes
Desc: al_ioc_pnp.c.patch
URL: <http://lists.openfabrics.org/pipermail/ofw/attachments/20110607/40908ab9/attachment.obj>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: bus_port_mgr.c.patch
Type: application/octet-stream
Size: 1513 bytes
Desc: bus_port_mgr.c.patch
URL: <http://lists.openfabrics.org/pipermail/ofw/attachments/20110607/40908ab9/attachment-0001.obj>

