[ofw][patch][ibal] make IBBUS start up synchronous.

Leonid Keller leonid at mellanox.co.il
Wed Mar 25 11:27:09 PDT 2009


> BTW, will you be submitting patches similar to mlx4 for the mthca
driver in the near future?
Which ones do you mean ?


________________________________

	From: Smith, Stan [mailto:stan.smith at intel.com] 
	Sent: Wednesday, March 25, 2009 8:17 PM
	To: Leonid Keller; ofw at lists.openfabrics.org
	Subject: RE: [ofw][patch][ibal] make IBBUS start up synchronous.
	
	
________________________________

	From: Leonid Keller [mailto:leonid at mellanox.co.il] 
	Sent: Wednesday, March 25, 2009 10:04 AM
	To: Smith, Stan; ofw at lists.openfabrics.org
	Subject: RE: [ofw][patch][ibal] make IBBUS start up synchronous.
	
	
	And what timeout would you set ? 3 seconds ?
	How does one understand 3 seconds is too long or too short? :) 
	 
	I think the 3 seconds is based on experience with a specific HCA
implementation.
	What if the next generation HCA or another vendors HCA takes 5
seconds?
	You have far more experience in this area than I, so I'm good
with your solution.
	BTW, will you be submitting patches similar to mlx4 for the
mthca driver in the near future?
	 
	stan.
	 
	 
	The expected time of PnP handling 300-400 msec, so i think, 3
seconds has to be enough.
	One can change the fix, returning error, when after 3 seconds
reference counter was not decreased.
	 
	You are right, one could reach the same goal using events.
	For example, add an event to CI_CA object, which is a part of
the event structure, and post the event at the end of the callback
(__pnp_process_add_ca).
	 


________________________________

		From: Smith, Stan [mailto:stan.smith at intel.com] 
		Sent: Wednesday, March 25, 2009 6:23 PM
		To: Leonid Keller; ofw at lists.openfabrics.org
		Subject: RE: [ofw][patch][ibal] make IBBUS start up
synchronous.
		
		
		Hello Leonid,
		  A small question. If the ADD_CA event does indeed
signal completion of HCA startup, why not have fdo_start() wait on an
HCA guid specific event (time-out enabled escape hatch) where the ADD_CA
event callback would then trigger the HCA guid specific event thus
completing fdo_start()?
		If the event wait times out, then fdo_start() returns an
error.
		 
		Bottom-line: over the years I have found waiting in spin
loops to be problematic. How does one understand 3 seconds is too long
or too short?
		 
		What am I not understanding about this synchronization
problem?
		 
		thanks,
		 
		stan.
		 
		 

________________________________

		From: ofw-bounces at lists.openfabrics.org
[mailto:ofw-bounces at lists.openfabrics.org] On Behalf Of Leonid Keller
		Sent: Wednesday, March 25, 2009 7:02 AM
		To: ofw at lists.openfabrics.org
		Subject: [ofw][patch][ibal] make IBBUS start up
synchronous.
		
		
		IBAL start up has by design two stages.
		  - initialization and preparing for getting IBAL PnP
events (synchronous on fdo_start);
		  - completing the start up upon receiving ADD_CA event
(asynchronous, after fdo_start).
		 
		It means, that IBBUS driver notifies the completion of
the driver start operation before it really has finished it.
		It causes races in quick HCA disable/enable sequences.
		 
		This patch makes IBBUS start up synchronous by adding a
wait in the first stage till the end of the second stage.
		 
		 
		Index: core/al/kernel/al_pnp.c
	
===================================================================
		--- core/al/kernel/al_pnp.c (revision 2055)
		+++ core/al/kernel/al_pnp.c (working copy)
		@@ -1152,6 +1152,7 @@
		  cl_status_t   cl_status;
		  al_pnp_ca_event_t *p_event_rec;
		  ib_ca_attr_t  *p_old_ca_attr;
		+ uint32_t   ref_cnt, prev_cnt, i;
		 
		  AL_ENTER( AL_DBG_PNP );
		 
		@@ -1232,9 +1233,18 @@
		  }
		 
		  /* Queue the event to the async processing manager. */
		- ref_al_obj( &gp_pnp->obj );
		+ ref_cnt = ref_al_obj( &gp_pnp->obj );
		  cl_async_proc_queue( gp_async_pnp_mgr,
&p_event_rec->async_item );
		 
		+ /* wait up to 3 seconds for the end of event
propagation 
		+ It is needed for enabling quick HCA disable/enable
scenarios. */
		+ prev_cnt = ref_cnt - 1;
		+ for ( i = 0; i < 30; ++i ) {
		+  if (ref_cnt <= prev_cnt)
		+   break;
		+  cl_thread_suspend(100);
		+ }
		+
		  AL_EXIT( AL_DBG_PNP );
		  return IB_SUCCESS;
		 }
		

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.openfabrics.org/pipermail/ofw/attachments/20090325/46701e25/attachment.html>


More information about the ofw mailing list