[ofw][patch][ibal] make IBBUS start up synchronous.

Smith, Stan stan.smith at intel.com
Wed Mar 25 11:17:22 PDT 2009


________________________________
From: Leonid Keller [mailto:leonid at mellanox.co.il]
Sent: Wednesday, March 25, 2009 10:04 AM
To: Smith, Stan; ofw at lists.openfabrics.org
Subject: RE: [ofw][patch][ibal] make IBBUS start up synchronous.

And what timeout would you set ? 3 seconds ?
How does one understand 3 seconds is too long or too short? :)

I think the 3 seconds is based on experience with a specific HCA implementation.
What if the next generation HCA or another vendors HCA takes 5 seconds?
You have far more experience in this area than I, so I'm good with your solution.
BTW, will you be submitting patches similar to mlx4 for the mthca driver in the near future?

stan.


The expected time of PnP handling 300-400 msec, so i think, 3 seconds has to be enough.
One can change the fix, returning error, when after 3 seconds reference counter was not decreased.

You are right, one could reach the same goal using events.
For example, add an event to CI_CA object, which is a part of the event structure, and post the event at the end of the callback (__pnp_process_add_ca).


________________________________
From: Smith, Stan [mailto:stan.smith at intel.com]
Sent: Wednesday, March 25, 2009 6:23 PM
To: Leonid Keller; ofw at lists.openfabrics.org
Subject: RE: [ofw][patch][ibal] make IBBUS start up synchronous.

Hello Leonid,
  A small question. If the ADD_CA event does indeed signal completion of HCA startup, why not have fdo_start() wait on an HCA guid specific event (time-out enabled escape hatch) where the ADD_CA event callback would then trigger the HCA guid specific event thus completing fdo_start()?
If the event wait times out, then fdo_start() returns an error.

Bottom-line: over the years I have found waiting in spin loops to be problematic. How does one understand 3 seconds is too long or too short?

What am I not understanding about this synchronization problem?

thanks,

stan.



________________________________
From: ofw-bounces at lists.openfabrics.org [mailto:ofw-bounces at lists.openfabrics.org] On Behalf Of Leonid Keller
Sent: Wednesday, March 25, 2009 7:02 AM
To: ofw at lists.openfabrics.org
Subject: [ofw][patch][ibal] make IBBUS start up synchronous.

IBAL start up has by design two stages.
  - initialization and preparing for getting IBAL PnP events (synchronous on fdo_start);
  - completing the start up upon receiving ADD_CA event (asynchronous, after fdo_start).

It means, that IBBUS driver notifies the completion of the driver start operation before it really has finished it.
It causes races in quick HCA disable/enable sequences.

This patch makes IBBUS start up synchronous by adding a wait in the first stage till the end of the second stage.


Index: core/al/kernel/al_pnp.c
===================================================================
--- core/al/kernel/al_pnp.c (revision 2055)
+++ core/al/kernel/al_pnp.c (working copy)
@@ -1152,6 +1152,7 @@
  cl_status_t   cl_status;
  al_pnp_ca_event_t *p_event_rec;
  ib_ca_attr_t  *p_old_ca_attr;
+ uint32_t   ref_cnt, prev_cnt, i;

  AL_ENTER( AL_DBG_PNP );

@@ -1232,9 +1233,18 @@
  }

  /* Queue the event to the async processing manager. */
- ref_al_obj( &gp_pnp->obj );
+ ref_cnt = ref_al_obj( &gp_pnp->obj );
  cl_async_proc_queue( gp_async_pnp_mgr, &p_event_rec->async_item );

+ /* wait up to 3 seconds for the end of event propagation
+ It is needed for enabling quick HCA disable/enable scenarios. */
+ prev_cnt = ref_cnt - 1;
+ for ( i = 0; i < 30; ++i ) {
+  if (ref_cnt <= prev_cnt)
+   break;
+  cl_thread_suspend(100);
+ }
+
  AL_EXIT( AL_DBG_PNP );
  return IB_SUCCESS;
 }
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.openfabrics.org/pipermail/ofw/attachments/20090325/29644239/attachment.html>


More information about the ofw mailing list