[openib-general][PATCH][kdapl]: evd upcall policy implementation

Thu Aug 11 09:59:58 PDT 2005

Hi Guy,

Comments/questions below:

On Thu, 11 Aug 2005, Guy German wrote:

> Sorry for resending it - the former mail did not have a subject
> 
> This patch allows the dapl consumer to control the evd upcall policy.
> Some consumers (e.g. ISER) receives one upcall, disable
> the upcall policy, and retrieve the rest of the events from a 
> kernel_thread, via dat_evd_dequeue.
> This fashion of work improves performance by saving the context 
> switching that is involved in many upcalls.
> If the consumer does not behave that way and leaves the upcall policy 
> enabled at all times (e.g. kdapltest), the behavior will stay the same and 
> the consumer will get an upcall for each event.
> 
> Signed-off-by: Guy German <guyg at voltaire.com>
> 
> Index: dapl_evd.c
> ===================================================================
> --- dapl_evd.c	(revision 3056)
> +++ dapl_evd.c	(working copy)
> @@ -38,28 +38,39 @@
>  #include "dapl_ring_buffer_util.h"
>  
>  /*
> - * DAPL Internal routine to trigger the specified CNO.
> - * Called by the callback of some EVD associated with the CNO.

Thanks for catch these CNO references. I thought I had removed them 
all.

> + * DAPL Internal routine to trigger the callback of the EVD
>   */
>  static void dapl_evd_upcall_trigger(struct dapl_evd *evd)
>  {
>  	int status = 0;
>  	struct dat_event event;
> +	unsigned long flags;

For flags, we use flags member in the dapl_common structure. Take a 
look at the call to spin_lock_irqsave() in dapl_evd_get_event() for an 
example. We use the flags in the structure because the EVD code takes 
the spin lock in one function and releases it in another.

>  
> -	/* Only process events if there is an enabled callback function. */
> -	if ((evd->upcall.upcall_func == (DAT_UPCALL_FUNC) NULL) ||
> -	    (evd->upcall_policy == DAT_UPCALL_DISABLE)) {
> +	
> +	/* The function is not re-entrant (change when implementing DAT_UPCALL_MANY)*/

Why is this function not re-entrant? For reference, here is how I 
would define re-entrant:

http://en.wikipedia.org/wiki/Reentrant
http://foldoc.doc.ic.ac.uk/foldoc/foldoc.cgi?re-entrant

> +	if (evd->is_triggered)
>  		return;
> -	}

Why check the value here? Is it only for the efficiency of not taking 
the spin lock when is_triggered is 1?

>  
> -	for (;;) {
> +	spin_lock_irqsave (&evd->common.lock, flags);
> +	if (evd->is_triggered) {
> +		spin_unlock_irqrestore (&evd->common.lock, flags);
> +		return;
> +	}
> +	evd->is_triggered = 1;
> +	spin_unlock_irqrestore (&evd->common.lock, flags);
> +	/* Only process events if there is an enabled callback function */
> +	while ((evd->upcall.upcall_func != (DAT_UPCALL_FUNC)NULL) &&
> +	       (evd->upcall_policy != DAT_UPCALL_TEARDOWN) &&
> +	       (evd->upcall_policy != DAT_UPCALL_DISABLE)) {
>  		status = dapl_evd_dequeue((struct dat_evd *)evd, &event);
> -		if (0 != status) 
> -			return;
> -
> +		if (status)
> +			break;
>  		evd->upcall.upcall_func(evd->upcall.instance_data, &event,
>  					FALSE);
>  	}
> +	evd->is_triggered = 0;
> +
> +	return;
>  }
>  
>  static void dapl_evd_eh_print_wc(struct ib_wc *wc)
> @@ -820,24 +831,19 @@ static void dapl_evd_dto_callback(struct
>  	 * This function does not dequeue from the CQ; only the consumer
>  	 * can do that. Instead, it wakes up waiters if any exist.
>  	 * It rearms the completion only if completions should always occur
> -	 * (specifically if a CNO is associated with the EVD and the
> -	 * EVD is enabled).
>  	 */
> -
> -	if (state == DAPL_EVD_STATE_OPEN && 
> -	    evd->upcall_policy != DAT_UPCALL_DISABLE) {
> -		/*
> -		 * Re-enable callback, *then* trigger.
> -		 * This guarantees we won't miss any events.
> -		 */
> -		status = ib_req_notify_cq(evd->cq, IB_CQ_NEXT_COMP);
> -		if (0 != status) 
> -			(void)dapl_evd_post_async_error_event(
> -				evd->common.owner_ia->async_error_evd,
> -				DAT_ASYNC_ERROR_PROVIDER_INTERNAL_ERROR,
> -				evd->common.owner_ia);
> -
> +	
> +	if (state == DAPL_EVD_STATE_OPEN) {
>  		dapl_evd_upcall_trigger(evd);
> +		if ((evd->upcall_policy != DAT_UPCALL_TEARDOWN) &&
> +		    (evd->upcall_policy != DAT_UPCALL_DISABLE)) {
> +			status = ib_req_notify_cq(evd->cq, IB_CQ_NEXT_COMP);
> +			if (0 != status) 
> +				(void)dapl_evd_post_async_error_event(
> +					evd->common.owner_ia->async_error_evd,
> +					DAT_ASYNC_ERROR_PROVIDER_INTERNAL_ERROR,
> +					evd->common.owner_ia);
> +		}

You changed the order in which the CQ upcall is enabled and the kDAPL 
upcall is made. It used to be:

 enable CQ upcall
 call kDAPL upcall

you are proposing

 call kDAPL upcall
 enable CQ upcall

I think your proposed order contains a race condition. Specifically if 
a work completion occurs after dapl_evd_upcall_trigger() 
returns but before the CQ upcall is re-enabled with 
ib_req_notify_cq(), no upcall will occur for the completion.

Do you agree?

>  	}
>  	dapl_dbg_log(DAPL_DBG_TYPE_RTN, "%s() returns\n", __func__);
>  }
> @@ -890,7 +896,7 @@ int dapl_evd_internal_create(struct dapl
>  
>  		/* reset the qlen in the attributes, it may have changed */
>  		evd->qlen = evd->cq->cqe;
> -
> +		evd->is_triggered = 0;

This should be done in dapl_evd_alloc.

>  		status = ib_req_notify_cq(evd->cq, IB_CQ_NEXT_COMP);
>  
>  		if (status != 0)
> @@ -1035,15 +1041,41 @@ int dapl_evd_modify_upcall(struct dat_ev
>  			   const struct dat_upcall_object *upcall)
>  {
>  	struct dapl_evd *evd;
> -
> -	dapl_dbg_log(DAPL_DBG_TYPE_API, "dapl_modify_upcall(%p)\n", evd_handle);
> +	int status = 0;
> +	int pending_events;
> +	unsigned long flags;

See my comment above about he flags.

>  
>  	evd = (struct dapl_evd *)evd_handle;
> +	dapl_dbg_log (DAPL_DBG_TYPE_API, "%s: (evd=%p) set to %d\n",
> +			__func__, evd_handle, upcall_policy);

The idea was to make the DAPL_DBG_TYPE_API prints look like a 
debugger stack trace. The following would be keeping with the other 
print statements:

+	dapl_dbg_log (DAPL_DBG_TYPE_API, "%s(%p, %d, %p)\n",
+		      __func__, evd_handle, upcall_policy, upcall);

>  
> +	spin_lock_irqsave(&evd->common.lock, flags);
> +	if ((upcall_policy != DAT_UPCALL_TEARDOWN) &&
> +	    (upcall_policy != DAT_UPCALL_DISABLE)) {

Why not let the consumer setup the upcall when it disabled? That seems 
like the only safe time to modify it.

> +		pending_events = dapl_rbuf_count(&evd->pending_event_queue);

I don't understand this restriction either. Please explain.

> +		if (pending_events) {
> +			dapl_dbg_log (DAPL_DBG_TYPE_WARN,
> +				"%s: (evd %p) there are still %d pending "
> +				"events in the queue - policy stays disabled\n",
> +				__func__, evd_handle, pending_events);
> +			status = -EBUSY;
> +			goto bail;
> +		}
> +		if (evd->evd_flags & DAT_EVD_DTO_FLAG) {
> +			status = ib_req_notify_cq(evd->cq, IB_CQ_NEXT_COMP);

Why do we need to re-enable the CQ upcall?

> +			if (status) {
> +				printk(KERN_ERR "%s: dapls_ib_completion_notify"
> +					" failed (status=0x%x) \n",__func__,
> +					status);

Let's use dapl_dbg_log instead of printk. We can also update the text 
of the message to

 "%s: ib_req_notify_cq failed: %X\n"

> +				goto bail;
> +			}
> +		}
> +	}
>  	evd->upcall_policy = upcall_policy;
>  	evd->upcall = *upcall;
> -
> -	return 0;
> +bail:
> +	spin_unlock_irqrestore(&evd->common.lock, flags);
> +	return status;
>  }
>  
>  int dapl_evd_post_se(struct dat_evd *evd_handle, const struct dat_event *event)
> @@ -1076,7 +1108,7 @@ int dapl_evd_post_se(struct dat_evd *evd
>  					      event->event_data.
>  					      software_event_data.pointer);
>  
> -      bail:
> +bail:
>  	return status;
>  }
>  
> @@ -1124,7 +1156,7 @@ int dapl_evd_dequeue(struct dat_evd *evd
>  	}
>  
>  	spin_unlock_irqrestore(&evd->common.lock, evd->common.flags);
> -      bail:
> +bail:
>  	dapl_dbg_log(DAPL_DBG_TYPE_RTN,
>  		     "dapl_evd_dequeue () returns 0x%x\n", status);
>