[openib-general][PATCH][kdapl]: evd upcall policy implementation

Guy German guyg at voltaire.com
Thu Aug 11 16:29:57 PDT 2005


Hi,
Sorry for the bad indentation. Please see my answers
starting with gg:

-----Original Message-----
From: James Lentini [mailto:jlentini at netapp.com]
Sent: Thu 8/11/2005 7:59 PM
To: Guy German
Cc: openib-general at openib.org
Subject: Re: [openib-general][PATCH][kdapl]: evd upcall policy implementation
 

Hi Guy,

Comments/questions below:

On Thu, 11 Aug 2005, Guy German wrote:

> Sorry for resending it - the former mail did not have a subject
> 
> This patch allows the dapl consumer to control the evd upcall policy.
> Some consumers (e.g. ISER) receives one upcall, disable
> the upcall policy, and retrieve the rest of the events from a 
> kernel_thread, via dat_evd_dequeue.
> This fashion of work improves performance by saving the context 
> switching that is involved in many upcalls.
> If the consumer does not behave that way and leaves the upcall policy 
> enabled at all times (e.g. kdapltest), the behavior will stay the same and 
> the consumer will get an upcall for each event.
> 
> Signed-off-by: Guy German <guyg at voltaire.com>
> 
> Index: dapl_evd.c
> ===================================================================
> --- dapl_evd.c	(revision 3056)
> +++ dapl_evd.c	(working copy)
> @@ -38,28 +38,39 @@
>  #include "dapl_ring_buffer_util.h"
>  
>  /*
> - * DAPL Internal routine to trigger the specified CNO.
> - * Called by the callback of some EVD associated with the CNO.

Thanks for catch these CNO references. I thought I had removed them 
all.

> + * DAPL Internal routine to trigger the callback of the EVD
>   */
>  static void dapl_evd_upcall_trigger(struct dapl_evd *evd)
>  {
>  	int status = 0;
>  	struct dat_event event;
> +	unsigned long flags;

For flags, we use flags member in the dapl_common structure. Take a 
look at the call to spin_lock_irqsave() in dapl_evd_get_event() for an 
example. We use the flags in the structure because the EVD code takes 
the spin lock in one function and releases it in another.

gg:
OK

>  
> -	/* Only process events if there is an enabled callback function. */
> -	if ((evd->upcall.upcall_func == (DAT_UPCALL_FUNC) NULL) ||
> -	    (evd->upcall_policy == DAT_UPCALL_DISABLE)) {
> +	
> +	/* The function is not re-entrant (change when implementing DAT_UPCALL_MANY)*/

Why is this function not re-entrant? For reference, here is how I 
would define re-entrant:

http://en.wikipedia.org/wiki/Reentrant
http://foldoc.doc.ic.ac.uk/foldoc/foldoc.cgi?re-entrant

gg:
The function can not be entered twice at the same time, because 
when the upcall is at the hands of the consumer he can disable 
the upcall policy and if it is entered twice, there is a chance 
the consumer will get another upcall after disabling 
the upcall policy.


> +	if (evd->is_triggered)
>  		return;
> -	}

Why check the value here? Is it only for the efficiency of not taking 
the spin lock when is_triggered is 1?

gg:
No. you can't take the spin_lock here because this can cause 
a dead lockin the case the function calls itself from 
dat_evd_dequeue, on a uni-proccessor machines.

>  
> -	for (;;) {
> +	spin_lock_irqsave (&evd->common.lock, flags);
> +	if (evd->is_triggered) {
> +		spin_unlock_irqrestore (&evd->common.lock, flags);
> +		return;
> +	}
> +	evd->is_triggered = 1;
> +	spin_unlock_irqrestore (&evd->common.lock, flags);
> +	/* Only process events if there is an enabled callback function */
> +	while ((evd->upcall.upcall_func != (DAT_UPCALL_FUNC)NULL) &&
> +	       (evd->upcall_policy != DAT_UPCALL_TEARDOWN) &&
> +	       (evd->upcall_policy != DAT_UPCALL_DISABLE)) {
>  		status = dapl_evd_dequeue((struct dat_evd *)evd, &event);
> -		if (0 != status) 
> -			return;
> -
> +		if (status)
> +			break;
>  		evd->upcall.upcall_func(evd->upcall.instance_data, &event,
>  					FALSE);
>  	}
> +	evd->is_triggered = 0;
> +
> +	return;
>  }
>  
>  static void dapl_evd_eh_print_wc(struct ib_wc *wc)
> @@ -820,24 +831,19 @@ static void dapl_evd_dto_callback(struct
>  	 * This function does not dequeue from the CQ; only the consumer
>  	 * can do that. Instead, it wakes up waiters if any exist.
>  	 * It rearms the completion only if completions should always occur
> -	 * (specifically if a CNO is associated with the EVD and the
> -	 * EVD is enabled).
>  	 */
> -
> -	if (state == DAPL_EVD_STATE_OPEN && 
> -	    evd->upcall_policy != DAT_UPCALL_DISABLE) {
> -		/*
> -		 * Re-enable callback, *then* trigger.
> -		 * This guarantees we won't miss any events.
> -		 */
> -		status = ib_req_notify_cq(evd->cq, IB_CQ_NEXT_COMP);
> -		if (0 != status) 
> -			(void)dapl_evd_post_async_error_event(
> -				evd->common.owner_ia->async_error_evd,
> -				DAT_ASYNC_ERROR_PROVIDER_INTERNAL_ERROR,
> -				evd->common.owner_ia);
> -
> +	
> +	if (state == DAPL_EVD_STATE_OPEN) {
>  		dapl_evd_upcall_trigger(evd);
> +		if ((evd->upcall_policy != DAT_UPCALL_TEARDOWN) &&
> +		    (evd->upcall_policy != DAT_UPCALL_DISABLE)) {
> +			status = ib_req_notify_cq(evd->cq, IB_CQ_NEXT_COMP);
> +			if (0 != status) 
> +				(void)dapl_evd_post_async_error_event(
> +					evd->common.owner_ia->async_error_evd,
> +					DAT_ASYNC_ERROR_PROVIDER_INTERNAL_ERROR,
> +					evd->common.owner_ia);
> +		}


You changed the order in which the CQ upcall is enabled and the kDAPL 
upcall is made. It used to be:

 enable CQ upcall
 call kDAPL upcall

you are proposing

 call kDAPL upcall
 enable CQ upcall

I think your proposed order contains a race condition. Specifically if 
a work completion occurs after dapl_evd_upcall_trigger() 
returns but before the CQ upcall is re-enabled with 
ib_req_notify_cq(), no upcall will occur for the completion.

Do you agree?

gg:
You need to enable the CQ upcall only if the consumer did
not change his upcall policy, while in upcall context. In 
the first case you will create a situation where the cq is 
enabled, while the consumers doesn't want any upcalls.
In most real world application dapl_evd_upcall_trigger()
will return with upcall policy disabled and there will be 
no need to alarm the cq upcall - i.e the consumer would
dequeue the rest of the events himself.

I see the race you talk about. It is relevent to kdapltest.
Maybe we can check if there are pending events after 
enabling CQ upcall, and if there are - call 
dapl_evd_upcall_trigger() again. What do you think ?


>  	}
>  	dapl_dbg_log(DAPL_DBG_TYPE_RTN, "%s() returns\n", __func__);
>  }
> @@ -890,7 +896,7 @@ int dapl_evd_internal_create(struct dapl
>  
>  		/* reset the qlen in the attributes, it may have changed */
>  		evd->qlen = evd->cq->cqe;
> -
> +		evd->is_triggered = 0;

This should be done in dapl_evd_alloc.

>  		status = ib_req_notify_cq(evd->cq, IB_CQ_NEXT_COMP);
>  
>  		if (status != 0)
> @@ -1035,15 +1041,41 @@ int dapl_evd_modify_upcall(struct dat_ev
>  			   const struct dat_upcall_object *upcall)
>  {
>  	struct dapl_evd *evd;
> -
> -	dapl_dbg_log(DAPL_DBG_TYPE_API, "dapl_modify_upcall(%p)\n", evd_handle);
> +	int status = 0;
> +	int pending_events;
> +	unsigned long flags;

See my comment above about he flags.

>  
>  	evd = (struct dapl_evd *)evd_handle;
> +	dapl_dbg_log (DAPL_DBG_TYPE_API, "%s: (evd=%p) set to %d\n",
> +			__func__, evd_handle, upcall_policy);

The idea was to make the DAPL_DBG_TYPE_API prints look like a 
debugger stack trace. The following would be keeping with the other 
print statements:

gg:
I thought it would make it a bit more user friendly :) sometimes
the consumers use those debug prints and they don't want to dwell 
in the kdapl code too much in order to understand what they 
are reading ...

+	dapl_dbg_log (DAPL_DBG_TYPE_API, "%s(%p, %d, %p)\n",
+		      __func__, evd_handle, upcall_policy, upcall);


>  
> +	spin_lock_irqsave(&evd->common.lock, flags);
> +	if ((upcall_policy != DAT_UPCALL_TEARDOWN) &&
> +	    (upcall_policy != DAT_UPCALL_DISABLE)) {

Why not let the consumer setup the upcall when it disabled? That seems 
like the only safe time to modify it.

gg:
The consumer needs and can change the poilcy to disable and enable.
The only time he is not allowed to change the policy to enable (in 
this implementation) is when there are still pending events in the 
queue. This is to solve a race where the consumer dequeued 
all the events and changed the policy to enable, but there were 
other event/s that came just before calling dat_evd_modufy_upcall.
In this case dat_evd_modufy_upcall to enable would fail and the 
consumer would keep dequeue-ing the events, without loosing his
context.

> +		pending_events = dapl_rbuf_count(&evd->pending_event_queue);

I don't understand this restriction either. Please explain.

gg:
explained above

> +		if (pending_events) {
> +			dapl_dbg_log (DAPL_DBG_TYPE_WARN,
> +				"%s: (evd %p) there are still %d pending "
> +				"events in the queue - policy stays disabled\n",
> +				__func__, evd_handle, pending_events);
> +			status = -EBUSY;
> +			goto bail;
> +		}
> +		if (evd->evd_flags & DAT_EVD_DTO_FLAG) {
> +			status = ib_req_notify_cq(evd->cq, IB_CQ_NEXT_COMP);

Why do we need to re-enable the CQ upcall?

gg:
If the consumer returned from the evd_upcall with upcall policy 
"disabled" the CQ upcall is not enabled. So this is the only 
place it is done.

> +			if (status) {
> +				printk(KERN_ERR "%s: dapls_ib_completion_notify"
> +					" failed (status=0x%x) \n",__func__,
> +					status);

Let's use dapl_dbg_log instead of printk. We can also update the text 
of the message to

 "%s: ib_req_notify_cq failed: %X\n"

gg: 
OK

> +				goto bail;
> +			}
> +		}
> +	}
>  	evd->upcall_policy = upcall_policy;
>  	evd->upcall = *upcall;
> -
> -	return 0;
> +bail:
> +	spin_unlock_irqrestore(&evd->common.lock, flags);
> +	return status;
>  }
>  
>  int dapl_evd_post_se(struct dat_evd *evd_handle, const struct dat_event *event)
> @@ -1076,7 +1108,7 @@ int dapl_evd_post_se(struct dat_evd *evd
>  					      event->event_data.
>  					      software_event_data.pointer);
>  
> -      bail:
> +bail:
>  	return status;
>  }
>  
> @@ -1124,7 +1156,7 @@ int dapl_evd_dequeue(struct dat_evd *evd
>  	}
>  
>  	spin_unlock_irqrestore(&evd->common.lock, evd->common.flags);
> -      bail:
> +bail:
>  	dapl_dbg_log(DAPL_DBG_TYPE_RTN,
>  		     "dapl_evd_dequeue () returns 0x%x\n", status);
>  




More information about the general mailing list