[openib-general][PATCH][kdapl]: evd upcall policy implementation

Fri Aug 12 15:05:52 PDT 2005

Hi James,

Im writing you back, from my web mail, so sorry again
for the format (I will use "gg:" prefix again for my
answer :)

On Fri, 12 Aug 2005, Guy German wrote:
> > -	/* Only process events if there is an enabled callback function. */
> > -	if ((evd->upcall.upcall_func == (DAT_UPCALL_FUNC) NULL) ||
> > -	    (evd->upcall_policy == DAT_UPCALL_DISABLE)) {
> > +	
> > +	/* The function is not re-entrant (change when implementing DAT_UPCALL_MANY)*/
> 
> Why is this function not re-entrant? For reference, here is how I 
> would define re-entrant:
> 
> http://en.wikipedia.org/wiki/Reentrant
> http://foldoc.doc.ic.ac.uk/foldoc/foldoc.cgi?re-entrant
> 
> gg:
> The function can not be entered twice at the same time, because 
> when the upcall is at the hands of the consumer he can disable 
> the upcall policy and if it is entered twice, there is a chance 
> the consumer will get another upcall after disabling 
> the upcall policy.

I don't see the flow of control that would result in the scenario you 
describe. This piece of code

	spin_lock_irqsave (&evd->common.lock, flags);
        if (evd->is_triggered) {
                spin_unlock_irqrestore (&evd->common.lock, flags);
                return;
        }
        evd->is_triggered = 1;
        spin_unlock_irqrestore (&evd->common.lock, flags);

ensures that only one thread can be making upcalls at a time. 

gg: 
The change I did in the function assures that the function would not 
be entered twice at the same time, from 2 (or more) contexts 
(by adding the spin_lock).
The remark mentions that this change, however, does not support 
DAT_UPCALL_MANY, which, by my understanding, require upcalls 
to be called simultaneously, hence the function to be re-entrant 
(and not protected by a spin lock that prevents entering the function)

> 
> 
> > +	if (evd->is_triggered)
> >  		return;
> > -	}
> 
> Why check the value here? Is it only for the efficiency of not taking 
> the spin lock when is_triggered is 1?
> 
> gg:
> No. you can't take the spin_lock here because this can cause 
> a dead lockin the case the function calls itself from 
> dat_evd_dequeue, on a uni-proccessor machines.

Can you elaborate on this? Do you mean that the thread that performs 
the upcall in dapl_evd_upcall_trigger() can be used by the consumer to 
call dapl_evd_dequeue()? If so, I don't see the flow of control that 
begins in dapl_evd_dequeue() and reaches dapl_evd_upcall_trigger().

gg:
The protection there is from the case, where dapl_evd_upcall_trigger
calls dapl_evd_dequeue that calls dapl_evd_upcall_trigger again
(recursively). This happened when there was a bad DTO completion and
CONN_EVENT_BROKEN was synthesized. Now, I saw that this part was 
removed from dapl_evd_wc_to_event.
I still think that we should leave this protection, for those kinds of cases, 
that might be implemented in the future.

That leaves me with a question: what did happened to 
DAT_CONNECTION_EVENT_BROKEN? 

> > @@ -820,24 +831,19 @@ static void dapl_evd_dto_callback(struct
> >  	 * This function does not dequeue from the CQ; only the consumer
> >  	 * can do that. Instead, it wakes up waiters if any exist.
> >  	 * It rearms the completion only if completions should always occur
> > -	 * (specifically if a CNO is associated with the EVD and the
> > -	 * EVD is enabled).
> >  	 */
> > -
> > -	if (state == DAPL_EVD_STATE_OPEN && 
> > -	    evd->upcall_policy != DAT_UPCALL_DISABLE) {
> > -		/*
> > -		 * Re-enable callback, *then* trigger.
> > -		 * This guarantees we won't miss any events.
> > -		 */
> > -		status = ib_req_notify_cq(evd->cq, IB_CQ_NEXT_COMP);
> > -		if (0 != status) 
> > -			(void)dapl_evd_post_async_error_event(
> > -				evd->common.owner_ia->async_error_evd,
> > -				DAT_ASYNC_ERROR_PROVIDER_INTERNAL_ERROR,
> > -				evd->common.owner_ia);
> > -
> > +	
> > +	if (state == DAPL_EVD_STATE_OPEN) {
> >  		dapl_evd_upcall_trigger(evd);
> > +		if ((evd->upcall_policy != DAT_UPCALL_TEARDOWN) &&
> > +		    (evd->upcall_policy != DAT_UPCALL_DISABLE)) {
> > +			status = ib_req_notify_cq(evd->cq, IB_CQ_NEXT_COMP);
> > +			if (0 != status) 
> > +				(void)dapl_evd_post_async_error_event(
> > +					evd->common.owner_ia->async_error_evd,
> > +					DAT_ASYNC_ERROR_PROVIDER_INTERNAL_ERROR,
> > +					evd->common.owner_ia);
> > +		}
> 
> 
> You changed the order in which the CQ upcall is enabled and the kDAPL 
> upcall is made. It used to be:
> 
>  enable CQ upcall
>  call kDAPL upcall
> 
> you are proposing
> 
>  call kDAPL upcall
>  enable CQ upcall
> 
> I think your proposed order contains a race condition. Specifically if 
> a work completion occurs after dapl_evd_upcall_trigger() 
> returns but before the CQ upcall is re-enabled with 
> ib_req_notify_cq(), no upcall will occur for the completion.
> 
> Do you agree?
> 
> gg:
> You need to enable the CQ upcall only if the consumer did
> not change his upcall policy, while in upcall context. In 
> the first case you will create a situation where the cq is 
> enabled, while the consumers doesn't want any upcalls.

Correct, but this can be hidden so that the consumer does not receive 
the upcall.

gg:
So what does dapl would do with the DTO event once it got it ?

> In most real world application dapl_evd_upcall_trigger()
> will return with upcall policy disabled and there will be 
> no need to alarm the cq upcall - i.e the consumer would
> dequeue the rest of the events himself.
> 
> I see the race you talk about. It is relevent to kdapltest.
> Maybe we can check if there are pending events after 
> enabling CQ upcall, and if there are - call 
> dapl_evd_upcall_trigger() again. What do you think ?

The problem we are dealing with is that DAPL upcalls behave 
differently from IB upcalls. DAPL upcalls are enabled until they are 
disabled while IB upcalls are "one shots".

The approach taken in the current implementation is to always enable 
the IB upcalls and determine in the DAPL provider if the consumer's 
upcall should be invoked. 

gg:
This is not good enough for some consumers (e.g. ISER), and it is
not implementing the DAPL spec, which deals with upcall policies.

You are proposing a shift away from that approach. If we do that, we 
need to preserve the original semantics. 

gg:
I think that the implementation proposed preserves the current 
semantics. If the consumer doesn't change the upcall policy - it
stays enabled and everything stays the same (kdapltest is still
working). I think you had a good point about the race over there,
and that can be fixed.

> I'll ask the DAT Collaborative from some clarification on 
> the meaning of the different upcall policy flags.

> One final item. To be consistent with your design, CQ upcalls should 
> be selectively enabled in dapl_evd_internal_create().

gg:
Why ? the initial state is that the upcalls are enabled, like it
is today. Only if the consumer chooses to disable the upcall, he 
calls dat_evd_modify_upcall.

> >  	evd = (struct dapl_evd *)evd_handle;
> > +	dapl_dbg_log (DAPL_DBG_TYPE_API, "%s: (evd=%p) set to %d\n",
> > +			__func__, evd_handle, upcall_policy);
> 
> The idea was to make the DAPL_DBG_TYPE_API prints look like a 
> debugger stack trace. The following would be keeping with the other 
> print statements:
> 
> gg:
> I thought it would make it a bit more user friendly :) sometimes
> the consumers use those debug prints and they don't want to dwell 
> in the kdapl code too much in order to understand what they 
> are reading ...

You need a fair amount of familiarity with the code to know what a 
message that says "dapl_evd_modify_upcall: (evd=dbbe4e58) set to 2" 
means.

If you'd like to add the parameter names (e.g. evd=%p, 
upcall_policy=...) that is fine. 

gg:
Sure. Thats good enough.

> I think the function call format is  better, 
> because a user familiar with the function 
> signatures will know what each of the fields means. 

gg:
I think that if someone is writing a code over dapl, for the
first time, he has enough on his head, beside reviewing dapls 
implementation code.
It is easier to debug your own code with helpful debug prints, from
the lower layer, like: "upcall_policy=1" / "upcall_policy=o" / 
"connecting to 192.168.10.10"  DTO completion info ... etc..

> >  
> > +	spin_lock_irqsave(&evd->common.lock, flags);
> > +	if ((upcall_policy != DAT_UPCALL_TEARDOWN) &&
> > +	    (upcall_policy != DAT_UPCALL_DISABLE)) {
> 
> Why not let the consumer setup the upcall when it disabled? That seems 
> like the only safe time to modify it.
> 
> gg:
> The consumer needs and can change the poilcy to disable and enable.
> The only time he is not allowed to change the policy to enable (in 
> this implementation) is when there are still pending events in the 
> queue. 

You mean when there are *no* pending events in the queue.

gg:
No. I mean when there *are* pending events. When the consumer wishes
to enable the upcall policy, he believes that he dequeued all the
events. If this is not the case - dat_evd_modify_upcall alarms him
and he knows he should continue to dequeue.

> This is to solve a race where the consumer dequeued all the events 
> and changed the policy to enable, but there were other event/s that 
> came just before calling dat_evd_modufy_upcall. In this case 
> dat_evd_modufy_upcall to enable would fail and the consumer would 
> keep dequeue-ing the events, without loosing his context.

You've only decrease the window in which that scenario could occur, 
not eliminated it. If a DTO completion occured after you count the 
number of pending events but before you enable the CQ callback, a 
completion will be missed. 

gg:
I don't think so. That is what the spin_lock_irqsave is for.

Also, the pending_event_queue is only used for kDAPL generated 
software events. This queue can be empty when there are events on the 
CQ, so your would need to be expanded your check to cover that.

gg:
I agree.

> > +		if (evd->evd_flags & DAT_EVD_DTO_FLAG) {
> > +			status = ib_req_notify_cq(evd->cq, IB_CQ_NEXT_COMP);
> 
> Why do we need to re-enable the CQ upcall?
> 
> gg:
> If the consumer returned from the evd_upcall with upcall policy 
> "disabled" the CQ upcall is not enabled. So this is the only 
> place it is done.

Ok, that fits with your new approach to the problem.