[openib-general][PATCH][kdapl]: evd upcall policy implementation

James Lentini jlentini at netapp.com
Mon Aug 15 07:48:38 PDT 2005


Hi Guy,

Answers below:

On Sat, 13 Aug 2005, Guy German wrote:

> Hi James,
> 
> Im writing you back, from my web mail, so sorry again
> for the format (I will use "gg:" prefix again for my
> answer :)
> 
> 
> On Fri, 12 Aug 2005, Guy German wrote:
> > > -	/* Only process events if there is an enabled callback function. */
> > > -	if ((evd->upcall.upcall_func == (DAT_UPCALL_FUNC) NULL) ||
> > > -	    (evd->upcall_policy == DAT_UPCALL_DISABLE)) {
> > > +	
> > > +	/* The function is not re-entrant (change when implementing DAT_UPCALL_MANY)*/
> > 
> > Why is this function not re-entrant? For reference, here is how I 
> > would define re-entrant:
> > 
> > http://en.wikipedia.org/wiki/Reentrant
> > http://foldoc.doc.ic.ac.uk/foldoc/foldoc.cgi?re-entrant
> > 
> > gg:
> > The function can not be entered twice at the same time, because 
> > when the upcall is at the hands of the consumer he can disable 
> > the upcall policy and if it is entered twice, there is a chance 
> > the consumer will get another upcall after disabling 
> > the upcall policy.
> 
> I don't see the flow of control that would result in the scenario you 
> describe. This piece of code
> 
> 	spin_lock_irqsave (&evd->common.lock, flags);
>         if (evd->is_triggered) {
>                 spin_unlock_irqrestore (&evd->common.lock, flags);
>                 return;
>         }
>         evd->is_triggered = 1;
>         spin_unlock_irqrestore (&evd->common.lock, flags);
> 
> ensures that only one thread can be making upcalls at a time. 
> 
> gg: 
> The change I did in the function assures that the function would not 
> be entered twice at the same time, from 2 (or more) contexts 
> (by adding the spin_lock).
> The remark mentions that this change, however, does not support 
> DAT_UPCALL_MANY, which, by my understanding, require upcalls 
> to be called simultaneously, hence the function to be re-entrant 
> (and not protected by a spin lock that prevents entering the function)

How about changing the comment to say that DAT_UPCALL_MANY is not 
supported and leave it at that?

> > > +	if (evd->is_triggered)
> > >  		return;
> > > -	}
> > 
> > Why check the value here? Is it only for the efficiency of not taking 
> > the spin lock when is_triggered is 1?
> > 
> > gg:
> > No. you can't take the spin_lock here because this can cause 
> > a dead lockin the case the function calls itself from 
> > dat_evd_dequeue, on a uni-proccessor machines.
> 
> Can you elaborate on this? Do you mean that the thread that performs 
> the upcall in dapl_evd_upcall_trigger() can be used by the consumer to 
> call dapl_evd_dequeue()? If so, I don't see the flow of control that 
> begins in dapl_evd_dequeue() and reaches dapl_evd_upcall_trigger().
> 
> gg:
> The protection there is from the case, where dapl_evd_upcall_trigger
> calls dapl_evd_dequeue that calls dapl_evd_upcall_trigger again
> (recursively). This happened when there was a bad DTO completion and
> CONN_EVENT_BROKEN was synthesized. Now, I saw that this part was 
> removed from dapl_evd_wc_to_event.
> I still think that we should leave this protection, for those kinds of cases, 
> that might be implemented in the future.
> 
> That leaves me with a question: what did happened to 
> DAT_CONNECTION_EVENT_BROKEN? 

I'm not sure. 

> > > @@ -820,24 +831,19 @@ static void dapl_evd_dto_callback(struct
> > >  	 * This function does not dequeue from the CQ; only the consumer
> > >  	 * can do that. Instead, it wakes up waiters if any exist.
> > >  	 * It rearms the completion only if completions should always occur
> > > -	 * (specifically if a CNO is associated with the EVD and the
> > > -	 * EVD is enabled).
> > >  	 */
> > > -
> > > -	if (state == DAPL_EVD_STATE_OPEN && 
> > > -	    evd->upcall_policy != DAT_UPCALL_DISABLE) {
> > > -		/*
> > > -		 * Re-enable callback, *then* trigger.
> > > -		 * This guarantees we won't miss any events.
> > > -		 */
> > > -		status = ib_req_notify_cq(evd->cq, IB_CQ_NEXT_COMP);
> > > -		if (0 != status) 
> > > -			(void)dapl_evd_post_async_error_event(
> > > -				evd->common.owner_ia->async_error_evd,
> > > -				DAT_ASYNC_ERROR_PROVIDER_INTERNAL_ERROR,
> > > -				evd->common.owner_ia);
> > > -
> > > +	
> > > +	if (state == DAPL_EVD_STATE_OPEN) {
> > >  		dapl_evd_upcall_trigger(evd);
> > > +		if ((evd->upcall_policy != DAT_UPCALL_TEARDOWN) &&
> > > +		    (evd->upcall_policy != DAT_UPCALL_DISABLE)) {
> > > +			status = ib_req_notify_cq(evd->cq, IB_CQ_NEXT_COMP);
> > > +			if (0 != status) 
> > > +				(void)dapl_evd_post_async_error_event(
> > > +					evd->common.owner_ia->async_error_evd,
> > > +					DAT_ASYNC_ERROR_PROVIDER_INTERNAL_ERROR,
> > > +					evd->common.owner_ia);
> > > +		}
> > 
> > 
> > You changed the order in which the CQ upcall is enabled and the kDAPL 
> > upcall is made. It used to be:
> > 
> >  enable CQ upcall
> >  call kDAPL upcall
> > 
> > you are proposing
> > 
> >  call kDAPL upcall
> >  enable CQ upcall
> > 
> > I think your proposed order contains a race condition. Specifically if 
> > a work completion occurs after dapl_evd_upcall_trigger() 
> > returns but before the CQ upcall is re-enabled with 
> > ib_req_notify_cq(), no upcall will occur for the completion.
> > 
> > Do you agree?
> > 
> > gg:
> > You need to enable the CQ upcall only if the consumer did
> > not change his upcall policy, while in upcall context. In 
> > the first case you will create a situation where the cq is 
> > enabled, while the consumers doesn't want any upcalls.
> 
> Correct, but this can be hidden so that the consumer does not receive 
> the upcall.
> 
> gg:
> So what does dapl would do with the DTO event once it got it ?

kDAPL doesn't dequeue the CQ events, so this shouldn't be an issue.

> > In most real world application dapl_evd_upcall_trigger()
> > will return with upcall policy disabled and there will be 
> > no need to alarm the cq upcall - i.e the consumer would
> > dequeue the rest of the events himself.
> > 
> > I see the race you talk about. It is relevent to kdapltest.
> > Maybe we can check if there are pending events after 
> > enabling CQ upcall, and if there are - call 
> > dapl_evd_upcall_trigger() again. What do you think ?
> 
> The problem we are dealing with is that DAPL upcalls behave 
> differently from IB upcalls. DAPL upcalls are enabled until they are 
> disabled while IB upcalls are "one shots".
> 
> The approach taken in the current implementation is to always enable 
> the IB upcalls and determine in the DAPL provider if the consumer's 
> upcall should be invoked. 
> 
> gg:
> This is not good enough for some consumers (e.g. ISER), and it is
> not implementing the DAPL spec, which deals with upcall policies.

I disagree that this is in violation of the DAPL spec. The spec. 
describes DAPL upcall policies and leaves the interaction with lower 
layers (IB verbs) unspecified.

In any event, we should fix any of the performance problems you've 
found. 

> You are proposing a shift away from that approach. If we do that, we 
> need to preserve the original semantics. 
> 
> gg:
> I think that the implementation proposed preserves the current 
> semantics. If the consumer doesn't change the upcall policy - it
> stays enabled and everything stays the same (kdapltest is still
> working). I think you had a good point about the race over there,
> and that can be fixed.
> 
> > I'll ask the DAT Collaborative from some clarification on 
> > the meaning of the different upcall policy flags.
> 
> > One final item. To be consistent with your design, CQ upcalls should 
> > be selectively enabled in dapl_evd_internal_create().
> 
> gg:
> Why ? the initial state is that the upcalls are enabled, like it
> is today. Only if the consumer chooses to disable the upcall, he 
> calls dat_evd_modify_upcall.

Currently the IB upcall is initially enabled, but there are checks on 
the upcall path to determine if the EVD upcall is enabled. See 
dapl_evd.c line 827:

        if (state == DAPL_EVD_STATE_OPEN && 
            evd->upcall_policy != DAT_UPCALL_DISABLE) {

which you've replaced with

        if (state == DAPL_EVD_STATE_OPEN) {

> > >  	evd = (struct dapl_evd *)evd_handle;
> > > +	dapl_dbg_log (DAPL_DBG_TYPE_API, "%s: (evd=%p) set to %d\n",
> > > +			__func__, evd_handle, upcall_policy);
> > 
> > The idea was to make the DAPL_DBG_TYPE_API prints look like a 
> > debugger stack trace. The following would be keeping with the other 
> > print statements:
> > 
> > gg:
> > I thought it would make it a bit more user friendly :) sometimes
> > the consumers use those debug prints and they don't want to dwell 
> > in the kdapl code too much in order to understand what they 
> > are reading ...
> 
> You need a fair amount of familiarity with the code to know what a 
> message that says "dapl_evd_modify_upcall: (evd=dbbe4e58) set to 2" 
> means.
> 
> If you'd like to add the parameter names (e.g. evd=%p, 
> upcall_policy=...) that is fine. 
> 
> gg:
> Sure. That’s good enough.
> 
> > I think the function call format is  better, 
> > because a user familiar with the function 
> > signatures will know what each of the fields means. 
> 
> gg:
> I think that if someone is writing a code over dapl, for the
> first time, he has enough on his head, beside reviewing dapl’s 
> implementation code.
> It is easier to debug your own code with helpful debug prints, from
> the lower layer, like: "upcall_policy=1" / "upcall_policy=o" / 
> "connecting to 192.168.10.10"  DTO completion info ... etc..
> 
> > >  
> > > +	spin_lock_irqsave(&evd->common.lock, flags);
> > > +	if ((upcall_policy != DAT_UPCALL_TEARDOWN) &&
> > > +	    (upcall_policy != DAT_UPCALL_DISABLE)) {
> > 
> > Why not let the consumer setup the upcall when it disabled? That seems 
> > like the only safe time to modify it.
> > 
> > gg:
> > The consumer needs and can change the poilcy to disable and enable.
> > The only time he is not allowed to change the policy to enable (in 
> > this implementation) is when there are still pending events in the 
> > queue. 
> 
> You mean when there are *no* pending events in the queue.
> 
> gg:
> No. I mean when there *are* pending events. When the consumer wishes
> to enable the upcall policy, he believes that he dequeued all the
> events. If this is not the case - dat_evd_modify_upcall alarms him
> and he knows he should continue to dequeue.

Agreed. I didn't read your original sentence carefully enough.

> > This is to solve a race where the consumer dequeued all the events 
> > and changed the policy to enable, but there were other event/s that 
> > came just before calling dat_evd_modufy_upcall. In this case 
> > dat_evd_modufy_upcall to enable would fail and the consumer would 
> > keep dequeue-ing the events, without loosing his context.
> 
> You've only decrease the window in which that scenario could occur, 
> not eliminated it. If a DTO completion occured after you count the 
> number of pending events but before you enable the CQ callback, a 
> completion will be missed. 
> 
> gg:
> I don't think so. That is what the spin_lock_irqsave is for.

What if there are multiple processors in the system?

> Also, the pending_event_queue is only used for kDAPL generated 
> software events. This queue can be empty when there are events on the 
> CQ, so your would need to be expanded your check to cover that.
> 
> gg:
> I agree.
> 
> > > +		if (evd->evd_flags & DAT_EVD_DTO_FLAG) {
> > > +			status = ib_req_notify_cq(evd->cq, IB_CQ_NEXT_COMP);
> > 
> > Why do we need to re-enable the CQ upcall?
> > 
> > gg:
> > If the consumer returned from the evd_upcall with upcall policy 
> > "disabled" the CQ upcall is not enabled. So this is the only 
> > place it is done.
> 
> Ok, that fits with your new approach to the problem.
> 
> 


More information about the general mailing list