[openib-general][PATCH][kdapl]: FMR and EVD patch

Tue Aug 30 07:40:24 PDT 2005

I'm not comfortable with a solution that relies on vendor specific 
behavior for such a critical mechanism. 

Given that the OpenIB ib_req_notify_cq() verb conforms to the IBTA 
spec's semantics

http://www.mail-archive.com/openib-general%40openib.org/msg08935.html

and that the development of three other low-level OpenIB drivers 
(Pathscale, IBM, Ammasso) have been announced, I believe relying on 
this behavior would be a mistake.

What if dapl_evd_modify_upcall() worked as follows 

dapl_evd_modify_upcall
	lock evd with spin_lock_irqsave
	if CQ upcalls need to be enabled
		ib_req_notify_cq
	setup the evd upcall 
	unlock evd with spin_unlock_irqrestore
	if ib_peek_cq reports unreaped work completions
		call dapl_evd_dto_callback

I realize that the call to dapl_evd_dto_callback() will potentially be 
racing with a CQ upcall, but I believe that the logic in 
dapl_evd_dto_callback() handles that correctly.

james

On Mon, 29 Aug 2005, Guy German wrote:

> James Lentini wrote:
> > I agree with you on the problems poised by the current interface. I 
> > hope we can find a solution that fixes the problem. 
> > Note that the same problem must be handled by a ULP using the native verbs.
> 
> I don't think we have the same problem in the verbs.
> In the currently Mellanox hw (which is AFAIK the only available hw in openib) 
> there is no race at all (because of the proprietary, more “considerate”, 
> completion notification implementation). 
> 
>  - Receive a CQ notification callback
>  - Wakeup polling thread
>  - Poll for completion (empty the queue) 
>  - Request completion notification
> [you will get a completion notification even for “old” completions on the queue]
> - exit thread
> 
> In the case of other, more harsh ib compliant future hw implementation – 
> Request completion Notification “extended verb” could encapsulate:
> - request CQ notification
> - if cq !empty request CQ notification _again_
> (note that you are not *polling* the cq – just checking the queue.
> This is different then draining the evd "one more time")
> 
> And the race is solved.
> Indeed, it is not as efficient as sparing the context switch 
> (to interrupt and back to thread) altogether. 
> 
> >I still think that there may be a race condition with this patch. 
> >Here's the scenario I'm concerned about:
> > -     Receive an evd upcall
> > -     Disable evd upcall policy
> > -     Wakeup polling thread
> > -     Dequeue all events
> > -     Enable evd upcall policy by:
> >       1. Call dapl_evd_modify_upcall() to enable the evd upcall
> >       2. Obtain the EVD spin lock via spin_lock_irqsave, thus 
> >          disabling local interrupts
> >       3. Check that the EVD's ring buffer is empty (there are no DAPL 
> >          "software" events)
> >       4. A DTO completion occurs on the EVD's CQ
> >       5. Enable the CQ's upcall via ib_req_notify_cq()
> >
> >If I understand you correctly, you are asserting that event #4, the 
> >CQ's DTO completion, cannot occur because the local interrupts are 
> >disabled by spin_lock_irqsave(). Have I understood you correctly?
> 
> Not quite. The *consumer’s upcall* would not be called, due to the irq disable. 
> The race would not occur, OTOH, because the Mellanox hw will initiate a 
> completion notification even if the completions in the cq arrived before 
> the notification request.
> If you want to be more ib compliant, for future possible implementations, 
> you can apply the “extended-notify-routine” (as mentioned above).
> 
> > My belief is that the completion will occur on the card 
> > regardless of the interrupt state. 
> 
> True, but the consumer will be notified only as soon as the irq 
> is enabled again
> 
> > Can you provide me with a reference that guarantees this 
> > will not happen?
> 
> I’m not saying that it won’t  ;) but I don't think there will be a race...
> 
> Guy
>