[openib-general][PATCH][kdapl]: FMR and EVD patch
Guy German
guyg at voltaire.com
Thu Aug 18 13:46:31 PDT 2005
Hi James,
I will try to explain the reason behind this patch:
In IB, a normal working flow, for a consumer, is:
- Receive a CQ notification callback
- Wakeup polling thread
- Poll for completion (empty the queue)
- Request completion notification
There is no problem here.
In kdapl, however, the consumer keeps receiving upcalls until it sets the upcall policy to disabled. So a working flow will be:
- Receive an evd upcall
- Disable evd upcall policy
- Wakeup polling thread
- Dequeue all evds
- Enable evd upcall policy
There is a race here:
A completion can arrive after the last dequeue and before the policy is re-enabled. The provider won't upcall the consumer (the policy is disabled), and the consumer won't dequeue any more because it believes the queue is empty.
I think it is a very bad idea to solve this race by adding another evd_dequeue after enabling the upcall policy.
If you do that, you end up with a polling thread (because while you dequeue one completion, many more can follow) and, at the same time, upcalls arriving from the dapl provider.
Besides the fact that this is an expensive and unnecessary context switch, you now have an upcall and a thread racing each other.
You can reach a situation where the upcall holds one event and the thread holds another, neither handled yet - you would have to re-queue them internally, or something similar, to preserve ordering. And I think that is only a partial list of the problems in this case.
So:
My suggestion is simple; it solves the race, saves the unnecessary context switch, and spares the consumer the added complexity.
The solution is to notify the consumer, when it tries to enable the upcall policy, that the queue is actually not empty, and force it to continue polling (in the same thread context it is already in).
dat_evd_modify_upcall checks the queue under spin_lock_irqsave, so the race cannot occur.
BTW,
I'm not sure if it is still the case, but I think one of the ULPs in openib did not use a kernel thread for dequeuing. That is a very bad design, as the upcall can end up polling for *long* periods of time in tasklet/interrupt context.
That's it.
Sorry for the long mail; I hope it was not too blurry.
Guy.
-----Original Message-----
From: James Lentini [mailto:jlentini at netapp.com]
Sent: Thu 8/18/2005 10:28 PM
To: Guy German
Cc: Openib
Subject: Re: [openib-general][PATCH][kdapl]: FMR and EVD patch
Hi Guy,
The one piece of this patch that remains unaccepted is:
Index: ib/dapl_evd.c
===================================================================
--- ib/dapl_evd.c (revision 3136)
+++ ib/dapl_evd.c (working copy)
@@ -1028,6 +1028,7 @@
{
struct dapl_evd *evd;
int status = 0;
+ int pending_events;
evd = (struct dapl_evd *)evd_handle;
dapl_dbg_log (DAPL_DBG_TYPE_API, "%s: (evd=%p, upcall_policy=%d)\n",
@@ -1035,14 +1036,25 @@
spin_lock_irqsave(&evd->common.lock, evd->common.flags);
if ((upcall_policy != DAT_UPCALL_TEARDOWN) &&
- (upcall_policy != DAT_UPCALL_DISABLE) &&
- (evd->evd_flags & DAT_EVD_DTO_FLAG)) {
- status = ib_req_notify_cq(evd->cq, IB_CQ_NEXT_COMP);
- if (status) {
- printk(KERN_ERR "%s: dapls_ib_completion_notify failed "
- "(status=0x%x)\n",__func__, status);
+ (upcall_policy != DAT_UPCALL_DISABLE)) {
+ pending_events = dapl_rbuf_count(&evd->pending_event_queue);
+ if (pending_events) {
+ dapl_dbg_log(DAPL_DBG_TYPE_WARN,
+ "%s: (evd %p) there are still %d pending "
+ "events in the queue - policy stays disabled\n",
+ __func__, evd_handle, pending_events);
+ status = -EBUSY;
goto bail;
}
+ if (evd->evd_flags & DAT_EVD_DTO_FLAG) {
+ status = ib_req_notify_cq(evd->cq, IB_CQ_NEXT_COMP);
+ if (status) {
+ printk(KERN_ERR "%s: dapls_ib_completion_notify"
+ " failed (status=0x%x) \n",__func__,
+ status);
+ goto bail;
+ }
+ }
}
evd->upcall_policy = upcall_policy;
evd->upcall = *upcall;
The IB analog to this function, ib_req_notify_cq(), does not require
that the CQ be empty. The kDAPL specification does not define an empty
EVD as a requirement for modifying the upcall, and previous
implementations of the API have not imposed this requirement.