[openib-general] [PATCH] RDMA/iwcm: Get rid of extra call to list_empty()

Krishna Kumar2 krkumar2 at in.ibm.com
Thu Nov 16 19:49:34 PST 2006


Hi Tom,

OK, I will try it and report what I find. Thanks!

- KK

Tom Tucker <tom at opengridcomputing.com> wrote on 11/16/2006 10:08:03 PM:

> The race is that you've deleted the work queue element that is enqueued 
on
> the iwcm_wq. It's as simple as that. To prove it to yourself, apply your
> patch. Turn on memory debug support in the kernel and recompile your 
code.
> Then run rdma_krping clients in four different threads against your 
server
> with an I/O count of 1.
> 
> You'll hit the race and can look at it yourself. I don't know any better 
way
> to explain it... Sorry.
> 
> 
> On 11/13/06 10:44 PM, "Krishna Kumar2" <krkumar2 at in.ibm.com> wrote:
> 
> > Hi Tom,
> > 
> >> No, to understand why go look at the implementation of queue_work. 
BTW,
> > this
> > 
> > I was describing the implementation of queue_work() in my
> > previous mail. So sorry to be dense, but I do not understand
> > why this patch introduces a race. Can you explain the race
> > that you had found ? What I understood of queue_work() is :
> > 
> > If cm_work_handler() is already running and processing the
> > last entry at the same time this new entry was added, it is
> > guaranteed to find this new entry in it's current run iteration,
> > and process it. The only issue is with the extra queue_work
> > by iwcm parallely on a different cpu for the same case.
> > 
> > So if iwcm had done a redundant "queue_work" on this queue,
> > which, besides adding the new entry to the workqueue, also
> > does a wakeup of "worker_thread" (which is still running the
> > previous iteration of run_workqueue -> cm_work_handler).
> > I am assuming that the wake up function is
> > default_wake_function(), since I couldn't locate in wait* code
> > where this is initialized.
> > 
> > When cm_work_handler finishes removing this new entry, it
> > returns to worker_thread, which will do a schedule() and
> > sleep till it is woken up again (since default_wake_function
> > found that the thread is already running and had done
> > nothing). Are you referring to a race where the queue_work
> > is done between the time cm_work_handler finished running
> > and before it gets back to schedule ? I feel that should not
> > matter as the run_workqueue() will find this entry in it's
> > cwq->worklist and continue processing instead of exiting to
> > worker_thread() and schedule().
> > 
> > Still confused about the race :)
> > 
> > Thanks,
> > 
> > - KK
> 
> 





More information about the general mailing list