[openib-general] [PATCH] RDMA/iwcm: Get rid of extra call to list_empty()

Tom Tucker tom at opengridcomputing.com
Thu Nov 16 08:38:03 PST 2006


The race is that you've deleted the work queue element that is enqueued on
the iwcm_wq. It's as simple as that. To prove it to yourself, apply your
patch. Turn on memory debug support in the kernel and recompile your code.
Then run rdma_krping clients in four different threads against your server
with an I/O count of 1.

You'll hit the race and can look at it yourself. I don't know any better way
to explain it... Sorry.


On 11/13/06 10:44 PM, "Krishna Kumar2" <krkumar2 at in.ibm.com> wrote:

> Hi Tom,
> 
>> No, to understand why go look at the implementation of queue_work. BTW,
> this
> 
> I was describing the implementation of queue_work() in my
> previous mail. So sorry to be dense, but I do not understand
> why this patch introduces a race. Can you explain the race
> that you had found ? What I understood of queue_work() is :
> 
> If cm_work_handler() is already running and processing the
> last entry at the same time this new entry was added, it is
> guaranteed to find this new entry in it's current run iteration,
> and process it. The only issue is with the extra queue_work
> by iwcm parallely on a different cpu for the same case.
> 
> So if iwcm had done a redundant "queue_work" on this queue,
> which, besides adding the new entry to the workqueue, also
> does a wakeup of "worker_thread" (which is still running the
> previous iteration of run_workqueue -> cm_work_handler).
> I am assuming that the wake up function is
> default_wake_function(), since I couldn't locate in wait* code
> where this is initialized.
> 
> When cm_work_handler finishes removing this new entry, it
> returns to worker_thread, which will do a schedule() and
> sleep till it is woken up again (since default_wake_function
> found that the thread is already running and had done
> nothing). Are you referring to a race where the queue_work
> is done between the time cm_work_handler finished running
> and before it gets back to schedule ? I feel that should not
> matter as the run_workqueue() will find this entry in it's
> cwq->worklist and continue processing instead of exiting to
> worker_thread() and schedule().
> 
> Still confused about the race :)
> 
> Thanks,
> 
> - KK






More information about the general mailing list