[openib-general] [RFC/BUG] DMA vs. CQ race
akepner at sgi.com
Thu Feb 22 10:32:08 PST 2007
On Thu, Feb 22, 2007 at 10:34:16AM -0800, Roland Dreier wrote:
>
> I actually have a vague plan for a somewhat cleaner way to get this
> fix. For a variety of reasons, I am planning on changing the way the
> kernel handles memory registration so that low-level drivers have more
> control over what happens. This would allow us to follow Gleb's
> suggestion to use register MR to create and map the kernel's buffer
> and avoid some of the error path ugliness. So I would prefer to map
> the coherent memory that way.
OK, I look forward to seeing what you have in mind.
>
> However this will take a while to come to fruition, since it is kind
> of a background task for me. How severe is this issue? In other
> words, when you produced the problem, was it a synthetic test, or a
> workload that someone might actually want to run?
>
We found this accidentally, running a normal MPI job, on a
"normally sized" machine (i.e., tens, not hundreds of
processors). It appears to be more easily produced than
we'd expected, and we consider it to be a severe problem.
--
Arthur