[openib-general] [RFC/BUG] DMA vs. CQ race

akepner at sgi.com akepner at sgi.com
Thu Feb 22 10:32:08 PST 2007


On Thu, Feb 22, 2007 at 10:34:16AM -0800, Roland Dreier wrote:
> 
> I actually have a vague plan for a somewhat cleaner way to get this
> fix.  For a variety of reasons, I am planning on changing the way the
> kernel handles memory registration so that low-level drivers have more
> control over what happens.  This would allow us to follow Gleb's
> suggestion to use register MR to create and map the kernel's buffer
> and avoid some of the error path ugliness.  So I would prefer to map
> the coherent memory that way.

OK, I look forward to seeing what you have in mind.

> 
> However this will take a while to come to fruition, since it is kind
> of a background task for me.  How severe is this issue?  In other
> words, when you produced the problem, was it a synthetic test, or a
> workload that someone might actually want to run?
> 

We found this accidentally, running a normal MPI job, on a
"normally sized" machine (i.e., tens, not hundreds, of
processors). It appears to be more easily produced than
we'd expected, and we consider it to be a severe problem.

-- 
Arthur
