[Openib-windows] Windows DMA model

Fab Tillier ftillier at silverstorm.com
Wed Oct 19 00:14:50 PDT 2005


> From: Jan Bottorff [mailto:jbottorff at xsigo.com]
> Sent: Tuesday, October 18, 2005 10:24 PM
> 
> Why oh why does my ctrl-v sometimes send my half written email
> message... anyway...

Thanks for resending, and also for bringing up these issues.  I really
appreciate the feedback.

> I asked Microsoft about just calling MmGetPhysicalAddress for DMA and
> they responded:
> 
> =======================
> To summarize, your drivers will break on chipsets that need extra cache
> coherence help and on virtualized systems where there is an I/O MMU.
> Neither of these is particularly common today, but they'll be much more
> common in the near future.  The drivers will also break on non-x86
> machines where DMA addresses don't equal CPU-relative physical addresses,
> but those machines have become very uncommon in the last five years.

I wholeheartedly agree with this assessment.  Note that support for
virtualization will require a whole lot of work: to support kernel bypass in a
virtual machine, a user-mode application in the virtual machine has to bypass
both the virtual machine's kernel and the host's kernel.  It would be great to
figure out how to do this in Windows, but I currently don't really have a clue.

I look forward to any input you might have as we try to find solutions to these
current deficiencies.

> >I don't know if you saw my RFC emails about that API or not.
> 
> I didn't see that, as I'm a very new member of the list. It sounds like
> you're saying the current low level interface API to things will be
> changing in the future?

Yes, the interface between the access layer and the HCA HW driver will change
first, followed by the ULP to Access Layer interface.  I hope to get back to
that soon, and will be sending out headers for comments.

> >Also, driver verifier will completely break DMA for user-mode as it
> >forces double buffering to check that DMA mappings are used properly.
> 
> If you turn on DMA verification in driver verifier I believe it will
> double buffer ALL correctly done DMA, to help find memory boundary
> violations. This is also a check of the hardware.

That's correct.  Kernel clients should definitely do all DMA operations by the
book.  The question is whether registrations (both user and kernel) should use
the DMA mapping functionality, or just use the physical addresses from the MDL
after it has been locked down.  The former will result in verifier breaking
anything that uses registered memory, and the latter will result in broken DMA
on any platform where CPU and bus addresses are not identical or where DMA is
not cache coherent.  I have doubts that kernel bypass could even work without
cache coherency, though.
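
To illustrate why double buffering and long-lived registrations don't mix,
here is a toy user-mode model in C.  It is only a sketch: none of it is WDK
code, and every name in it (toy_mapping, toy_map, and so on) is made up for
illustration.  It assumes the mapping layer copies data into a bounce buffer
when the mapping is created and that the device only ever sees that copy.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define BUF_SIZE 16

/* Hypothetical model of a double-buffered DMA mapping: the "device"
 * only ever sees the bounce copy, never the caller's buffer. */
typedef struct {
    char  *client;            /* caller's buffer (CPU view) */
    char   bounce[BUF_SIZE];  /* what the device actually reads/writes */
    size_t len;
} toy_mapping;

/* Map: copy the client data into the bounce buffer. */
static void toy_map(toy_mapping *m, char *client, size_t len)
{
    assert(len <= BUF_SIZE);
    m->client = client;
    m->len = len;
    memcpy(m->bounce, client, len);
}

/* Unmap: copy the device-side data back to the client buffer. */
static void toy_unmap(toy_mapping *m)
{
    memcpy(m->client, m->bounce, m->len);
}

/* The "device" reads one byte through the mapping. */
static char toy_device_read(const toy_mapping *m, size_t i)
{
    assert(i < m->len);
    return m->bounce[i];
}

/* Registration style: map once, then write from the CPU with no
 * further call into the mapping layer.  Returns 1 if the device
 * ends up seeing stale data. */
static int registration_goes_stale(void)
{
    char buf[BUF_SIZE] = "old";
    toy_mapping reg;

    toy_map(&reg, buf, sizeof buf);  /* one-time "registration" */
    buf[0] = 'n';                    /* later write, no remap */
    return toy_device_read(&reg, 0) != buf[0];
}

/* Per-I/O style: map after the data is written, unmap when done.
 * Returns 1 if the device sees the current data. */
static int per_io_mapping_is_current(void)
{
    char buf[BUF_SIZE] = "old";
    toy_mapping io;
    int ok;

    buf[0] = 'n';
    toy_map(&io, buf, sizeof buf);
    ok = toy_device_read(&io, 0) == 'n';
    toy_unmap(&io);
    return ok;
}
```

In this model the per-I/O mapping stays correct because every transfer is
bracketed by a map and an unmap, which is exactly the kernel round trip that
kernel bypass is trying to avoid.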

For example, for internal ring buffers like those used for CQE and WQE rings,
performing proper DMA mappings will break the hardware if verifier double
buffers them, since the HCA and the driver would then see different copies.
I suppose a way around that is to allocate those buffers one page at a time with
AllocateCommonBuffer, build up an MDL with the underlying CPU physical pages
using MmGetPhysicalAddress on the returned virtual address, remap it to a
contiguous virtual memory region using MmMapLockedPagesSpecifyCache, and then
use the bus physical addresses originally returned by AllocateCommonBuffer to
program the HCA.  I don't know if this sequence would work properly, and it
still doesn't solve the issue of an application registering its buffers.
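
One consequence of allocating such a ring one page at a time is that its bus
addresses need not be contiguous, so the HCA would have to be given a
per-page address table.  The following user-mode C sketch shows only that
bookkeeping.  The types and function names are hypothetical, the bus
addresses are invented sample values, and it assumes a 4096-byte page and an
entry size that divides the page size evenly.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_PAGE_SIZE 4096u

/* Hypothetical record kept for each single-page allocation. */
typedef struct {
    uint64_t bus_addr;  /* device-visible address of this page */
} toy_ring_page;

typedef struct {
    const toy_ring_page *pages;
    size_t page_count;
    size_t entry_size;  /* bytes per ring entry */
} toy_ring;

/* Bus address of ring entry `index`, i.e. the address the HCA
 * would use for that entry. */
static uint64_t toy_ring_entry_bus_addr(const toy_ring *r, size_t index)
{
    size_t per_page, page, offset;

    assert(r->entry_size != 0);
    assert(TOY_PAGE_SIZE % r->entry_size == 0);

    per_page = TOY_PAGE_SIZE / r->entry_size;
    page = index / per_page;
    offset = (index % per_page) * r->entry_size;

    assert(page < r->page_count);
    return r->pages[page].bus_addr + offset;
}

/* Example ring: three pages at unrelated bus addresses, 64-byte
 * entries (64 entries per page). */
static uint64_t toy_example_lookup(size_t index)
{
    static const toy_ring_page pages[] = {
        { 0x10000u }, { 0x80000u }, { 0x30000u }
    };
    const toy_ring ring = { pages, 3, 64 };

    return toy_ring_entry_bus_addr(&ring, index);
}
```

With these sample values, entry 63 is the last entry in the first page and
entry 64 jumps to the second page's unrelated bus address, even though the
driver would see the two entries as adjacent in its contiguous virtual
mapping.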

> Passing tests with driver verifier active will be required to obtain any
> kind of WHQL driver certification. Commercial users will absolutely need
> these certifications.

Agreed, but any WHQL certification requires Microsoft to first define a WHQL
certification process for InfiniBand devices.  That said, even without an
official IB WHQL program, the WHQL tests are a valuable test tool, as is
verifier.  Once Microsoft has a program for IB, I expect they'll have thought of
how to handle kernel bypass and DMA mappings for memory registrations.

> >There is some work that needs to happen to get MAD traffic to do proper
> >DMA mappings, but upper level protocols already do the right thing.
> 
> I can understand how higher level packet traffic from a kernel mode NDIS
> based driver or buffers from a STORPORT based storage driver can have
> the correct mapping already. It also sounds like other kernel drivers
> that use the IBAL interface currently aren't assured DMA is correct.

It's up to the client to do DMA mappings, since the client posts work requests
to its queue pairs.  The problem is that it's currently not clear which device
to get a DMA_ADAPTER for; the new interface will make that much clearer.

The only part where IBAL is deficient with respect to DMA mappings is for MADs.
Anything else is the responsibility of the client.  There's no clean way to make
IBAL know exactly how to perform DMA mappings for all users automatically.

> >For the time being, since we're running on platforms where the CPU and
> >bus addresses are consistent, it hasn't been an issue.
> 
> Microsoft seems to say it's more than just an address mapping issue;
> it's also a cache coherency issue. I'm not surprised that it's desirable
> to get software to help with cache coherency as the number of processor
> cores grows, especially on AMD processor systems with essentially a NUMA
> architecture.

Aren't the AMD processors cache coherent, even in their NUMA architecture?
 
How do you solve cache coherency issues without getting rid of kernel bypass?
Making calls to the kernel to flush the CPU or DMA controller buffers for every
user-mode I/O is going to take away the benefits of doing kernel bypass in the
first place.  That's not to say we won't come to this conclusion; I'm just
throwing the questions out there.  I'm not expecting you to have the answers -
they're just questions that I don't know how to answer, and I appreciate the
discussion.

- Fab



