[Openib-windows] Using of fast mutexes in WinIb

Fab Tillier ftillier at silverstorm.com
Mon Nov 28 09:46:49 PST 2005


Hi Leo,

> From: Leonid Keller [mailto:leonid at mellanox.co.il]
> Sent: Sunday, November 27, 2005 3:36 AM
> 
> I've studied DMA support in Windows and come to the following
> conclusions:
> 
> 1) Control path: There is no simple way to translate virtual address
> into logical.
> 	The only function, that performs this job is
> AllocateCommonBuffer. It works on PASSIVE_LEVEL.
> 	For user space it means, that we can't allow to allocate
> persistent buffers (like CQ ring buffer) in user space, but will need to
> call into kernel, allocate the buffer there and map it back to the user
> space, which can be tricky, because requires to reside in the
> process-originator's address space.
> [this is ugly, but one can live with that. see now below ...]

AllocateCommonBuffer doesn't work for applications registering memory, so I
don't think working around it for the CQE and WQE rings is worth it.  We need to
find a way to perform long-term DMA mappings, but I don't see that happening
without some help from Microsoft (which I'm working on).

> 2) Data path: getting logical addresses imposes using of Windows DMA
> work model.
> 	It complicates data path algorithms in kernel and causes serious
> performance penalties, coming, first, from the need in translating their
> structures into ours and back, and, which is much more important, from
> severe limitations on number of map registers for one transfer
> operation. In a test i wrote, i didn't manage to get more than one map
> register from IoGetDmaAdapter, which means, that in order to send 1MB
> buffer i'll need to perform 256 data transfers with an interrupt and DPC
> processing in each one !!!
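
The 256 figure can be checked with a quick calculation.  This is only a sketch:
it assumes 4 KB pages, that one map register covers one page, and a
page-aligned buffer.  transfers_needed and PAGE_SIZE_BYTES are names made up
for the example, not anything from the DDK.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: 4 KB pages, as on x86/x64.  Other platforms may differ. */
#define PAGE_SIZE_BYTES 4096u

/*
 * Number of separate transfers needed to move `length` bytes when a single
 * transfer can cover at most `map_registers` pages.  Assumes the buffer is
 * page-aligned; an unaligned buffer can span one extra page.
 */
static size_t transfers_needed(size_t length, size_t map_registers)
{
    assert(map_registers > 0);

    /* Round the byte count up to whole pages. */
    size_t pages = (length + PAGE_SIZE_BYTES - 1) / PAGE_SIZE_BYTES;

    /* Round the page count up to whole transfers. */
    return (pages + map_registers - 1) / map_registers;
}
```

With one map register, transfers_needed(1024 * 1024, 1) gives 256; with 256
map registers the same buffer would go out in a single transfer.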

Normally, if you're doing 64-bit DMA and the PCI adapter advertises 64-bit
support, you should not need any map registers at all (which might be why
IoGetDmaAdapter reports a value of 1).
 
Map registers are only needed if the hardware can't access addresses above
the 4GB mark, in which case the OS maps the buffers to the lower address space
to make them available.  Driver Verifier complicates things in that it allocates
an intermediate buffer that it copies to/from before/after the DMA operation, so
that the hardware actually performs its DMA to Verifier's buffer, not the
user's.  This allows Verifier to check that DMA operations are done correctly,
but totally breaks the semantics of memory registration that we need for IB.
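
To make the 4GB condition concrete, here is a rough sketch of the check
involved.  needs_map_registers is a hypothetical helper, not a DDK routine, and
it only models the addressing limit, not what the HAL actually does internally.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Returns true if any byte of [phys, phys + length) lies above the highest
 * address a device with `addr_bits` address bits can reach, in which case
 * the OS would have to remap the buffer into reachable memory.
 * Hypothetical helper for illustration only.
 */
static bool needs_map_registers(uint64_t phys, uint64_t length,
                                unsigned addr_bits)
{
    assert(addr_bits >= 1 && addr_bits <= 64);

    if (length == 0)
        return false;

    uint64_t last = phys + (length - 1);   /* last byte of the buffer */
    assert(last >= phys);                  /* range must not wrap around */

    if (addr_bits == 64)
        return false;                      /* device reaches everything */

    uint64_t limit = ((uint64_t)1 << addr_bits) - 1;
    return last > limit;
}
```

A 32-bit device needs remapping for a page at 0x100000000, while a 64-bit
capable device does not, which is the case described above.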

Did you try to call GetScatterGatherList to see how much of the buffer it mapped
when it gives you the SGL?
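
Checking how much got mapped amounts to summing the element lengths in the
list.  The sketch below uses stand-in types (sg_element, sg_list) that mirror
the fields of SCATTER_GATHER_ELEMENT and SCATTER_GATHER_LIST so that it builds
outside the DDK; the fixed-size array and the function name sgl_mapped_bytes
are assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SG_MAX_ELEMENTS 8   /* arbitrary cap for this sketch */

/* Stand-in for SCATTER_GATHER_ELEMENT: one contiguous fragment. */
typedef struct {
    uint64_t  address;      /* logical (bus) address of the fragment */
    uint32_t  length;       /* fragment length in bytes */
    uintptr_t reserved;
} sg_element;

/* Stand-in for SCATTER_GATHER_LIST.  The real structure ends in a
 * variable-length array; a fixed array keeps the sketch simple. */
typedef struct {
    uint32_t   number_of_elements;
    uintptr_t  reserved;
    sg_element elements[SG_MAX_ELEMENTS];
} sg_list;

/* Total number of bytes covered by the scatter/gather list. */
static uint64_t sgl_mapped_bytes(const sg_list *sgl)
{
    assert(sgl != NULL);
    assert(sgl->number_of_elements <= SG_MAX_ELEMENTS);

    uint64_t total = 0;
    for (uint32_t i = 0; i < sgl->number_of_elements; i++)
        total += sgl->elements[i].length;
    return total;
}
```

Comparing that total against the length passed in would show whether the
whole buffer was mapped in one go or only a part of it.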

> 	For user space we will need to call into kernel every time in
> order to use these functions, which screws the performance more badly.

Not to mention that it's not the natural programming model for RDMA, where
ideally the application registers its buffers up front independently of I/O
operations.  There may be no local I/O operation if the registered memory is the
target of an RDMA read or write, and thus no kernel transition to perform
mappings.

> Unless i'm missing here something and taking into account, that physical
> addresses are equal to logical ones (today) on all x32 and x64 machines
> and on most Itaniums, i believe, we have no other choice than to give up
> using DMA model in this (gen1) release.

I agree that we can't use the DMA model for user-mode I/O.  For kernel mode, we
should be able to have all buffers properly mapped since there shouldn't be
map registers involved.  We can also use AllocateCommonBuffer in the kernel
for things like the CQE and WQE rings, but I think we can delay that until we
solve the user-mode issue.

I believe that map registers are a separate issue from CPU/bus address
inconsistency, and kernel ULPs should do per-I/O DMA operations, ideally using
physical addresses for registrations.  IPoIB and SRP do this currently thanks to
their respective port drivers - I don't know if SDP does.  As long as we don't
enable DMA checking when using Driver Verifier, we should be OK.

> I suggest to put this issue on hold till gen2 while trying to prod
> Microsoft to suggest something more appropriate for OS-bypassing
> software and hardware, requiring high performance.

I agree - if nothing else, I'd like to see us come out with a release no later
than 1Q06, even if it has minimal content (more on that later).  Without some
input from Microsoft we won't get very far solving these issues.

- Fab



