[Openib-windows] Windows DMA model

Fab Tillier ftillier at silverstorm.com
Thu Jan 19 10:21:30 PST 2006


> From: Leonid Keller [mailto:leonid at mellanox.co.il]
> Sent: Thursday, January 19, 2006 12:57 AM
> 
> > -----Original Message-----
> > From: Fab Tillier [mailto:ftillier at silverstorm.com]
> > Sent: Wednesday, January 18, 2006 7:57 PM
> >
> > DMA support isn't broken at all for clients that perform
> > mappings on a per-request basis, which is adequate for most
> > existing hardware.  The existing DMA APIs support
> > scatter/gather and 64-bit addressing just fine.  The problem
> > with IB is that of memory registration, which is unique to
> > RDMA hardware.  The DMA APIs where not designed to allow
> > arbitrary buffers to be converted to common buffers (which is
> > really what we're looking for here).
> 
> I didn't say it's broken, just that it's awkward.
> They force one to build their SG lists, which one doesn't need; they
> force one to release mapping registers after every transfer, which is
> inefficient; and they don't allow one to start a transfer immediately,
> but start it themselves at their discretion...
> Yes, I understand that they wanted to dispense the map registers fairly
> among the consumers, but they didn't do that with any other resource -
> memory, handles, sync primitives, et al. Why the heck do it here?

They do this to provide a common HAL interface so that drivers can be written
generically. The result is that properly written drivers need only build for the
CPU architecture, and can let the HAL take care of hardware differences beyond
that.  The API definition provides the maximum amount of flexibility.  It
doesn't have to be inefficient either, as the callback function could be invoked
from the context of the caller.
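
For reference, the per-request pattern the existing model expects looks
roughly like this - a minimal WDM sketch of my own (StartTransfer and
ProcessSgList are placeholder names), assuming a 64-bit-capable bus-master
device, with error handling and the PutScatterGatherList cleanup path
trimmed:

#include <wdm.h>

/* The HAL invokes this once map registers are available; with enough
 * free registers it can run in the caller's context, which is why the
 * model doesn't have to be inefficient. */
VOID ProcessSgList( PDEVICE_OBJECT DeviceObject, PIRP Irp,
    PSCATTER_GATHER_LIST SgList, PVOID Context )
{
    /* Program the device from SgList->Elements[], then later call
     * PutScatterGatherList, which flushes the adapter buffers
     * implicitly. */
}

NTSTATUS StartTransfer( PDEVICE_OBJECT Pdo, PDEVICE_OBJECT Fdo,
    PMDL Mdl, PVOID Context )
{
    DEVICE_DESCRIPTION desc = { 0 };
    ULONG nMapRegs;
    PDMA_ADAPTER pAdapter;

    desc.Version = DEVICE_DESCRIPTION_VERSION2;
    desc.Master = TRUE;
    desc.ScatterGather = TRUE;
    desc.Dma64BitAddresses = TRUE;
    desc.MaximumLength = (ULONG)-1;

    pAdapter = IoGetDmaAdapter( Pdo, &desc, &nMapRegs );
    if( !pAdapter )
        return STATUS_INSUFFICIENT_RESOURCES;

    /* Flush CPU caches before the transfer starts... */
    KeFlushIoBuffers( Mdl, FALSE /*write to device*/, TRUE /*DMA*/ );

    /* ...then ask the HAL to map; ProcessSgList runs when it's done. */
    return pAdapter->DmaOperations->GetScatterGatherList(
        pAdapter, Fdo, Mdl, MmGetMdlVirtualAddress( Mdl ),
        MmGetMdlByteCount( Mdl ), ProcessSgList, Context,
        TRUE /*WriteToDevice*/ );
}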
 
> > > I believe only 5 functions would be enough (let's call them
> > > NewDmaApi):
> > >
> > > With such an API and cacheable mapping registers we could work both
> > > in kernel and user mode without changing our model. It will also
> > > work in a guest OS.
> >
> > I don't think we need that many APIs - two should suffice:
> 
> Really, I've also suggested only two new APIs, the same as you.
> 2 flushing functions exist today: KeFlushIoBuffers and
> FlushAdapterBuffers, but the latter one is called implicitly upon
> PutScatterGatherList.

You've also introduced IoGetDmaAdapterEx, which isn't necessary if you're
dealing with common buffers (which is really what you're trying to do - make a
buffer accessible to both SW and HW).

> I suggested a few small things, facilitating our work and not affecting
> others:
> 	- an OUT Flags parameter to IoGetDmaAdapter to prevent guessing
> experiments on the capabilities of the DMA support;
> 	- an optional CpuNonCachable flag to make the user buffer uncachable
> and avoid calling KeFlushIoBuffers before every transfer;
> 	- an optional DmaNonCachable flag to turn off DMA caching if possible
> and avoid calling FlushAdapterBuffers after every transfer operation.
> The latter 2 options save us 2 kernel calls on every transfer operation
> of a user application, therefore enabling a real kernel bypass.

You don't need to call any flushing functions when dealing with common buffers,
which is why I suggested an API that comes close to common buffer usage.  It
eliminates the need for IoGetDmaAdapterEx.  The idea behind my suggestion was to
introduce a way to make any buffer a common buffer.  The existing API is
restrictive because it allocates the memory as well as maps it.  I would venture
that internally, AllocateCommonBuffer is split into a memory allocation call and
a call to make that memory accessible for DMA.

Common buffers also don't require guessing experiments to detect double
buffering.  The
AllocateCommonBuffer API takes a flag to control whether the buffer is
cacheable, which should be sufficient for our purposes.
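
For comparison, a sketch of the existing fused call in use - DemoCommonBuffer
is a placeholder name, and pAdapter comes from IoGetDmaAdapter as usual:

NTSTATUS DemoCommonBuffer( PDMA_ADAPTER pAdapter, ULONG Length )
{
    PHYSICAL_ADDRESS logical;
    PVOID va;

    /* Allocation and DMA mapping are fused into one call; the proposal
     * is essentially to expose the second half for buffers the caller
     * already owns. */
    va = pAdapter->DmaOperations->AllocateCommonBuffer(
        pAdapter, Length, &logical, FALSE /*CacheEnabled*/ );
    if( !va )
        return STATUS_INSUFFICIENT_RESOURCES;

    /* va is CPU-visible, logical is device-visible.  No per-transfer
     * KeFlushIoBuffers or FlushAdapterBuffers calls are needed. */

    pAdapter->DmaOperations->FreeCommonBuffer(
        pAdapter, Length, logical, va, FALSE /*CacheEnabled*/ );
    return STATUS_SUCCESS;
}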

> > SCATTER_GATHER_LIST* IoMapCommonBuffer(
> >     DMA_ADAPTER *pAdapter,
> >     VOID *pBuf,
> >     ULONG Length,
> >     BOOLEAN CacheEnabled,
> >     HANDLE *phCommonBuffer );
> >
> > VOID IoUnmapCommonBuffer( HANDLE hCommonBuffer );
> >
> > The pBuf parameter to IoMapCommonBuffer could be changed to
> > be an MDL list, which might make more sense.  The MDL would
> > be for a probed and locked buffer, or for non-paged pool.
> 
> It's bad for user buffers, because it forces you to first map the buffer
> into kernel space, which is not scalable.

No, you misunderstood.  The buffer used in IoMapCommonBuffer could be allocated
through any memory allocation call.  The buffers must be probed and locked
regardless of whether that's hidden inside the function or not; I'm just
making it explicit.  The IB memory registration call would internally, for
user-mode buffers, lock the pages down and then map them as a common buffer.
That's still a single kernel transition from the client's perspective.
MmProbeAndLockPages doesn't map the pages into the kernel's address space - it
just makes them resident and locks them down.  If we just pass a virtual
address, we need to add to IoMapCommonBuffer all the parameters that
MmProbeAndLockPages takes (AccessMode and Operation).  There's no reason to make
the function that complicated.
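
To make that concrete, the probe-and-lock step would look something like
this (a sketch; LockUserBuffer is a placeholder name - note there is no
kernel mapping anywhere in it):

#include <wdm.h>

/* Build and lock an MDL for a user-mode buffer.  There is deliberately
 * no MmGetSystemAddressForMdlSafe call here: the pages are made
 * resident and pinned, but never mapped into kernel address space. */
PMDL LockUserBuffer( PVOID UserVa, ULONG Length )
{
    PMDL mdl = IoAllocateMdl( UserVa, Length, FALSE, FALSE, NULL );
    if( !mdl )
        return NULL;

    __try
    {
        /* AccessMode and Operation are exactly the parameters we would
         * otherwise have to push into IoMapCommonBuffer. */
        MmProbeAndLockPages( mdl, UserMode, IoWriteAccess );
    }
    __except( EXCEPTION_EXECUTE_HANDLER )
    {
        IoFreeMdl( mdl );
        mdl = NULL;
    }
    return mdl;
}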

The sequence of events would be something like this:

- User calls RegMem with a malloc'd buffer
- IBAL makes kernel transition
- kernel proxy builds MDL list for user's buffer
- kernel proxy calls MmProbeAndLockPages on each MDL in the list
- kernel proxy calls HCA driver's virtual registration call with MDL list
- HCA driver calls IoMapCommonBuffer to get SGL for the registered buffer
- HCA driver uses SGL to program the HCA's translation table

The idea with the above is that by the time the call gets to the HCA driver, the
HCA driver doesn't have to care where the buffer came from or how it was
allocated.
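
Purely to illustrate, the HCA driver's half of that sequence might look like
this.  IoMapCommonBuffer is the proposed API (it doesn't exist today), I'm
using the MDL-list variant of its signature, and HcaRegisterVirtualMemory and
HcaProgramTranslationTable are made-up stand-ins for vendor-specific code:

NTSTATUS HcaRegisterVirtualMemory( DMA_ADAPTER *pAdapter, PMDL MdlList,
    ULONG TotalLength, HANDLE *phCommonBuffer )
{
    SCATTER_GATHER_LIST *pSgl;

    /* The kernel proxy has already probed and locked every MDL in the
     * list, so the buffer's origin (user mode, non-paged pool, ...) is
     * irrelevant by this point. */
    pSgl = IoMapCommonBuffer( pAdapter, MdlList, TotalLength,
        TRUE /*CacheEnabled*/, phCommonBuffer );
    if( !pSgl )
        return STATUS_INSUFFICIENT_RESOURCES;

    /* Feed the device-visible addresses into the HCA's translation
     * table; torn down later via IoUnmapCommonBuffer. */
    return HcaProgramTranslationTable( pSgl );
}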

> I suggest using usual
> (pageable) buffers both in kernel and user mode and letting
> IoMapCommonBuffer lock them and map them into DMA address space.  Maybe
> IoMapCommonBuffer will manage to do it without building an MDL or
> mapping into the kernel.

An MDL list must be created to map buffers.  Why hide that inside the
IoMapCommonBuffer API?  Look at GetScatterGatherList - it takes an MDL list as
input.  IoMapCommonBuffer keeps a similar look-and-feel to the existing APIs.

Remember, there's no restriction on where or how the buffers are allocated.

Adding IoMapCommonBuffer (and IoUnmapCommonBuffer) doesn't require any
changes to the existing DMA model, so existing drivers are unaffected.

Does that make more sense now?  Do you still see problems with my proposal?

Thanks,

- Fab
