[Openib-windows] Windows DMA model

Leonid Keller leonid at mellanox.co.il
Wed Jan 18 05:08:10 PST 2006


Some thoughts about the DMA API.

I find it very inconvenient: it imposes a particular way of working,
with unrequested implicit actions, instead of providing a minimal set of
tools that let us do things the way we need to.
I believe InfiniBand is a *typical* case, in the sense that any
interconnect providing high bandwidth and low latency will need to
support scatter/gather and 64-bit addressing.
Therefore it is worth asking for an additional (or different) API that
eliminates these deficiencies while also enabling kernel bypass as far
as possible.

I believe only 5 functions would be enough (let's call them NewDmaApi):

   IoGetDmaAdapterEx(*Adapter, *Flags, *NmapRegisters,...)
      Returns the Adapter object and the capability Flags:
      IsDoubleBuffering, IsNewDmaApi, IsDmaCacheble.

   IoMapBufToDmaSpace(Adapter, Mode, PcuNonCachable, DmaNonCachable,
                      addr, size, ...);
      Locks the memory, allocates map registers and, if requested, makes
      them uncachable.
      Returns an array of logical addresses and an opaque DmaBuffer
      object.
      Note:
         No memory allocation is needed here: malloc and
         ExAllocatePool(Paged) already cover that.
         No mapping to kernel space is needed: MmProbeAndLockPages
         already covers that.
         A NOT_ENOUGH_REGISTERS error should be handled like a
         memory-allocation failure.

   IoUnmapBufFromDmaSpace(DmaBuffer)
      Undoes IoMapBufToDmaSpace.

   KeFlushPcuBuffers(DmaBuffer)
      Flushes the CPU cache to the memory buffer.
      If the buffer was mapped as PcuNonCachable, it needs to be called
      only once.

   KeFlushDmaBuffers(DmaBuffer)
      Flushes the DMA cache to the memory buffer.
      Not needed at all if the buffer was successfully mapped as
      DmaNonCachable.
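The call sequence implied by the five functions above can be sketched as a user-space mock. This is only an illustration under stated assumptions: none of these functions exist in Windows (the names follow the proposal), and the DmaMode and DmaStatus enums, the 4 KB page size and the identity "logical" addresses are invented for the mock. The mock IoGetDmaAdapterEx also takes the platform's properties as inputs instead of discovering them.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* HYPOTHETICAL user-space mock of the proposed "NewDmaApi". None of these
 * functions exist in Windows; the bodies only track state so that the
 * intended call sequence can be exercised. */

#define SKETCH_PAGE_SIZE 4096u /* assumption: 4 KB pages */

typedef enum {
    DMA_OK = 0,
    DMA_INVALID_PARAMETER,
    DMA_NOT_ENOUGH_REGISTERS,
    DMA_NO_MEMORY
} DmaStatus;

typedef enum { DMA_TO_DEVICE, DMA_FROM_DEVICE } DmaMode; /* assumed meaning of Mode */

typedef struct {
    int    is_double_buffering; /* capability flags reported to the driver */
    int    is_new_dma_api;
    int    is_dma_cachable;
    size_t free_registers;      /* map registers not yet in use */
} DmaAdapter;

typedef struct {
    DmaAdapter *adapter;
    DmaMode     mode;
    int         cpu_non_cachable; /* the proposal's PcuNonCachable */
    int         dma_non_cachable;
    size_t      n_pages;
    uint64_t   *logical;          /* one logical address per page */
    int         cpu_flushed;
    int         dma_flush_count;
} DmaBuffer;

/* Mock: the platform's properties are passed in rather than discovered. */
void IoGetDmaAdapterEx(DmaAdapter *a, int double_buffering, size_t n_regs)
{
    a->is_double_buffering = double_buffering;
    a->is_new_dma_api = 1;
    a->is_dma_cachable = 1;
    a->free_registers = n_regs;
}

/* Maps an already allocated buffer: one map register per page spanned. */
DmaStatus IoMapBufToDmaSpace(DmaAdapter *a, DmaMode mode, int pcu_non_cachable,
                             int dma_non_cachable, const void *addr,
                             size_t size, DmaBuffer *out)
{
    if (size == 0)
        return DMA_INVALID_PARAMETER;
    uintptr_t start = (uintptr_t)addr & ~(uintptr_t)(SKETCH_PAGE_SIZE - 1);
    uintptr_t end = (uintptr_t)addr + size;
    size_t pages = (end - start + SKETCH_PAGE_SIZE - 1) / SKETCH_PAGE_SIZE;

    if (pages > a->free_registers)
        return DMA_NOT_ENOUGH_REGISTERS; /* caller treats this like an allocation failure */
    out->logical = malloc(pages * sizeof *out->logical);
    if (out->logical == NULL)
        return DMA_NO_MEMORY;
    for (size_t i = 0; i < pages; i++) /* mock: identity "logical" addresses */
        out->logical[i] = (uint64_t)(start + i * SKETCH_PAGE_SIZE);
    a->free_registers -= pages;
    out->adapter = a;
    out->mode = mode;
    out->cpu_non_cachable = pcu_non_cachable;
    out->dma_non_cachable = dma_non_cachable;
    out->n_pages = pages;
    out->cpu_flushed = 0;
    out->dma_flush_count = 0;
    return DMA_OK;
}

/* Undoes IoMapBufToDmaSpace and returns the map registers to the adapter. */
void IoUnmapBufFromDmaSpace(DmaBuffer *b)
{
    b->adapter->free_registers += b->n_pages;
    free(b->logical);
    b->logical = NULL;
    b->n_pages = 0;
}

/* Only meaningful for a buffer mapped as PcuNonCachable. */
void KeFlushPcuBuffers(DmaBuffer *b)
{
    if (b->cpu_non_cachable)
        b->cpu_flushed = 1;
}

/* A no-op when the buffer was successfully mapped as DmaNonCachable. */
void KeFlushDmaBuffers(DmaBuffer *b)
{
    if (!b->dma_non_cachable)
        b->dma_flush_count++;
}
```

In this sketch a driver maps each buffer once, calls KeFlushPcuBuffers once if it asked for PcuNonCachable, calls KeFlushDmaBuffers (presumably per I/O) only when DmaNonCachable could not be honored, and unmaps when done; NOT_ENOUGH_REGISTERS surfaces exactly like an allocation failure.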

With such an API and cachable map registers, we could work both in
kernel mode and in user mode without changing our model. It would also
work in a guest OS.

Am I missing something?
	
> -----Original Message-----
> From: Fab Tillier [mailto:ftillier at silverstorm.com] 
> Sent: Tuesday, January 17, 2006 7:24 PM
> To: Leonid Keller; Jan Bottorff; openib-windows at openib.org
> Subject: RE: [Openib-windows] Windows DMA model
> 
> > From: Leonid Keller [mailto:leonid at mellanox.co.il]
> > Sent: Tuesday, January 17, 2006 3:58 AM
> > 
> > First, an update: I've found the problem with my kernel DMA testing.
> > It looks like a Microsoft bug: when you request map registers for a
> > 2GB transfer (the maximum for our cards), IoGetDmaAdapter returns 1
> > register. For *any* smaller size X, it returns an appropriate number.
> 
> Did you check to see what happens if you make a request 
> larger than 2GB?  Is 2GB the threshold?
> 
> In any case, I'm glad to hear this is working now.
> 
> > It seems this gives us the opportunity to solve all the problems for
> > kernel work.
> 
> Yes, it should.  However, kernel clients will need to do the 
> DMA mappings before and unmap after each I/O so that things 
> work properly.  Clients doing only local operations (not 
> exposing buffers for RDMA) can create a DMA MR (for all of 
> physical memory), and then only need to perform the DMA 
> mappings during I/O.
> 
> Any ring buffers shared between the HCA driver and hardware 
> will need to be allocated via AllocateCommonBuffer.
> 
> > As for user land, I also don't see a solution other than giving up
> > OS bypass and performing memory registration during send/recv.
> 
> For user-land, I'm thinking more and more that having the 
> kernel allocate common buffers for the ring buffers (in CQs 
> and QPs) is the way to go.  I don't know how well this will 
> scale, but it will guarantee that the ring buffers at least 
> are working properly.
> 
> For I/O, it would all depend on whether the platform performs 
> double buffering or not.  However, there's no way to tell 
> what the platform is doing under the DMA APIs, so you're 
> right - for guaranteed correct operation, the app would have 
> to register/deregister for each send/recv.  However, this 
> goes against the design of WSD, so I think they have some 
> explaining to do.  I'll follow up with them and see if I can 
> get any additional information.
> 
> - Fab
> 
> 
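On the map-register anomaly reported in the quoted thread (1 register returned for a 2GB transfer): the number of map registers a transfer can need equals the number of pages it spans. The C sketch below computes that count with the same page-span arithmetic as the WDK's ADDRESS_AND_SIZE_TO_SPAN_PAGES macro, assuming 4 KB pages; span_pages and map_registers_needed are hypothetical helpers, not Windows APIs.

```c
#include <stdint.h>

/* Assumption: 4 KB pages (PAGE_SHIFT 12), as on x86/x64 Windows. */
#define SKETCH_PAGE_SHIFT 12u
#define SKETCH_PAGE_SIZE  (1ull << SKETCH_PAGE_SHIFT)

/* Pages touched by `length` bytes starting at `offset_in_page`; same
 * arithmetic as ADDRESS_AND_SIZE_TO_SPAN_PAGES, done in 64 bits so that
 * large lengths cannot wrap. */
uint64_t span_pages(uint64_t offset_in_page, uint64_t length)
{
    offset_in_page &= SKETCH_PAGE_SIZE - 1;
    return (offset_in_page + length + SKETCH_PAGE_SIZE - 1) >> SKETCH_PAGE_SHIFT;
}

/* Worst case over all start offsets: the buffer begins on the last byte
 * of a page, which can add one extra page to the span. */
uint64_t map_registers_needed(uint64_t max_length)
{
    return max_length == 0 ? 0 : span_pages(SKETCH_PAGE_SIZE - 1, max_length);
}
```

For the cards' 2GB maximum, a worst-case (unaligned) transfer spans 524,289 pages, so a count of 1 is far off. Note that the HAL may legitimately return fewer map registers than a transfer would ideally need, so a lower but non-trivial count would not by itself indicate a bug.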

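Fab's point about double buffering can be illustrated with a toy model in C (hypothetical names, not a Windows API): on a double-buffering platform the device writes into a bounce buffer, and the data reaches the application's buffer only when the transfer is flushed or unmapped. This roughly mirrors the copy-back Windows performs for double-buffered transfers when the driver flushes the adapter buffers, but the model itself is an assumption made for illustration.

```c
#include <stddef.h>
#include <string.h>

/* HYPOTHETICAL model of a double-buffering (bounce buffer) DMA platform. */

#define BOUNCE_SIZE 64

typedef struct {
    char  *user;                /* buffer the application registered */
    size_t len;
    char   bounce[BOUNCE_SIZE]; /* memory the device can actually address */
    int    double_buffering;    /* 0: the device writes user memory directly */
} Mapping;

/* Map a buffer for a device-to-memory transfer (e.g. a receive). */
int map_for_receive(Mapping *m, char *user, size_t len, int double_buffering)
{
    if (len > BOUNCE_SIZE)
        return -1;
    m->user = user;
    m->len = len;
    m->double_buffering = double_buffering;
    memset(m->bounce, 0, sizeof m->bounce);
    return 0;
}

/* The "device" DMAs the payload into whatever memory the mapping exposes. */
void device_write(Mapping *m, const char *payload, size_t n)
{
    char *target = m->double_buffering ? m->bounce : m->user;
    memcpy(target, payload, n < m->len ? n : m->len);
}

/* Flush/unmap: with double buffering, this is where the copy-back happens. */
void unmap_after_receive(Mapping *m)
{
    if (m->double_buffering)
        memcpy(m->user, m->bounce, m->len);
}
```

In this model, a buffer that is registered once and reused for many receives, as WSD expects, never sees received data unless each receive ends with a flush or unmap, which is why guaranteed-correct operation seems to require per-I/O registration on such platforms.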

