[ofa-general] Re: InfiniBand card (mthca) in Linux

Roland Dreier rdreier at cisco.com
Mon Jul 9 08:28:55 PDT 2007


 > > according to Xen-dev, alloc_pages does *not* guarantee contiguous
 > > pages.  They say that pci_alloc_consistent should be used
 > > instead.  The question is whether a non-Xen kernel *usually*
 > > allocates contiguous pages (and that is why it has worked so far),
 > > and whether it should be fixed in the mainline driver.
 > > 
 > > I'm doing some tests (and also trying to figure out how to change
 > > alloc_pages to pci_alloc_consistent) to verify contiguous pages.
 > 
 > You missed an important bit of Keir's response---it's perfectly fine
 > to use alloc_pages provided you then use the dma_map_single API, which
 > for Xen dom0 will take care of bounce-buffering to a
 > machine-contiguous buffer if necessary. I am not sure if the same
 > holds for a domU kernel.

I guess there was a mail thread that I wasn't copied on (I don't read
any Xen mailing lists).

Anyway, here is what mthca does.  It wants to hand a bunch of system
memory (megabytes) to the hardware for the hardware to use for its
internal context.  The hardware accesses this memory via PCI DMA, of
course.  So mthca does the following:

 - Allocate large chunks of system memory using
   alloc_pages(GFP_HIGHUSER, order) with order > 0
 - Build up an array of struct scatterlist where each entry is one of
   the order > 0 allocations from above
 - Map that scatterlist with pci_map_sg(..., PCI_DMA_BIDIRECTIONAL)
 - Pass the DMA addresses returned from that mapping to the hardware
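
The steps above can be sketched roughly as follows.  This is a hedged
illustration of the pattern, not actual mthca code: icm_alloc_and_map()
and write_dma_addr_to_hw() are hypothetical names, error unwinding is
omitted for brevity, and sg_set_page() stands in for however the
scatterlist fields are filled in on a given kernel version.

```c
#include <linux/gfp.h>
#include <linux/pci.h>
#include <linux/scatterlist.h>

static int icm_alloc_and_map(struct pci_dev *pdev,
			     struct scatterlist *sg, int nents, int order)
{
	int i, mapped;

	/* 1. Allocate large chunks with alloc_pages(GFP_HIGHUSER, order). */
	for (i = 0; i < nents; ++i) {
		struct page *page = alloc_pages(GFP_HIGHUSER, order);
		if (!page)
			return -ENOMEM;	/* (cleanup of earlier pages omitted) */
		/* 2. Each scatterlist entry is one order > 0 allocation. */
		sg_set_page(&sg[i], page, PAGE_SIZE << order, 0);
	}

	/* 3. Map the whole scatterlist for bidirectional DMA. */
	mapped = pci_map_sg(pdev, sg, nents, PCI_DMA_BIDIRECTIONAL);
	if (!mapped)
		return -ENOMEM;

	/* 4. Hand the resulting bus addresses to the hardware. */
	for (i = 0; i < mapped; ++i)
		write_dma_addr_to_hw(sg_dma_address(&sg[i]),
				     sg_dma_len(&sg[i]));

	return 0;
}
```

Note that the addresses given to the device come from sg_dma_address(),
never from page_to_phys(): on a platform (or hypervisor) with an IOMMU
or bounce buffering, the two can differ.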

As far as I can see, what mthca is doing is perfectly fine as far as
the DMA mapping API is concerned.  If Xen is returning non-contiguous
memory from alloc_pages() and then allocating bounce buffers in
pci_map_sg() then that should work (although it will be somewhat
inefficient, since the original memory will never actually be used).
However, I would confirm that this is really what Xen is doing, and
also that the code works as intended when the scatterlist has entries
with pages of order > 0.

As a side note, mthca could use dma_alloc_coherent() to allocate this
hardware memory, but that would be inefficient on 32-bit systems,
because it would use up kernel address space for memory that is only
ever touched by the hardware.  That's why it allocates pages with
GFP_HIGHUSER instead.
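
For contrast, the rejected alternative would look something like the
fragment below (a sketch only; pdev, order, and the surrounding context
are assumed):

```c
dma_addr_t dma_handle;

/*
 * dma_alloc_coherent() returns memory that is machine-contiguous AND
 * permanently mapped into kernel virtual address space.  That kernel
 * mapping (vaddr) is exactly the cost described above: on 32-bit
 * systems it burns lowmem address space even though only the device
 * will ever touch this memory.
 */
void *vaddr = dma_alloc_coherent(&pdev->dev, PAGE_SIZE << order,
				 &dma_handle, GFP_KERNEL);
```

With alloc_pages(GFP_HIGHUSER, ...), by contrast, the pages can live in
highmem and never need a kernel virtual mapping at all.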

 - R.
