[openib-general] Reserved L_Key API (was Re: DMA mapping on sparc64)

Roland Dreier roland at topspin.com
Tue Sep 14 20:38:53 PDT 2004


Based on Tom's sparc64 testing, I'd like to design an API for
consumers (MAD layer, IPoIB, etc.) who want to do local DMA to
arbitrary addresses.  Our current hack of registering all of memory by
assuming that DMA addresses will be between 0 and (high_memory -
PAGE_OFFSET) is not valid (as sparc64 shows) and probably won't be
accepted into the kernel.

For new HCAs that support the base memory management extensions, the
consumer can just use the reserved L_Key.  It is almost possible to
simulate this with Tavor: one can create a memory region that does not
perform any address translation (and just uses the address given in a
work request as a PCI bus address), but it is not possible to turn off
PD enforcement.

This means we need an API that allows a consumer to get a "no
translation" MR for a given PD.  My proposal would be as follows:

The low-level driver entry point would just be:

	struct ib_mr *(*get_dma_mr)(struct ib_pd *);

And the client-exposed entry point:

	struct ib_mr *ib_get_dma_mr(struct ib_pd *);

Only the L_Key of this MR would be valid, and it would always have
local write access (to match the semantics of reserved L_Key).  If the
HCA supports reserved L_Key, it can just return the same L_Key for
every consumer.  An HCA that cannot turn off PD enforcement (such as
Tavor) can take the PD into account and return a per-PD MR instead.

The consumer must call ib_dereg_mr() on this MR when exiting, but this
can be a NOP for HCAs that support reserved L_Key.
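
To make the proposed semantics concrete, here is a rough user-space
sketch of the two driver styles described above and a consumer that
uses them.  It is a mock, not driver code: the struct layouts, the
dereg_mr hook, the MOCK_RESERVED_LKEY value, the next_lkey counter and
the convention of returning NULL on failure are all assumptions made
for illustration.  Only the get_dma_mr / ib_get_dma_mr signatures and
the ib_dereg_mr() name come from the proposal itself.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Stand-in types: user-space mocks, not the real kernel structures. */
struct ib_device;

struct ib_pd {
	struct ib_device *device;
};

struct ib_mr {
	struct ib_device *device;
	struct ib_pd     *pd;
	uint32_t          lkey;	/* only the L_Key is meaningful for a DMA MR */
};

struct ib_device {
	/* Proposed low-level driver entry point. */
	struct ib_mr *(*get_dma_mr)(struct ib_pd *pd);
	/* Hypothetical deregistration hook, assumed for this sketch. */
	int (*dereg_mr)(struct ib_mr *mr);
	uint32_t next_lkey;	/* mock state used by the per-PD driver */
};

/* Client-exposed entry points: simply dispatch to the driver. */
struct ib_mr *ib_get_dma_mr(struct ib_pd *pd)
{
	assert(pd != NULL && pd->device != NULL);
	return pd->device->get_dma_mr(pd);
}

int ib_dereg_mr(struct ib_mr *mr)
{
	return mr->device->dereg_mr(mr);
}

/* Driver style A: HCA with a true reserved L_Key.  Every consumer gets
 * the same MR and L_Key; the PD is ignored and dereg is a no-op. */
#define MOCK_RESERVED_LKEY 0x100u	/* made-up value */

static struct ib_mr reserved_mr = { NULL, NULL, MOCK_RESERVED_LKEY };

struct ib_mr *reserved_get_dma_mr(struct ib_pd *pd)
{
	reserved_mr.device = pd->device;
	return &reserved_mr;
}

int reserved_dereg_mr(struct ib_mr *mr)
{
	(void) mr;	/* nothing to tear down */
	return 0;
}

/* Driver style B: Tavor-like HCA.  PD enforcement cannot be turned off,
 * so each call creates a no-translation MR tied to the caller's PD. */
struct ib_mr *per_pd_get_dma_mr(struct ib_pd *pd)
{
	struct ib_mr *mr = malloc(sizeof *mr);

	if (!mr)
		return NULL;
	mr->device = pd->device;
	mr->pd     = pd;
	mr->lkey   = pd->device->next_lkey++;
	return mr;
}

int per_pd_dereg_mr(struct ib_mr *mr)
{
	free(mr);
	return 0;
}

/* Consumer: get the DMA MR for a PD, record its L_Key, release it. */
int consumer_demo(struct ib_device *dev, uint32_t *lkey_out)
{
	struct ib_pd pd = { dev };
	struct ib_mr *mr = ib_get_dma_mr(&pd);

	if (!mr)
		return -1;
	*lkey_out = mr->lkey;	/* would go into each work request's SGEs */
	return ib_dereg_mr(mr);
}
```

The consumer code is the same for both driver styles, which is the
point of making this a mandatory entry point: the consumer never needs
to know whether the L_Key it got is shared or per-PD.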

I would argue that this entry point should replace reg_phys_mr as a
mandatory low-level driver function; this will simplify the
implementation of consumers that use the API.  Devices that can't even
simulate reserved L_Key the way Tavor can (and I don't know of any such
devices -- even on Topspin's embedded platforms I could implement this
API) could just register a giant address range in a normal physical MR
(and even use pci_set_dma_mask() to limit the size of the MR to 4 GB
if they're really limited).
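
As a small illustration of the fallback sizing, the sketch below
computes how large such a physical MR would need to be for a given DMA
mask.  The helper name fallback_mr_len() is hypothetical, and it
assumes the mask is of the usual contiguous form (2^n - 1), so that
every DMA address lies in [0, mask].  With a 32-bit mask the MR covers
exactly 4 GB.

```c
#include <stdint.h>

/*
 * Hypothetical helper: length in bytes of a physical MR starting at bus
 * address 0 that covers every address a device with this DMA mask can
 * be handed.  Assumes dma_mask has the form 2^n - 1.
 */
uint64_t fallback_mr_len(uint64_t dma_mask)
{
	/* A full 64-bit mask would need 2^64 bytes, which does not fit
	 * in a uint64_t; saturate instead of wrapping to 0. */
	if (dma_mask == UINT64_MAX)
		return UINT64_MAX;
	return dma_mask + 1;
}
```

For example, fallback_mr_len(0xffffffffULL) is 0x100000000 (4 GB),
which matches the limited case mentioned above.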

Comments?  Better naming ideas?

Thanks,
  Roland


