[openib-general] Getting rid of pinned memory requirement

Troy Benjegerdes hozer at hozed.org
Mon Mar 14 15:56:05 PST 2005


On Mon, Mar 14, 2005 at 03:29:06PM -0800, Caitlin Bestler wrote:
>  
> 
> > -----Original Message-----
> > From: openib-general-bounces at openib.org 
> > [mailto:openib-general-bounces at openib.org] On Behalf Of Troy 
> > Benjegerdes
> > Sent: Monday, March 14, 2005 3:01 PM
> > To: openib-general at openib.org
> > Subject: [openib-general] Getting rid of pinned memory requirement
> > 
> > The current InfiniBand model of using 'mlock()' to maintain a
> > constant virtual-to-physical mapping for registered memory
> > pages is not going to work with NUMA page migration and
> > memory hotplug.
> > 
> > I want to get some discussion started on this list, and once
> > we have an idea what's feasible from the InfiniBand side, to
> > bring up the discussion on linux-kernel and get the memory
> > hotplug and NUMA page migration people involved as well.
> > 
> > I think the following list covers the major points. Are there
> > any big "gotchas" involved?
> > 
> > * Add "registered" flag to linux/mm.h (VM_REGISTERED 0x01000000)
> > 
> > * Need to define a 'registered memory' API, maybe using 'madvise()'? (See the sketch after this list.)
> > 
> > * Kernel needs to be able to unpin registered memory and shoot
> >   down cached mappings in network cards (treat IB/iWARP cards
> >   like a TLB)
> > 
> > * Requires the IB/iWARP card to dispatch an interrupt on a mapping 'miss'
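
To make the madvise() idea concrete, here is a minimal userspace
sketch. MADV_REGISTER is hypothetical (no such advice value exists
today, so the call below just fails with EINVAL on a real kernel); it
only illustrates the shape of the proposed API:

    /* Hypothetical: MADV_REGISTER is not a real advice value. */
    #include <stdio.h>
    #include <sys/mman.h>

    #ifndef MADV_REGISTER
    #define MADV_REGISTER 100   /* made-up number, illustration only */
    #endif

    int main(void)
    {
        size_t len = 1 << 20;
        void *buf = mmap(NULL, len, PROT_READ | PROT_WRITE,
                         MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

        if (buf == MAP_FAILED) {
            perror("mmap");
            return 1;
        }

        /* The kernel would set VM_REGISTERED on this VMA and keep
         * the right to migrate the pages later, shooting down the
         * card's cached translations before it does. */
        if (madvise(buf, len, MADV_REGISTER) != 0)
            perror("madvise(MADV_REGISTER)");

        return 0;
    }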
> > 
> 
> The point of requiring that the memory be pinned is so that
> the IB/iWARP card does not have to deal with the kernel on
> a per-placement basis.
> 
> That includes having to double-check any host memory resources
> to see if there is anything to 'miss' in the mapping.

I guess I wasn't implying any 'double-checking'. What I want is for the
kernel to be able to unpin memory and tell the card it did so, instead
of being locked into never being able to move that memory around. This
requires no host memory interaction on the card's side.
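
Roughly, the hook I have in mind looks like this. To be clear, none of
it exists; the struct and names below are made up to illustrate the
'treat the card like a TLB' model:

    /* Hypothetical kernel-side interface; nothing like this exists
     * yet.  Before unpinning or migrating a registered page, the
     * kernel calls into the HCA driver so the card's cached
     * translation is gone first, the same contract as a CPU TLB
     * shootdown. */
    struct rdma_mmu_ops {
        /* Invalidate cached mappings for [start, start + len).
         * Any DMA touching the range afterwards takes a 'miss',
         * and the card raises an interrupt so the host can refill
         * the mapping once the page has a new physical home. */
        void (*invalidate_range)(void *hca_ctx,
                                 unsigned long start,
                                 unsigned long len);
    };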

By doing this, I can register a whole lot *more* memory, and the kernel
can still keep buggy applications from trashing the whole system.

[snip]

> Fundamental to any definition of RDMA is that the application
> controls the availability of target memory -- not the kernel.
> That is why traditional buffer flow controls do not apply.

While hardware designers may like this idea, I would like to make the
point that if you want the application to *absolutely* control the
availability of physical memory, you shouldn't be writing userspace
applications that run on Linux.

There's always going to be a limit on how much memory you can mlock. And
right now the only option the kernel has for unlocking that memory is to
kill the application. I think there's got to be a reasonable way to deal
with this that doesn't make the application responsible for everything
in the world. We don't want to have to rewrite every RDMA application to
be able to support memory hotplug. This is an obvious layer that can and
should be abstracted by the kernel.
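
For concreteness, the limit in question is RLIMIT_MEMLOCK, and any
process can query it directly (this part is existing, standard API):

    /* The mlock ceiling is real and queryable today: RLIMIT_MEMLOCK
     * caps how much memory an unprivileged process may pin. */
    #include <stdio.h>
    #include <sys/resource.h>

    int main(void)
    {
        struct rlimit rl;

        if (getrlimit(RLIMIT_MEMLOCK, &rl) != 0) {
            perror("getrlimit");
            return 1;
        }
        printf("RLIMIT_MEMLOCK: soft %lu, hard %lu bytes\n",
               (unsigned long)rl.rlim_cur,
               (unsigned long)rl.rlim_max);
        return 0;
    }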


