[openib-general] Getting rid of pinned memory requirement

Caitlin Bestler caitlinb at siliquent.com
Mon Mar 14 15:29:06 PST 2005


 

> -----Original Message-----
> From: openib-general-bounces at openib.org 
> [mailto:openib-general-bounces at openib.org] On Behalf Of Troy 
> Benjegerdes
> Sent: Monday, March 14, 2005 3:01 PM
> To: openib-general at openib.org
> Subject: [openib-general] Getting rid of pinned memory requirement
> 
> The current InfiniBand model of using 'mlock()' to maintain a 
> constant virtual to physical mapping for registered memory 
> pages is not going to work with NUMA page migration and 
> memory hotplug.
> 
> I want to get some discussion started on this list, and once 
> we have an idea what's feasible from the InfiniBand side, to
> bring up the discussion on linux-kernel, and get the memory 
> hotplug and numa page migration people involved as well.
> 
> I think the following list covers the major points. Are there
> any big "gotchas" involved?
> 
> * Add "registered" flag to linux/mm.h (VM_REGISTERED 0x01000000)
> 
> * Need to define a 'registered memory' api. Maybe using 'madvise()' ?
> 
> * Kernel needs to be able to unpin registered memory and 
> shoot down cached
>   mappings in network cards (treat IB/Iwarp cards like a TLB)
> 
> * Requires IB/Iwarp card to dispatch an interrupt on a mapping 'miss'
> 
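
For concreteness, the userspace side of what is being proposed above
might look roughly like the sketch below. Every name in it is made up --
MADV_REGISTER and the VM_REGISTERED semantics do not exist in any kernel
today; this is only one way the proposal could surface to applications.

#include <stddef.h>
#include <sys/mman.h>

#ifndef MADV_REGISTER
#define MADV_REGISTER 100                /* made-up value, for illustration */
#endif

int main(void)
{
        size_t len = 1 << 20;
        void *buf = mmap(NULL, len, PROT_READ | PROT_WRITE,
                         MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

        if (buf == MAP_FAILED)
                return 1;

        /*
         * Instead of the driver pinning the pages when the buffer is
         * registered, the application would mark the range "registered";
         * the kernel would flag the VMA and stay free to migrate the
         * pages later, provided it first shoots down the adapter's
         * cached mappings (the "treat the IB card like a TLB" idea).
         */
        if (madvise(buf, len, MADV_REGISTER))
                return 1;                /* fails on today's kernels */

        return 0;
}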

The point of requiring that the memory be pinned is so that
the IB/iWARP card does not have to deal with the kernel on
a per-placement basis.

That includes never having to consult host memory state to see
whether there is anything to 'miss' in the mapping at all.

Once a memory region is registered, the HCA/RNIC is entitled to
assume that the mapping from LKey/Address (STag/TO) to physical
memory is not subject to change.
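
To be concrete, here is a minimal sketch of that contract as it
appears through the verbs API (error handling omitted; 'pd' is
assumed to be a protection domain the application already created
with ibv_alloc_pd()):

#include <stddef.h>
#include <infiniband/verbs.h>

/*
 * Registration is the one point at which the driver pins the pages and
 * hands the HCA a virtual-to-physical translation for the range. From
 * then on the HCA resolves mr->lkey (local work requests) or mr->rkey
 * (incoming RDMA from the peer) against that fixed translation without
 * ever going back to the kernel.
 */
static struct ibv_mr *register_buffer(struct ibv_pd *pd, void *buf, size_t len)
{
        return ibv_reg_mr(pd, buf, len,
                          IBV_ACCESS_LOCAL_WRITE |
                          IBV_ACCESS_REMOTE_READ |
                          IBV_ACCESS_REMOTE_WRITE);
}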

Enhancement protocols have been discussed in both DAPL and
RNIC-PI to allow kernels to rearrange memory, but they involve
the host explicitly telling the HCA/RNIC to suspend access to
a memory region *and*, when possible, taking action to quiesce
the connections using the memory region.
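
None of that exists as a standard verb today. The sketch below only
names the steps those discussions describe, with entirely invented
types and functions, to show that the host drives the suspension --
the adapter never faults on its own.

/* Everything below is hypothetical -- these types and calls do not exist. */
struct hyp_hca;                  /* stand-in for an adapter handle    */
struct hyp_mr;                   /* stand-in for a registered region  */

void hyp_quiesce_connections(struct hyp_hca *hca, struct hyp_mr *mr);
void hyp_suspend_mr(struct hyp_hca *hca, struct hyp_mr *mr);
void hyp_migrate_pages(struct hyp_mr *mr);
void hyp_resume_mr(struct hyp_hca *hca, struct hyp_mr *mr);
void hyp_resume_connections(struct hyp_hca *hca, struct hyp_mr *mr);

void migrate_registered_region(struct hyp_hca *hca, struct hyp_mr *mr)
{
        hyp_quiesce_connections(hca, mr);  /* stop traffic targeting it    */
        hyp_suspend_mr(hca, mr);           /* adapter drops cached mapping */
        hyp_migrate_pages(mr);             /* only now may pages move      */
        hyp_resume_mr(hca, mr);            /* re-pin, reload translation   */
        hyp_resume_connections(hca, mr);   /* resume the quiesced traffic  */
}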

> * This model allows applications to register more memory than 
> physically exists, and the kernel manages what is actually pinned.
> 

Fundamental to any definition of RDMA is that the application
controls the availability of target memory -- not the kernel.
That is why traditional buffer flow controls do not apply.
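
A short illustration of why (a sketch only; it assumes an established
QP 'qp', a locally registered source buffer 'mr', and an (addr, rkey)
pair the target advertised out of band):

#include <stdint.h>
#include <stddef.h>
#include <infiniband/verbs.h>

/*
 * The write below lands directly in the target's advertised buffer.
 * Neither the kernel nor the application on the target node sees the
 * transfer, so neither is in a position to decide, per transfer, what
 * must be resident -- only the target application, by choosing what it
 * registers and advertises, controls that.
 */
static int rdma_write_once(struct ibv_qp *qp, struct ibv_mr *mr,
                           void *local_buf, size_t len,
                           uint64_t remote_addr, uint32_t rkey)
{
        struct ibv_sge sge = {
                .addr   = (uintptr_t) local_buf,
                .length = (uint32_t) len,
                .lkey   = mr->lkey,
        };
        struct ibv_send_wr wr = {
                .sg_list             = &sge,
                .num_sge             = 1,
                .opcode              = IBV_WR_RDMA_WRITE,
                .send_flags          = IBV_SEND_SIGNALED,
                .wr.rdma.remote_addr = remote_addr,
                .wr.rdma.rkey        = rkey,
        };
        struct ibv_send_wr *bad_wr;

        return ibv_post_send(qp, &wr, &bad_wr);
}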

> * Requires adding hooks in MM code to dispatch driver mapping 
> shootdowns. (A
>   per-VM area list of adapters to be notified for the mapping?) 
> 
> 
> I know that having the card dispatch an interrupt on an 
> incoming packet that's not mapped is outside the spec. The 
> alternative is that if the kernel wants to move some memory 
> around that's registered, it's got to have some way to either 
> kill the application, or tear down and re-establish all the 
> QP's. I suppose an alternative "SIG_I_KILLED_YOUR_MAPPINGS"
> type signal to tell the application (or library) that it
> needs to re-establish all its pinned memory might work.
>

Only if you are re-arranging memory for a bunch of connections
that were taking a nice nap. If you did this for active connections
they could be dead before you could reregister the memory. And even
if you could reregister it, how do you redistribute the RKeys?
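
To make the RKey problem concrete, here is a sketch of the usual
out-of-band advertisement an application sends so a peer can target
its buffer (the exact wire format is application-defined; this is
just illustrative):

#include <stdint.h>

/*
 * Sketch of an application-defined buffer advertisement. The rkey ends
 * up in the peer's memory, and possibly in work requests the peer has
 * already posted. Re-registering the buffer yields a new rkey, so every
 * peer holding the old advertisement must be told again out of band --
 * over connections that may already have failed because the old key
 * stopped working.
 */
struct buffer_advertisement {
        uint64_t addr;      /* virtual address of the registered buffer */
        uint32_t rkey;      /* remote key returned by registration      */
        uint32_t length;    /* size of the registered region            */
};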


