[openib-general] Question about pinning memory

Roland Dreier rolandd at cisco.com
Tue Jul 26 21:43:44 PDT 2005


    Gleb> This is what Pete did. We are back to one syscall on each
    Gleb> registration.  And the worst case is three syscalls: first
    Gleb> to check that the mapping is valid, second to deregister the
    Gleb> memory if it is not, and third to register new memory.

I don't think this is an accurate description of Pete's work.  As I
understand it, the fast path (no VM operations have occurred since
registration was done) does not involve any system calls -- the MPI
library just checks the queue of VM events (which will likely be in
cache already), sees that it is empty, and proceeds with the IO.
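
As a rough sketch of that fast path (not Pete's actual interface: the
queue layout, the names vm_event_queue, registration and
registration_usable, and the slot count are all hypothetical), the
check could look like the following.  The kernel is assumed to be the
only writer of head, and vm_queue_push() merely stands in for it so
the logic can be exercised in user space.  Queue overflow is
deliberately not handled here.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define VM_QUEUE_SLOTS 64   /* assumed size, chosen for illustration */

/* Hypothetical event record: one VM operation on [start, start + len). */
struct vm_event {
    uintptr_t start;
    size_t    len;
};

/* Hypothetical queue shared between the kernel and the MPI library. */
struct vm_event_queue {
    volatile uint32_t head;              /* advanced by the producer */
    uint32_t          tail;              /* advanced by the consumer */
    struct vm_event   ev[VM_QUEUE_SLOTS];
};

/* One cached memory registration. */
struct registration {
    uintptr_t start;
    size_t    len;
    bool      valid;
};

static bool ranges_overlap(uintptr_t a, size_t alen, uintptr_t b, size_t blen)
{
    return a < b + blen && b < a + alen;
}

/* Stand-in for the kernel side: record one VM event (no overflow check). */
static void vm_queue_push(struct vm_event_queue *q, uintptr_t start, size_t len)
{
    q->ev[q->head % VM_QUEUE_SLOTS] = (struct vm_event){ start, len };
    q->head++;
}

/* Slow path: consume every pending event and invalidate the cached
 * registrations that it touches. */
static void drain_events(struct vm_event_queue *q,
                         struct registration *cache, size_t n)
{
    while (q->tail != q->head) {
        const struct vm_event *e = &q->ev[q->tail % VM_QUEUE_SLOTS];
        for (size_t i = 0; i < n; i++)
            if (ranges_overlap(cache[i].start, cache[i].len, e->start, e->len))
                cache[i].valid = false;
        q->tail++;
    }
}

/* Fast path: if the queue is empty this is a single comparison and no
 * system call.  reg must point into cache. */
static bool registration_usable(struct vm_event_queue *q,
                                struct registration *cache, size_t n,
                                const struct registration *reg)
{
    if (q->head != q->tail)
        drain_events(q, cache, n);
    return reg->valid;
}
```

When registration_usable() returns false, the caller would deregister
and register again, which is the only point where system calls enter
the picture.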

I was just trying to offer a small refinement so the complex code to
handle the very rare case of queue overflow is not needed.

Of course if memory needs to be reregistered, then we hit the slow
path (just like a cache miss for the registration cache).

    Gleb> Not exactly. When user frees memory by calling free() the
    Gleb> memory is not unmapped from process address space by
    Gleb> libc. We want to cache the registrations in this memory. We
    Gleb> don't want to cache registration across mmap/munmap/mmap (it
    Gleb> is much harder to do if at all possible).  When libc unmaps
    Gleb> memory by sbrk() or munmap(), the memory is guaranteed not
    Gleb> to be used by a correct program, so it is safe to deregister
    Gleb> it if we catch this event.

The first statement in this paragraph is false.  It's easy to strace a
simple program that does something like

	x = malloc(1000000);
	free(x);

and watch glibc do an mmap(... MAP_ANONYMOUS ...) followed by
munmap().  Even for smaller allocations, glibc may use sbrk() to
shrink the heap in free().  You can read about M_TRIM_THRESHOLD and so
on in the mallopt() documentation.
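
For anyone who would rather check this from inside the program than
with strace, here is a minimal glibc-specific sketch.  The helper name
alloc_bypasses_brk_heap is made up for this example.  It pins
M_MMAP_THRESHOLD to 128 KiB with mallopt() (an assumption made so the
result does not depend on glibc's dynamic threshold adjustment) and
then checks that a large malloc() leaves the program break where it
was, which means the block did not come from the sbrk()-managed heap.

```c
#include <malloc.h>    /* mallopt(), M_MMAP_THRESHOLD (glibc-specific) */
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>
#include <unistd.h>    /* sbrk() */

/* Returns true if a malloc() of size bytes succeeded without moving the
 * program break, i.e. the block was not carved out of the brk heap. */
static bool alloc_bypasses_brk_heap(size_t size)
{
    /* Requests of at least 128 KiB should be served by mmap(). */
    mallopt(M_MMAP_THRESHOLD, 128 * 1024);

    void *warmup = malloc(16);   /* make sure the heap is initialised */
    void *brk_before = sbrk(0);
    void *x = malloc(size);
    void *brk_after = sbrk(0);

    bool bypassed = (x != NULL) && (brk_before == brk_after);

    free(x);        /* for an mmap()ed block, glibc calls munmap() here */
    free(warmup);
    return bypassed;
}
```

Calling alloc_bypasses_brk_heap(1000000) in a fresh single-threaded
process is expected to return true with glibc; the free() inside it is
the munmap() that shows up in strace.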

This means that it is entirely possible for a correct program to have
different physical memory at the same virtual address, even if it does
nothing but malloc() and free().  Pete's work allows caching of all
registrations, including handling mmap() and everything else.

 - R.


