[openib-general] Question about pinning memory

Roland Dreier rolandd at cisco.com
Mon Jul 25 07:31:46 PDT 2005


    Jeff> Interesting.  So Open IB doesn't get a notification upon
    Jeff> unmapping/sbrk'ing?

Right.  The kernel IB code works purely at the page level.
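
As a rough illustration of what "page level" means for a registered
buffer, here is a minimal sketch that computes which whole pages a
byte range touches.  The 4096-byte page size and the helper name
page_span are assumptions made for the example; they are not part of
any IB interface.

```python
PAGE_SIZE = 4096  # assumed page size; real systems report their own


def page_span(addr, length, page_size=PAGE_SIZE):
    """Return (first_page_start, page_count) for bytes [addr, addr + length)."""
    if length <= 0:
        raise ValueError("length must be positive")
    # Round the start down and the last byte down to page boundaries.
    first = addr - (addr % page_size)
    last = (addr + length - 1) // page_size * page_size
    return first, (last - first) // page_size + 1


# A 2-byte buffer that straddles a page boundary covers two whole pages.
print(page_span(4095, 2))
```

Pinning works on those whole pages, which is why later changes to the
virtual mapping are not visible at this level.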

    Jeff> Also interesting.  So it's "ok" to use this strategy
    Jeff> (maintain MRU/LRU kinds of tables and unpin upon demand),
    Jeff> even though you may be unpinning memory that no longer
    Jeff> belongs to your process.  That seems pretty weird -- it
    Jeff> feels like breaking the POSIX process abstraction barrier.

In some sense, as long as the memory is pinned by a process, it still
belongs to that process.  There may no longer be a valid virtual
mapping for the pages, but the original process still holds a
reference on them.

    Jeff> For example, it could be inefficient if there are multiple
    Jeff> processes using IB on a single node -- pinned memory can
    Jeff> consume pinning resources even though it's no longer in your
    Jeff> process.  If too much memory falls into this category, this
    Jeff> can even cause general virtual memory performance
    Jeff> degradation (i.e., nothing to do with IB communication at
    Jeff> all).

    Jeff> Is this the intended behavior?

I think so.  As long as a registered memory region covers a chunk of
memory, the IB device may still DMA into that memory.  So if a process
frees memory without unregistering it, the system can't reuse those
pages without risking memory corruption.

It is possible to use resource limits to limit the amount of memory
that a single process pins.
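
A minimal sketch of inspecting such a limit from userspace follows.
It assumes the relevant limit is RLIMIT_MEMLOCK (this message does not
name the limit), and the helper fits_under_limit is hypothetical
bookkeeping that an application might do itself, not a kernel
interface.

```python
import resource


def memlock_limit():
    """Return the (soft, hard) locked-memory limits of this process, in bytes."""
    return resource.getrlimit(resource.RLIMIT_MEMLOCK)


def fits_under_limit(already_pinned, request, soft_limit):
    """Hypothetical check: would pinning `request` more bytes stay within the limit?"""
    if soft_limit == resource.RLIM_INFINITY:
        return True
    return already_pinned + request <= soft_limit


soft, hard = memlock_limit()
print("RLIMIT_MEMLOCK soft/hard:", soft, hard)
```

An application that caches registrations could run a check like this
before pinning more memory and evict old registrations when it fails.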

    Jeff> 1. process A pins memory Z
    Jeff> 2. process A sbrk()'s Z without unpinning it first
    Jeff> 3. process B obtains memory Z
    Jeff> 4. process B pins memory Z
    Jeff> 5. process A unpins memory Z

This can't happen.  Until process A unpins the memory, the pages have
an elevated reference count and will never be allocated for a
different process.
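
The reference-counting argument can be sketched with a toy model.
This is not kernel code: Page, ToyAllocator and their methods are
invented for illustration, and the only rule modelled is that a page
is handed out again once its reference count drops to zero.

```python
class Page:
    def __init__(self, number):
        self.number = number
        self.refcount = 0  # 0 means free for reuse


class ToyAllocator:
    def __init__(self, num_pages):
        self.pages = [Page(n) for n in range(num_pages)]

    def alloc(self):
        """Hand out a page nobody references; the new mapping holds one reference."""
        for page in self.pages:
            if page.refcount == 0:
                page.refcount = 1
                return page
        raise MemoryError("no free pages")

    @staticmethod
    def pin(page):
        page.refcount += 1  # registration takes an extra reference

    @staticmethod
    def unpin(page):
        page.refcount -= 1

    @staticmethod
    def unmap(page):
        page.refcount -= 1  # drop the mapping's reference (e.g. after sbrk)


mem = ToyAllocator(num_pages=2)
z = mem.alloc()       # process A obtains page Z
mem.pin(z)            # 1. A pins Z
mem.unmap(z)          # 2. A releases Z without unpinning it
b_page = mem.alloc()  # 3. B asks for memory and gets a different page
mem.unpin(z)          # 5. only now does Z become reusable
```

In this model step 3 of the scenario cannot return Z, because the pin
taken in step 1 keeps its reference count above zero until step 5.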

    Jeff> Is there any thought within the IB community to getting rid
    Jeff> of this whole memory registration issue at the user level?
    Jeff> Perhaps a la what Quadrics did (make the driver understand
    Jeff> virtual memory) or what Va Tech did (put in their own
    Jeff> kernel-level hooks to make malloc/free to do the Right
    Jeff> Things)?  Handling all this memory registration stuff is
    Jeff> certainly quite a big chunk of code that every MPI (and IB
    Jeff> app) needs to handle.  Could this stuff be moved down into
    Jeff> the realm of the IB stack itself?  From an abstraction /
    Jeff> software architecture standpoint, it could make sense to
    Jeff> unify this handling in one place rather than N (MPI
    Jeff> implementations and other IB apps).

There's definitely thinking about this, but the correct approach is by
no means obvious.  The Quadrics hooks seem too invasive to merge into
the kernel.  I'm not familiar with the VA Tech work.

Unfortunately the IB and iWARP specs are written in terms of
applications handling memory registration.  So we can't really expect
all RDMA hardware to have the hooks required to improve the memory
pinning interface.  As it becomes clearer what hardware capabilities
are common, and what the userspace requirements are, we should be able
to improve things.

 - R.


