[openib-general] Question about pinning memory
Roland Dreier
rolandd at cisco.com
Mon Jul 25 07:31:46 PDT 2005
Jeff> Interesting. So Open IB doesn't get a notification upon
Jeff> unmapping/sbrk'ing?
Right. The kernel IB code works purely at the page level.
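To be concrete, all userspace does is call ibv_reg_mr() on a range of
its address space; the kernel pins the pages behind that range and
gives the HCA a translation for them.  Nothing in that path hooks
munmap() or sbrk().  A minimal sketch (the helper name, alignment and
access flags are just for illustration):

    #include <stdlib.h>
    #include <unistd.h>
    #include <infiniband/verbs.h>

    /* Register a buffer with the HCA.  The kernel pins the pages
     * backing [buf, buf + len) and programs a translation for them;
     * it is never told if the process later munmap()s or sbrk()s away
     * the virtual mapping. */
    static struct ibv_mr *register_buf(struct ibv_pd *pd, size_t len)
    {
        void *buf;
        struct ibv_mr *mr;

        if (posix_memalign(&buf, sysconf(_SC_PAGESIZE), len))
            return NULL;

        mr = ibv_reg_mr(pd, buf, len,
                        IBV_ACCESS_LOCAL_WRITE | IBV_ACCESS_REMOTE_WRITE);
        if (!mr)
            free(buf);
        return mr;
    }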
Jeff> Also interesting. So it's "ok" to use this strategy
Jeff> (maintain MRU/LRU kinds of tables and unpin upon demand),
Jeff> even though you may be unpinning memory that no longer
Jeff> belongs to your process. That seems pretty weird -- it
Jeff> feels like breaking the POSIX process abstraction barrier.
In some sense, as long as the memory is pinned by a process, it still
belongs to that process.  There may no longer be a valid virtual
mapping for the pages, but the original process still holds a reference
on them.
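To make that concrete, the MRU/LRU scheme Jeff describes looks roughly
like the sketch below (the structure and function names are made up;
every MPI does this a bit differently).  An entry can outlive the
application's own use of the buffer, and the pages only get unpinned
when the cache deregisters the entry:

    #include <stdlib.h>
    #include <infiniband/verbs.h>

    /* One cached registration; newest entries sit at the head.  (A
     * real cache would move hits to the front and would also have to
     * notice when a virtual range has been unmapped and remapped.) */
    struct reg_cache_entry {
        void                   *addr;
        size_t                  len;
        struct ibv_mr          *mr;
        struct reg_cache_entry *next;
    };

    static struct reg_cache_entry *cache_head;

    /* Return a registration covering [addr, addr + len), registering
     * (and therefore pinning) it on a miss.  Nothing here notices if
     * the application later free()s or munmap()s the buffer; the
     * pages stay pinned until the entry is evicted. */
    static struct ibv_mr *cache_reg(struct ibv_pd *pd, void *addr, size_t len)
    {
        struct reg_cache_entry *e;

        for (e = cache_head; e; e = e->next)
            if ((char *) addr >= (char *) e->addr &&
                (char *) addr + len <= (char *) e->addr + e->len)
                return e->mr;               /* hit: reuse the pinning */

        e = malloc(sizeof *e);
        if (!e)
            return NULL;
        e->mr = ibv_reg_mr(pd, addr, len, IBV_ACCESS_LOCAL_WRITE);
        if (!e->mr) {
            free(e);
            return NULL;
        }
        e->addr    = addr;
        e->len     = len;
        e->next    = cache_head;
        cache_head = e;
        return e->mr;
    }

    /* Called when pinning resources run short: drop the oldest entry.
     * This may deregister (unpin) memory the application has already
     * freed, which is exactly the case being discussed. */
    static void cache_evict_oldest(void)
    {
        struct reg_cache_entry **p = &cache_head, *victim;

        if (!*p)
            return;
        while ((*p)->next)
            p = &(*p)->next;
        victim = *p;
        *p = NULL;
        ibv_dereg_mr(victim->mr);
        free(victim);
    }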
Jeff> For example, it could be inefficient if there are multiple
Jeff> processes using IB on a single node -- pinned memory can
Jeff> consume pinning resources even though it's no longer in your
Jeff> process. If too much memory falls into this category, this
Jeff> can even cause general virtual memory performance
Jeff> degradation (i.e., nothing to do with IB communication at
Jeff> all).
Jeff> Is this the intended behavior?
I think so. As long as there is a registered memory region that
covers a chunk of memory, that memory may be DMAed to by the IB
device. So if a process frees memory but doesn't unregister it, the
system can't reuse it without risking memory corruption.
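Said another way, the safe teardown order is deregister first, then
free, roughly (the helper name is mine):

    #include <stdlib.h>
    #include <infiniband/verbs.h>

    /* Deregistering unpins the pages and tears down the HCA's
     * translation; only after that is it safe to return the memory to
     * the allocator.  The other order leaves a window where the
     * device could still DMA into memory the system thinks is free. */
    static void release_buf(struct ibv_mr *mr)
    {
        void *buf = mr->addr;

        ibv_dereg_mr(mr);
        free(buf);
    }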
It is possible to use resource limits to limit the amount of memory
that a single process pins.
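On Linux, userspace registrations are charged against the
locked-memory rlimit (RLIMIT_MEMLOCK), so the usual setrlimit() and
limits.conf machinery applies.  For example, to see how much a
process may pin (the helper is just for illustration):

    #include <stdio.h>
    #include <sys/resource.h>

    /* Print the locked-memory limit that pinned registrations are
     * charged against.  Raising it normally takes privileges or a
     * limits.conf entry. */
    static int show_memlock_limit(void)
    {
        struct rlimit rl;

        if (getrlimit(RLIMIT_MEMLOCK, &rl))
            return -1;

        printf("RLIMIT_MEMLOCK: cur %lu, max %lu bytes\n",
               (unsigned long) rl.rlim_cur,
               (unsigned long) rl.rlim_max);
        return 0;
    }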
Jeff> 1. process A pins memory Z
Jeff> 2. process A sbrk()'s Z without unpinning it first
Jeff> 3. process B obtains memory Z
Jeff> 4. process B pins memory Z
Jeff> 5. process A unpins memory Z
This can't happen. Until process A unpins the memory, the pages have
an elevated reference count and will never be allocated for a
different process.
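On the kernel side this is just get_user_pages() reference counting.
A simplified sketch of the idea (not the actual ib_umem code, and the
get_user_pages() prototype differs between kernel versions):

    #include <linux/mm.h>
    #include <linux/sched.h>

    /* Pin npages pages starting at start: each page's reference count
     * is bumped, so the pages cannot be given to another process until
     * put_page() drops the reference at deregistration time. */
    static int pin_user_range(unsigned long start, int npages,
                              struct page **pages)
    {
        int ret;

        down_read(&current->mm->mmap_sem);
        ret = get_user_pages(current, current->mm, start, npages,
                             1 /* write */, 0 /* force */, pages, NULL);
        up_read(&current->mm->mmap_sem);

        return ret;
    }

    static void unpin_user_range(struct page **pages, int npages)
    {
        int i;

        for (i = 0; i < npages; ++i)
            put_page(pages[i]);
    }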
Jeff> Is there any thought within the IB community to getting rid
Jeff> of this whole memory registration issue at the user level?
Jeff> Perhaps a la what Quadrics did (make the driver understand
Jeff> virtual memory) or what Va Tech did (put in their own
Jeff> kernel-level hooks to make malloc/free to do the Right
Jeff> Things)? Handling all this memory registration stuff is
Jeff> certainly quite a big chunk of code that every MPI (and IB
Jeff> app) needs to handle. Could this stuff be moved down into
Jeff> the realm of the IB stack itself? From an abstraction /
Jeff> software architecture standpoint, it could make sense to
Jeff> unify this handling in one place rather than N (MPI
Jeff> implementations and other IB apps).
There's definitely thinking about this, but the correct approach is by
no means obvious. The Quadrics hooks seem too invasive to merge into
the kernel. I'm not familiar with the VA Tech work.
Unfortunately the IB and iWARP specs are written in terms of
applications handling memory registration. So we can't really expect
all RDMA hardware to have the hooks required to improve the memory
pinning interface. As it becomes clearer what hardware capabilities
are common, and what the userspace requirements are, we should be able
to improve things.
- R.