[openib-general] Question about pinning memory
Jeff Squyres
jsquyres at open-mpi.org
Sun Jul 24 05:20:42 PDT 2005
On Jul 24, 2005, at 3:39 AM, Gleb Natapov wrote:
>> 1. What happens if pinned memory is returned to the OS via sbrk()? Do
>> Bad Things happen, or does the kernel portion of Open IB handle this
>> gracefully?
>>
> The kernel portion of OpenIB pins physical pages. If you unmap a
> virtual address range from the process address space using sbrk() or
> munmap(), the pages will stay pinned until deregistered.
Interesting. So Open IB doesn't get a notification upon
unmapping/sbrk'ing?
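Just to make sure I understand the semantics: here's how I'm modeling it
in my head (a toy Python sketch -- every class and method name below is
mine, purely for illustration; nothing here corresponds to a real OpenIB
call):

```python
# Toy model of the pin / unmap / deregister lifecycle described above.
# All names are invented for illustration -- not OpenIB APIs.

class KernelPinTable:
    """Tracks which physical pages are pinned, keyed by memory handle."""
    def __init__(self):
        self.pinned = {}       # handle -> set of physical pages
        self.next_handle = 0

    def register(self, pages):
        # Pin the physical pages and hand back an opaque handle.
        handle = self.next_handle
        self.next_handle += 1
        self.pinned[handle] = set(pages)
        return handle

    def deregister(self, handle):
        # Unpin via the handle -- no scan of user virtual space needed.
        del self.pinned[handle]

    def is_pinned(self, page):
        return any(page in pages for pages in self.pinned.values())


class Process:
    """A process's virtual address space: virtual addrs -> physical pages."""
    def __init__(self):
        self.mappings = {}     # virtual address -> physical page

    def mmap(self, va, page):
        self.mappings[va] = page

    def munmap(self, va):
        # Unmapping removes only the VA->page translation; the kernel's
        # pin on the physical page is untouched.
        del self.mappings[va]


kernel = KernelPinTable()
proc = Process()
proc.mmap(0x1000, page="P7")
handle = kernel.register(["P7"])

proc.munmap(0x1000)               # sbrk/munmap: the VA is gone...
assert kernel.is_pinned("P7")     # ...but the page is still pinned

kernel.deregister(handle)         # only deregistration unpins
assert not kernel.is_pinned("P7")
```

If that model is right, it matches what you said: the pin outlives the
mapping, and deregistration is purely handle-driven.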
>> 2. What happens if the MPI unpins memory that is already out of the
>> process? Do Bad Things happen, or does Open IB handle this
>> gracefully?
>>
> OpenIB should handle this case. It looks up the pinned physical pages
> using the memory handle returned at registration and unpins them. It
> doesn't scan user virtual space during deregistration.
Also interesting. So it's "ok" to use this strategy (maintain
MRU/LRU-style tables and unpin on demand), even though you may be
unpinning memory that no longer belongs to your process. That seems
pretty weird -- it feels like it breaks the POSIX process abstraction
barrier.
For example, it could be inefficient if there are multiple processes
using IB on a single node -- pinned memory can consume pinning
resources even after it no longer belongs to your process. If too much
memory falls into this category, it can even cause general virtual
memory performance degradation (i.e., nothing to do with IB
communication at all).
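For concreteness, the kind of registration cache I'm talking about is
essentially the following (a hedged sketch; the cache policy and all
names are mine, not any particular MPI's code -- pin/unpin stand in for
the real register/deregister verbs):

```python
from collections import OrderedDict

class RegistrationCache:
    """LRU cache of pinned regions, capped at max_entries.
    All names are invented for illustration."""
    def __init__(self, pin, unpin, max_entries=2):
        self.pin, self.unpin = pin, unpin
        self.max_entries = max_entries
        self.cache = OrderedDict()          # (addr, length) -> handle

    def lookup(self, addr, length):
        key = (addr, length)
        if key in self.cache:
            self.cache.move_to_end(key)     # cache hit: mark as MRU
            return self.cache[key]
        if len(self.cache) >= self.max_entries:
            # Evict the least-recently-used region.  The app may have
            # already sbrk()'d this memory away -- per the discussion
            # above, deregistering it anyway is supposed to be safe.
            _, old_handle = self.cache.popitem(last=False)
            self.unpin(old_handle)
        handle = self.pin(addr, length)
        self.cache[key] = handle
        return handle


unpinned = []
cache = RegistrationCache(pin=lambda a, l: (a, l),
                          unpin=unpinned.append, max_entries=2)
cache.lookup(0x1000, 4096)
cache.lookup(0x2000, 4096)
cache.lookup(0x1000, 4096)   # hit: no new registration
cache.lookup(0x3000, 4096)   # cache full: evicts (0x2000, 4096), the LRU
assert unpinned == [(0x2000, 4096)]
```

The eviction path is exactly where the "unpinning memory that may no
longer be in your process" weirdness shows up.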
Is this the intended behavior?
I actually don't know how IB is used in non-HPC environments; perhaps
MPI is somewhat unique in that it doesn't have visibility into the
alloc/free behavior of its user applications (well, MPI actually does
have MPI_ALLOC_MEM and MPI_FREE_MEM, but most apps unfortunately don't
use them :-( ). I can see the argument that [non-HPC] IB apps that
remove memory from the process without unpinning it first could be
considered erroneous programs. But that doesn't help my MPI problem.
:-)
A derivative question: what happens in this scenario:
1. process A pins memory Z
2. process A sbrk()'s Z without unpinning it first
3. process B obtains memory Z
4. process B pins memory Z
5. process A unpins memory Z
There are at least two points in that scenario where Badness can occur
(steps 4 and 5). Is there cross-process awareness in the kernel tables
that will properly handle these situations? (Or are the tables down in
the kernel process-independent, such that simple mechanisms like
reference counting are sufficient?)
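If the kernel tables really are per-physical-page with reference counts
(my guess, not verified against the OpenIB source), the five steps above
work out safely -- here's the toy model I have in mind (all names
invented for illustration):

```python
# Guess at kernel-side behavior: per-physical-page pin reference counts.

class RefCountedPins:
    def __init__(self):
        self.refs = {}           # physical page -> pin count

    def pin(self, page):
        self.refs[page] = self.refs.get(page, 0) + 1

    def unpin(self, page):
        self.refs[page] -= 1
        if self.refs[page] == 0:
            del self.refs[page]  # truly unpinned only at count zero

    def is_pinned(self, page):
        return page in self.refs


k = RefCountedPins()
Z = "physical page Z"
k.pin(Z)                  # 1. process A pins Z
                          # 2. A sbrk()'s Z away without unpinning
                          # 3. B obtains Z from the kernel
k.pin(Z)                  # 4. B pins Z -> refcount 2, no Badness
k.unpin(Z)                # 5. A unpins Z -> B's pin still holds
assert k.is_pinned(Z)
k.unpin(Z)                # B eventually unpins; now truly free
assert not k.is_pinned(Z)
```

If the tables are instead keyed per-process, I don't see how steps 4 and
5 avoid trouble -- hence the question.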
-----
Is there any thought within the IB community of getting rid of this
whole memory registration issue at the user level? Perhaps a la what
Quadrics did (make the driver understand virtual memory) or what Va
Tech did (put in their own kernel-level hooks to make malloc/free do
the Right Things)? Handling all this memory registration stuff is
certainly a big chunk of code that every MPI (and IB app) needs to
carry. Could it be moved down into the realm of the IB stack itself?
From an abstraction / software architecture standpoint, it could make
sense to unify this handling in one place rather than in N places (MPI
implementations and other IB apps).
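What the Va Tech-style hooks buy you, as I understand it, amounts to
something like this (a toy sketch; the allocator and hook names are
invented -- I'm not claiming this is what their code does):

```python
class HookedAllocator:
    """Toy allocator whose free() hook deregisters memory *before* it
    leaves the process, so nothing is ever released while still pinned.
    All names are invented for illustration."""
    def __init__(self, invalidate):
        self.invalidate = invalidate   # callback: deregister/unpin a region
        self.live = set()
        self.next_addr = 0x1000

    def malloc(self, size):
        addr = self.next_addr          # fake bump-pointer addresses
        self.next_addr += size
        self.live.add(addr)
        return addr

    def free(self, addr):
        self.invalidate(addr)          # unpin/deregister first...
        self.live.discard(addr)        # ...then actually release it


invalidated = []
alloc = HookedAllocator(invalidate=invalidated.append)
p = alloc.malloc(4096)
alloc.free(p)
assert invalidated == [p]   # deregistered before the memory left the process
assert p not in alloc.live
```

With that ordering enforced below the MPI layer, the whole
"unpinning memory that left the process" problem simply can't arise.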
Bottom line: from an MPI implementor standpoint, it would be great if
we didn't have to worry about this stuff at all -- that the IB stack
just transparently (and efficiently) handled all of it. :-)
(I realize that this is probably asking for quite a lot -- but hey, no
discussion will ever happen unless someone asks, right?)
--
{+} Jeff Squyres
{+} The Open MPI Project
{+} http://www.open-mpi.org/