[openib-general] Question about pinning memory
Gleb Natapov
glebn at voltaire.com
Sun Jul 24 00:39:34 PDT 2005
Hi Jeff,
On Fri, Jul 22, 2005 at 07:04:10PM -0400, Jeff Squyres wrote:
> Related to the scenario above, here's my questions:
>
> 1. What happens if pinned memory is returned to the OS via sbrk()? Do
> Bad Things happen, or does the kernel portion of Open IB handle this
> gracefully?
>
The kernel portion of OpenIB pins physical pages. If you unmap a virtual
address range from the process address space using sbrk() or munmap(), the
pages will stay pinned until the memory is deregistered.
> (if anyone knows what mVAPI does here, that would also be really useful
> to know)
>
> 2. What happens if the MPI unpins memory that is already out of the
> process? Do Bad Things happen, or does Open IB handle this gracefully?
>
OpenIB should handle this case. It looks up the pinned physical pages using
the memory handle returned at registration and unpins them. It doesn't scan
the user virtual address space during deregistration.
> (if anyone knows what mVAPI does here, that would also be really useful
> to know)
>
> 3. Even the scenario above is not enough for the MPI to properly handle
> the memory pinning issues. Since MPI usually does not have visibility
> for memory that has been free'd (or sbrk'ed), the following scenario is
> possible (and in our experience, likely):
>
> - user calls malloc() and gets virtual address A back
> - user calls MPI_Send with A
> - MPI pins address A
> - MPI saves A in internal lookup tables
> - MPI_Send returns
> - user calls free(A)
> - user calls malloc() and gets virtual address A back, but now A points
> to different physical memory than before
> - user calls MPI_Send with A
> - MPI thinks that A is pinned (because A is in MPI's internal tables),
> and therefore doesn't pin it
> - MPI tries to send A with the IB API, and Bad Things occur (either an
> error, or worse, bad data is sent)
>
> What would be *really* great (from MPI's perspective) is if Open IB can
> know when pinned memory is removed from the process (probably via
> sbrk()) and give the MPI implementation a callback somehow when this
> happens.
>
> Is this possible?
I don't think this should be done in OpenIB. The kernel should have a general
trace capability to intercept system calls. I think such a capability
exists and the Linux Trace Toolkit uses it, but I am not sure.
>
> Otherwise, the scenario in question #3 is a real problem. There are a
> few possibilities for fixing it, but all are problematic (override
> sbrk() via including ptmalloc2 in the distribution, using LD_PRELOAD to
> override sbrk(), etc.). Any other suggestions would be welcome...
>
Don't forget that glibc uses mmap() for large allocations, so it is not
enough to override sbrk(). You can use mallopt(M_MMAP_MAX, 0) to
disable allocations using mmap(), though.
You can also instruct glibc to never return memory to the kernel by calling
mallopt(M_TRIM_THRESHOLD, -1). In this case you will not have the
problem you are trying to fix. But you pay a price: memory freed by the
application is never given back to the system.
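
A minimal sketch of the two mallopt() calls above, assuming glibc on Linux.
The helper name keep_heap_in_process() is made up for illustration and is
not part of any MPI or OpenIB API. It would need to run early, before the
allocations you care about.

```c
#include <malloc.h>   /* mallopt(), M_MMAP_MAX, M_TRIM_THRESHOLD (glibc-specific) */

/*
 * Hypothetical helper (the name is an assumption): ask glibc malloc to keep
 * freed memory inside the process so a cached registration is not left
 * pointing at pages that were handed back to the kernel.
 * Returns 0 on success, -1 if glibc rejected either option.
 */
int keep_heap_in_process(void)
{
    /* Never satisfy an allocation with mmap(), so free() never calls munmap(). */
    if (mallopt(M_MMAP_MAX, 0) != 1)
        return -1;

    /* Never trim the top of the heap, so free() never shrinks it via sbrk(). */
    if (mallopt(M_TRIM_THRESHOLD, -1) != 1)
        return -1;

    return 0;
}
```

Call keep_heap_in_process() once at startup (for example from MPI_Init) and
check the return value. This only affects glibc malloc; memory that the
application maps and unmaps itself with mmap()/munmap() is not covered.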
--
Gleb.