[openib-general] Question about pinning memory

Gleb Natapov glebn at voltaire.com
Sun Jul 24 00:39:34 PDT 2005


Hi Jeff,

On Fri, Jul 22, 2005 at 07:04:10PM -0400, Jeff Squyres wrote:
> Related to the scenario above, here's my questions:
> 
> 1. What happens if pinned memory is returned to the OS via sbrk()?  Do 
> Bad Things happen, or does the kernel portion of Open IB handle this 
> gracefully?
> 
The kernel portion of OpenIB pins physical pages. If you unmap a virtual
address range from the process address space using sbrk or munmap, the
pages stay pinned until the memory is deregistered.

> (if anyone knows what mVAPI does here, that would also be really useful 
> to know)
> 
> 2. What happens if the MPI unpins memory that is already out of the 
> process?  Do Bad Things happen, or does Open IB handle this gracefully?
> 
OpenIB should handle this case. It looks up the pinned physical pages using
the memory handle returned on registration and unpins them. It doesn't scan
the user virtual address space during deregistration.

> (if anyone knows what mVAPI does here, that would also be really useful 
> to know)
> 
> 3. Even the scenario above is not enough for the MPI to properly handle 
> the memory pinning issues.  Since MPI usually does not have visibility 
> for memory that has been free'd (or sbrk'ed), the following scenario is 
> possible (and in our experience, likely):
> 
> - user calls malloc() and gets virtual address A back
> - user calls MPI_Send with A
> - MPI pins address A
> - MPI saves A in internal lookup tables
> - MPI_Send returns
> - user calls free(A)
> - user calls malloc() and gets virtual address A back, but now A points 
> to different physical memory than before
> - user calls MPI_Send with A
> - MPI thinks that A is pinned (because A is in MPI's internal tables), 
> and therefore doesn't pin it
> - MPI tries to send A with the IB API, and Bad Things occur (either an 
> error, or worse, bad data is sent)
> 
> What would be *really* great (from MPI's perspective) is if Open IB can 
> know when pinned memory is removed from the process (probably via 
> sbrk()) and give the MPI implementation a callback somehow when this 
> happens.
> 
> Is this possible?
I don't think this should be done in OpenIB. The kernel should have a general
trace capability to intercept system calls. I think such a capability
exists and the Linux Trace Toolkit uses it, but I am not sure.
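
To make the scenario in question 3 concrete, here is a minimal sketch of
an address-keyed registration cache in plain C. It is a toy model, not
OpenIB or MPI code: reg_cache_insert(), reg_cache_lookup() and
reg_cache_invalidate() are hypothetical names, and the "handle" is just an
integer standing in for whatever the real register call returns. It shows
why a lookup by virtual address alone keeps returning a hit after the
memory behind it has changed, unless something tells the cache that the
range went away.

```c
#include <stddef.h>
#include <stdint.h>

/* Toy registration cache keyed by virtual address (hypothetical,
 * not part of OpenIB or any MPI implementation). */
#define REG_CACHE_SLOTS 16

struct reg_entry {
    uintptr_t addr;   /* start of the registered virtual range */
    size_t    len;    /* length of the range in bytes */
    int       handle; /* stand-in for the handle returned on register */
    int       valid;  /* nonzero while the entry may be used */
};

static struct reg_entry reg_cache[REG_CACHE_SLOTS];

/* Remember a registration. Returns 0 on success, -1 if the cache is full. */
static int reg_cache_insert(const void *addr, size_t len, int handle)
{
    for (int i = 0; i < REG_CACHE_SLOTS; i++) {
        if (!reg_cache[i].valid) {
            reg_cache[i].addr = (uintptr_t)addr;
            reg_cache[i].len = len;
            reg_cache[i].handle = handle;
            reg_cache[i].valid = 1;
            return 0;
        }
    }
    return -1;
}

/* Return the cached handle if [addr, addr + len) lies inside a valid
 * entry, else -1. Only virtual addresses are compared, so the cache
 * cannot notice that the range was freed and handed out again. */
static int reg_cache_lookup(const void *addr, size_t len)
{
    uintptr_t a = (uintptr_t)addr;

    for (int i = 0; i < REG_CACHE_SLOTS; i++) {
        const struct reg_entry *e = &reg_cache[i];
        if (e->valid && a >= e->addr && len <= e->len &&
            a - e->addr <= e->len - len)
            return e->handle;
    }
    return -1;
}

/* Drop every entry overlapping [addr, addr + len) and return how many
 * were dropped. This is what a hook on free()/munmap()/sbrk() would
 * have to call to avoid the stale hit. */
static int reg_cache_invalidate(const void *addr, size_t len)
{
    uintptr_t a = (uintptr_t)addr;
    int dropped = 0;

    for (int i = 0; i < REG_CACHE_SLOTS; i++) {
        struct reg_entry *e = &reg_cache[i];
        if (e->valid && a < e->addr + e->len && e->addr < a + len) {
            e->valid = 0;
            dropped++;
        }
    }
    return dropped;
}
```

Without a call to reg_cache_invalidate() between free(A) and the second
malloc(), reg_cache_lookup(A, ...) still returns the old handle, which is
exactly the last step of the scenario.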

> 
> Otherwise, the scenario in question #3 is a real problem.  There are a 
> few possibilities for fixing it, but all are problematic (override 
> sbrk() via including ptmalloc2 in the distribution, using LD_PRELOAD to 
> override sbrk(), etc.).  Any other suggestions would be welcome...
> 
Don't forget that glibc uses mmap for large allocations, so it is not
enough to override sbrk(). You can use mallopt(M_MMAP_MAX, 0) to
disable mmap-based allocations, though.

You can also instruct glibc to never return memory to the kernel by calling
mallopt(M_TRIM_THRESHOLD, -1). In this case you will not have the
problem you are trying to fix. But you pay a price: freed memory stays
in the process and is never given back to the system.
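
Put together, the two mallopt() calls look roughly like this. It is a
minimal sketch, assuming glibc (M_MMAP_MAX and M_TRIM_THRESHOLD are glibc
extensions declared in <malloc.h>). The function name
keep_heap_in_process() is made up for illustration, and I have only the
main heap in mind here, not what glibc does for extra arenas in threaded
programs.

```c
#include <malloc.h>

/* Ask glibc malloc to keep freed memory inside the process.
 * Returns 1 if both options were accepted, 0 otherwise
 * (mallopt() itself returns 1 on success and 0 on error). */
int keep_heap_in_process(void)
{
    /* Never satisfy malloc() with mmap(), so free() has no
     * mmap'ed chunk to munmap(). */
    if (mallopt(M_MMAP_MAX, 0) != 1)
        return 0;

    /* Never trim the top of the heap, i.e. never give it back
     * to the kernel via sbrk(). */
    if (mallopt(M_TRIM_THRESHOLD, -1) != 1)
        return 0;

    return 1;
}
```

Call it once, early, before the buffers you care about are allocated;
chunks that were already obtained with mmap are still unmapped when freed.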

--
			Gleb.


